BuddyPress.org

Opened 6 years ago

Closed 5 years ago

#7818 closed defect (bug) (fixed)

Privacy: Data export for Activity

Reported by: boonebgorges
Owned by:
Milestone: 4.0
Priority: normal
Severity: normal
Version:
Component: Activity
Keywords: has-patch 2nd-opinion
Cc:

Description

Parent ticket: #7698

We should include Activity data in WP user data exports.

Attachments (4)

7818.diff (5.7 KB) - added by boonebgorges 6 years ago.
Screenshot_2018-05-11_16-04-54.png (61.7 KB) - added by boonebgorges 6 years ago.
7818.2.diff (5.4 KB) - added by boonebgorges 6 years ago.
Screenshot_2018-05-11_17-54-38.png (67.5 KB) - added by boonebgorges 6 years ago.

Change History (14)

#1 @boonebgorges
6 years ago

  • Keywords has-patch 2nd-opinion added

7818.diff is a first attempt at a patch. Screenshot_2018-05-11_16-04-54.png is a snippet from an export. Questions/comments:

  1. I've opted to include only activity_comment and activity_update. My thinking is that other activity types are not created directly, but as a result of some other action: creating a friendship, joining a group, updating one's profile, etc. So it feels more natural to count that data among those other export groups rather than here. However, I'm basing this on my desire to provide an export that is legitimately useful to a user. If the main goal is to be completist for the purpose of legal compliance, then perhaps we should default to providing everything. In that case, the internal logic could be a bit different: instead of hardcoding 'Activity Type' strings, for example, we could fetch that info out of registered types.
  2. Similarly, I've chosen only to give human-readable information about activity items. This means: content, date, relationships to groups and other activity items. It doesn't include things like item_id or mptt_left, except where those things are relevant for something visible, like a Group connection. Again, I find this more useful than a raw data dump, but I would like to hear other thoughts about this.
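
To make the trade-off in the first question concrete, here is a rough sketch contrasting a hardcoded label map with labels drawn from a registry of types. It is written in Python rather than PHP purely for illustration. Every name in it (HARDCODED_LABELS, REGISTERED_TYPES, select_for_export) and every label string is hypothetical and not taken from the patch or from BuddyPress.

```python
# Hypothetical illustration; none of these names or labels come from BuddyPress.

# Approach A: export only directly created types, with hardcoded labels.
HARDCODED_LABELS = {
    "activity_update": "Status Update",
    "activity_comment": "Activity Comment",
}

# Approach B: a stand-in for a registry holding every activity type on the site.
REGISTERED_TYPES = {
    "activity_update": "Status Update",
    "activity_comment": "Activity Comment",
    "friendship_created": "Friendship Created",
    "joined_group": "Joined Group",
}


def select_for_export(activities, labels):
    """Keep activities whose type has a label, pairing each with that label."""
    return [
        (labels[a["type"]], a["content"])
        for a in activities
        if a["type"] in labels
    ]


activities = [
    {"type": "activity_update", "content": "Hello world"},
    {"type": "joined_group", "content": ""},
]

useful_only = select_for_export(activities, HARDCODED_LABELS)
completist = select_for_export(activities, REGISTERED_TYPES)
```

With the hardcoded map, the joined_group item is skipped. With the registry, every item is exported and new types need no code change.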

#2 @boonebgorges
6 years ago

The patch (and probably others) should be updated to process only a finite number of entries in a single run. See https://core.trac.wordpress.org/attachment/ticket/43602/ERASURE.md?marks=73,106#L72
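
A minimal sketch of the "finite number of entries per run" idea, assuming the WordPress exporter contract in which the callback receives a 1-based page number and returns its items together with a done flag. It is written in Python for illustration only. PER_PAGE, FAKE_ACTIVITIES, fetch_activities, export_page, and run_export are hypothetical names, and the batch size is arbitrary.

```python
PER_PAGE = 2  # assumed batch size, kept small for the demo

# Stand-in for the activity table; a real exporter would query the database.
FAKE_ACTIVITIES = [{"id": i, "content": f"update {i}"} for i in range(1, 6)]


def fetch_activities(page, per_page):
    """Return one page of activities (page is 1-based)."""
    start = (page - 1) * per_page
    return FAKE_ACTIVITIES[start:start + per_page]


def export_page(page):
    """Process a single bounded batch and report whether more may remain."""
    batch = fetch_activities(page, PER_PAGE)
    return {
        "data": batch,
        # A short (or empty) page means there is nothing left to fetch.
        "done": len(batch) < PER_PAGE,
    }


def run_export():
    """Mimic the caller, which keeps requesting pages until done is true."""
    collected, page = [], 1
    while True:
        result = export_page(page)
        collected.extend(result["data"])
        if result["done"]:
            return collected, page
        page += 1
```

Treating a short page as the end avoids a separate count query. The cost is one extra, empty request when the total is an exact multiple of the batch size.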

Also, after talking with @johnjamesjacoby in Slack, let's go ahead and do all activity types, not just activity_comment and activity_update.

#3 @boonebgorges
6 years ago

7818.2.diff makes the following changes:

  • All activity types are included.
  • Reformatted the outputted data. A fully-formatted 'Activity Description' (the "action") contains links to all related data, so I think it does most of the work of component, type, item_id, activity_id. The only other things we need are date, URL, and content (when content exists).
  • Added the logic to allow WP to batch-process.
  • Ensured that show_hidden=true is set.
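
As a sketch of the reduced output described in this list, each activity could map to one export item carrying the description, date, URL, and (only when present) content. The group_id / group_label / item_id / data layout follows the item shape that WordPress personal data exporters return. The labels other than 'Activity Description', the "bp_activity" group ID, the "activity-" prefix, and build_export_item itself are assumptions, and the code is Python rather than PHP for illustration.

```python
def build_export_item(activity):
    """Turn one activity record into a name/value export item."""
    data = [
        {"name": "Activity Description", "value": activity["action"]},
        {"name": "Activity Date", "value": activity["date_recorded"]},  # assumed label
        {"name": "Activity URL", "value": activity["url"]},  # assumed label
    ]
    # Content is optional: many activity types record only an action.
    if activity.get("content"):
        data.append({"name": "Activity Content", "value": activity["content"]})

    return {
        "group_id": "bp_activity",  # hypothetical group ID
        "group_label": "Activity",
        "item_id": f"activity-{activity['id']}",  # hypothetical ID format
        "data": data,
    }


# Example records with made-up values.
with_content = build_export_item({
    "id": 7,
    "action": "alice posted an update",
    "date_recorded": "2018-05-11 16:04:54",
    "url": "https://example.com/activity/p/7/",
    "content": "Hello world",
})
without_content = build_export_item({
    "id": 8,
    "action": "alice joined the group Testers",
    "date_recorded": "2018-05-11 17:54:38",
    "url": "https://example.com/activity/p/8/",
    "content": "",
})
```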

What do others think? Let's try to get this exporter looking basically how we want it to look, and then we can take the boilerplate (like the batch-process stuff) and use it to build the others.

#4 @DJPaul
6 years ago

At another quick glance, this patch also looks fine.

One suggestion: don't assume that all activity descriptions will contain that pattern of link data, as most (?) of BuddyPress core does. If memory serves, an activity has primary and secondary item ID properties and a type. I think we could include that data, so that people who want to build a map of their data from one BP data export to another still have the relationship ideas.

(In other words, making it possible for another tool to build a relationship map from a whole collection of export files for a user, would be pretty neat. Even if that's not the primary purpose.)

#5 @DJPaul
6 years ago

  • Milestone changed from 3.1 to 4.0

Milestone renamed

#6 @boonebgorges
6 years ago

Thanks, @DJPaul. You're correct that activity descriptions don't necessarily contain the pattern of link data. But the item_id, secondary_item_id, type, and component properties (especially the first two) are fairly uninformative for end users. To make them useful, we'd need a large translation matrix: item_id means X for type Y, etc. This is a lot of work just to reproduce what the description already does for all core activity types.

Perhaps a middle ground is to pass the value through a filter, so that third-party tools can be responsible for formatting activity items of their own type.
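
The filter idea could look roughly like the following. This is a Python sketch of a WordPress-style filter mechanism, not BuddyPress code. The add_filter and apply_filters functions here are simplified stand-ins written for the example, and the hook name "activity_export_item", the "quiz_completed" type, and the "Quiz ID" label are all hypothetical.

```python
from collections import defaultdict

_filters = defaultdict(list)  # hook name -> list of callbacks


def add_filter(hook, callback):
    """Register a callback to run when the named hook is applied."""
    _filters[hook].append(callback)


def apply_filters(hook, value, *args):
    """Pass value through every callback registered on the hook, in order."""
    for callback in _filters[hook]:
        value = callback(value, *args)
    return value


def default_item(activity):
    """Core's default export data: just the formatted description."""
    return [{"name": "Activity Description", "value": activity["action"]}]


def export_activity(activity):
    """Build the default data, then let third parties adjust it."""
    return apply_filters("activity_export_item", default_item(activity), activity)


def format_quiz_activity(data, activity):
    """A third-party plugin adds detail only for its own activity type."""
    if activity["type"] != "quiz_completed":
        return data
    return data + [{"name": "Quiz ID", "value": str(activity["item_id"])}]


add_filter("activity_export_item", format_quiz_activity)
```

Core stays responsible only for the description, while the plugin that owns a type knows what its item_id means and can translate it.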

We should work toward a future where it's possible to get an export of data and do something programmatic with it, like importing it to another BP installation. It will be hard: we'll need things like UUIDs for each BP object. However, this is a separate project from the kind of export discussed here, whose goal is to show the user "what the site knows about me", in compliance with GDPR and other user-facing regulations. So I think we should set aside the more interesting thing for the time being.

#7 @boonebgorges
6 years ago

In 12112:

Privacy: Data exporter for Activity.

See #7818.

#8 @boonebgorges
6 years ago

In 12113:

Privacy: Add missing textdomain.

Introduced in [12112].

See #7818.

#9 @DJPaul
6 years ago

That makes sense, thanks.

#10 @boonebgorges
5 years ago

  • Resolution set to fixed
  • Status changed from new to closed