Opened 6 years ago
Closed 5 years ago
#7818 closed defect (bug) (fixed)
Privacy: Data export for Activity
| Field | Value | Field | Value |
|---|---|---|---|
| Reported by | boonebgorges | Owned by | |
| Milestone | 4.0 | Priority | normal |
| Severity | normal | Version | |
| Component | Activity | Keywords | has-patch 2nd-opinion |
| Cc | | | |
Description
Parent ticket: #7698
We should include Activity data in WP user data exports.
Attachments (4)
Change History (14)
#2
6 years ago
The patch (and probably others) should be updated to process only a finite number of entries in a single run. See https://core.trac.wordpress.org/attachment/ticket/43602/ERASURE.md?marks=73,106#L72
Also, after talking with @johnjamesjacoby in Slack, let's go ahead and do all activity types, not just activity_comment.
#3
6 years ago
7818.2.diff makes the following changes:
- All activity types are included.
- Reformatted the output data. A fully formatted 'Activity Description' (the "action") contains links to all related data, so I think it does most of the work of component, type, item_id, and activity_id. The only other things we need are date, URL, and content (when content exists).
- Added the logic to allow WP to batch-process.
- Ensure that show_hidden=true
What do others think? Let's try to get this exporter looking basically how we want it to look, and then we can take the boilerplate (like the batch-process stuff) and use it to build the others.
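For context on the batch-processing contract: WordPress calls each registered personal-data exporter with the user's email address and a page number, and keeps requesting pages until the exporter reports that it is done. The Python below is only a sketch of that paging logic, not BuddyPress or WordPress code. PER_PAGE, the Activity record, and the group and item IDs are hypothetical; the exported fields follow the date/description/URL/content list above.

```python
from dataclasses import dataclass

PER_PAGE = 20  # hypothetical batch size; any finite number bounds the work per request


@dataclass
class Activity:
    id: int
    user_id: int
    action: str  # the fully formatted "Activity Description"
    date: str
    url: str
    content: str = ""


def export_activity_page(activities, user_id, page):
    """Return one page of export items plus a 'done' flag.

    Mirrors the {'data': [...], 'done': bool} shape WordPress exporters return;
    the group/item IDs and field labels are assumptions for illustration.
    """
    mine = [a for a in activities if a.user_id == user_id]
    start = (page - 1) * PER_PAGE
    batch = mine[start:start + PER_PAGE]

    items = []
    for a in batch:
        data = [
            {"name": "Activity Date", "value": a.date},
            {"name": "Activity Description", "value": a.action},
            {"name": "Activity URL", "value": a.url},
        ]
        if a.content:  # only include content when it exists
            data.append({"name": "Activity Content", "value": a.content})
        items.append({
            "group_id": "bp_activity",  # hypothetical group ID
            "group_label": "Activity",
            "item_id": f"bp-activity-{a.id}",  # hypothetical item ID format
            "data": data,
        })

    # A short (or empty) page means there is nothing left to fetch.
    done = len(batch) < PER_PAGE
    return {"data": items, "done": done}


def export_all(activities, user_id):
    """Simulate the caller: request pages until the exporter reports done."""
    page, out = 1, []
    while True:
        result = export_activity_page(activities, user_id, page)
        out.extend(result["data"])
        if result["done"]:
            return out, page
        page += 1
```

Stopping when a page comes back short avoids a separate count query, at the cost of one extra empty request when the total is an exact multiple of PER_PAGE.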
#4
6 years ago
At another quick glance, this patch also looks fine.
I'd suggest not assuming that all activity descriptions contain that pattern of link data, even though (most of?) BuddyPress core follows it. If memory serves, an activity has primary and secondary ID properties, and a type. I think we could include that data, so that anyone who wants to map their data from one BP data export to another still has the relationship information.
(In other words, making it possible for another tool to build a relationship map from a user's whole collection of export files would be pretty neat, even if that's not the primary purpose.)
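The raw-ID suggestion could look something like this sketch: each export item carries component, type, item_id, and secondary_item_id alongside the human-readable fields, and a separate tool folds a user's export files into a map of which activities point at which objects. It is written in Python purely for illustration; the field labels and the shape of the parsed export are assumptions, not part of any patch on this ticket.

```python
from collections import defaultdict


def relationship_fields(activity):
    """Raw fields that let another tool relate items across exports.

    The keys mirror the activity properties named in this thread;
    the "Activity ..." labels are hypothetical.
    """
    return [
        {"name": "Activity Component", "value": activity["component"]},
        {"name": "Activity Type", "value": activity["type"]},
        {"name": "Activity Item ID", "value": activity["item_id"]},
        {"name": "Activity Secondary Item ID", "value": activity["secondary_item_id"]},
    ]


def build_relationship_map(export_files):
    """Map (component, item_id) to the activity IDs that reference it.

    export_files is a list of exports, each a list of items shaped like
    {"activity_id": ..., "data": [{"name": ..., "value": ...}, ...]}.
    """
    graph = defaultdict(list)
    for export in export_files:
        for item in export:
            fields = {f["name"]: f["value"] for f in item["data"]}
            key = (fields["Activity Component"], fields["Activity Item ID"])
            graph[key].append(item["activity_id"])
    return dict(graph)
```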
#6
6 years ago
Thanks, @DJPaul. You're correct that activity descriptions don't necessarily contain the pattern of link data. But the item_id, secondary_item_id, type, and component properties (especially the first two) are fairly uninformative for end users. To make them useful, we'd need a large translation matrix: item_id means X for type Y, etc. This is a lot of work just to reproduce what the description already does for all core activity types.
Perhaps a middle ground is to pass the value through a filter, so that third-party tools can be responsible for formatting activity items of their own type.
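A rough illustration of the filter idea, in Python rather than PHP: the helpers below imitate the WordPress add_filter()/apply_filters() pattern (each callback receives the value and returns it, callbacks run in priority order, ties in registration order). The hook name export_activity_item_data and the new_recipe activity type are hypothetical, not existing BuddyPress hooks or types.

```python
from collections import defaultdict

# hook name -> list of (priority, registration order, callback)
_filters = defaultdict(list)


def add_filter(hook, callback, priority=10):
    _filters[hook].append((priority, len(_filters[hook]), callback))


def apply_filters(hook, value, *args):
    # Lower priority runs first; equal priorities keep registration order.
    for _, _, cb in sorted(_filters[hook], key=lambda t: (t[0], t[1])):
        value = cb(value, *args)
    return value


def format_activity_item(activity):
    data = [
        {"name": "Activity Date", "value": activity["date"]},
        {"name": "Activity Description", "value": activity["action"]},
    ]
    # Plugins can rewrite the fields for activity types they own.
    return apply_filters("export_activity_item_data", data, activity)


# Hypothetical plugin that adds detail for its own custom activity type.
def add_recipe_fields(data, activity):
    if activity["type"] == "new_recipe":
        data = data + [{"name": "Recipe", "value": activity["item_id"]}]
    return data


add_filter("export_activity_item_data", add_recipe_fields)
```

Core keeps ownership of the default fields, and a plugin only touches items of its own type, which is the division of responsibility proposed above.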
We should work toward a future where it's possible to get an export of data and do something programmatic with it, like importing it to another BP installation. It will be hard: we'll need things like UUIDs for each BP object. However, this is a separate project from the kind of export discussed here, whose goal is to show the user "what the site knows about me", in compliance with GDPR and other user-facing regulations. So I think we should set aside the more interesting thing for the time being.
7818.diff is a first attempt at a patch. Screenshot_2018-05-11_16-04-54.png is a snippet from an export. Questions/comments:
- … activity_comment and activity_update. My thinking is that other activity types are not created directly, but as a result of some other action: creating a friendship, joining a group, updating one's profile, etc. So it feels more natural to count that data among those other export groups rather than here. However, I'm basing this on my desire to provide an export that is legitimately useful to a user. If the main goal is completeness for the purpose of legal compliance, then perhaps we should default to providing everything. In that case, the internal logic could be a bit different: instead of hardcoding 'Activity Type' strings, for example, we could fetch that info from the registered types.
- … item_id or mptt_left, except where those things are relevant for something visible, like a Group connection. Again, I find this more useful than a raw data dump, but I would like to hear other thoughts about this.