Skip to:
Content

Opened 2 years ago

Closed 5 weeks ago

Last modified 5 weeks ago

#3856 closed enhancement (fixed)

Saving activity update action as an array in db

Reported by: modemlooper Owned by: boonebgorges
Milestone: 2.0 Priority: normal
Severity: normal Version:
Component: Activity Keywords: has-patch 2nd-opinion
Cc: mercijavier@…, sascha@…

Description

So I'm back to working on json API. The action is saved as one chunk.

I'm suggesting the action be broken up and saved as an array.

Example:

user => user_id
user_link => http://blah.com/user
group_name => Group Name
group_url => http://blah.com/group
meta => posted an update in the group

Not sure how this would affect performance. It would give better options for output during application development instead of being stuck with set HTML.

Attachments (5)

3856.01.patch (10.4 KB) - added by boonebgorges 7 weeks ago.
3856.02.diff (10.4 KB) - added by imath 7 weeks ago.
3856.03.patch (56.9 KB) - added by boonebgorges 6 weeks ago.
3856.03--no-prefix.patch (56.8 KB) - added by imath 6 weeks ago.
3856.04.patch (45.7 KB) - added by r-a-y 6 weeks ago.

Download all attachments as: .zip

Change History (36)

comment:1 boonebgorges2 years ago

  • Keywords 1.7-early added
  • Milestone changed from Awaiting Review to Future Release

I'm not sure that saving it in an array is the best, but I think we should definitely consider not saving it (only) as a hardcoded string. It causes huge problems, among other things, for people who want to build multilingual BP installations.

The root reason for doing it this way is performance. To rebuild that string at runtime, we need the name and URL of the object(s), such as the group, forum, etc. Pulling them up when building a page of 20 activity items would add a ton of overhead if done wrong.

So, we might think of saving it in chunks (what modemlooper is suggesting, essentially). Alternatively, we could mod the activity loop a bit so that we do a second query for all of the associated item metadata in one fell swoop, and then use that to construct on the fly. This latter option would add a single query, and would add a bit of computational overhead, but it might be the most flexible. We would make ample use of the WP cache, so that persistent object caching would all but eliminate all overhead.

Let's talk about it early in the 1.7 cycle. This is something that's bothered me for a while, and I think it'll be fairly easy to fix.

comment:2 modemlooper2 years ago

Cool, need a way to restructure output for custom themes/applications.

comment:3 boonebgorges21 months ago

  • Keywords 1.7-early removed
  • Milestone changed from Future Release to 1.7

Let's discuss this in the upcoming week or two as we settle on a 1.7 scope.

comment:4 DJPaul18 months ago

Bump to 1.8?

Last edited 18 months ago by DJPaul (previous) (diff)

comment:5 johnjamesjacoby16 months ago

  • Milestone changed from 1.7 to 1.8

Bumping to 1.8. Let's see some patches.

comment:6 boonebgorges11 months ago

No patches, which means too late for 1.8. But remains something valuable that we can do without breaking backward compatibility, probably by adding another column to the activity table, so we should do it for 1.9.

comment:7 boonebgorges11 months ago

  • Milestone changed from 1.8 to 1.9

comment:8 r-a-y6 months ago

  • Milestone changed from 1.9 to 2.0

comment:9 terraling6 months ago

I'm currently building a multi-lingual site and am discovering the problems which arise from constructing the activity string at the moment an activity is recorded rather than when it is retrieved for display.

So if John is using the site in English and Juan is using it in Spanish, when John looks at the activity stream he sees:

John posted an update 3 hours ago
Juan ha publicado una actualización 4 hours ago
John changed their profile picture 5 hours ago

i.e. Juan "posted an update" is in Spanish because he was using the site in Spanish at the time, rather than the English that John is currently viewing the site in.

My hack requires removing the translations for "posted an update" etc. in the .mo files so that activity is always recorded in the database in English, and then using the bp_get_activity_action_pre_meta filter to str_replace "posted an update" etc. with the translation as required.

In terms of how you might make minimal changes to how this currently works, unless I'm mistaken there is one-to-one relationship between the activity "type" and the text appended in the activity "action".

E.g.
new_member -> ' became a registered member'
activity_update -> ' posted an update'
new_avatar -> ' changed their profile picture'
etc.

So for performance reasons you could continue to build the action at the moment the activity is recorded, but leave off the text, and append that text (wrapped in a get_text call) based upon the activity type when the activity is retrieved for display.

That would certainly make things considerably easier in terms of multilingual handling, not sure what other use cases apply.

comment:10 boonebgorges6 months ago

terraling - Thanks very much for your feedback. Your use case (multilingual sites) is exactly the motivation for this change.

The hack that you've put together - filtering bp_get_activity_action_pre_meta - is exactly what I would do in your position.

A more general solution will have to be more general. In your bp_get_activity_action_pre_meta callback, you have the luxury of knowing which languages you need to translate into. So you can assemble a finite list of translations, and do the translations on the fly. BP itself doesn't have this luxury - you could be running in any one of dozens of languages. So what we'll have to do is to extend our activity API with dynamic callback functions. Something like:

bp_activity_add( array(
    // ...
    'action_callback' => 'bp_friends_new_friends_activity_action_callback',
    'action_callback_params' => array( $initiator_user_id, $friend_user_id, ),
    'type' => 'friendship_accepted',
    // ...
) );
function bp_friends_new_friends_activity_action_callback( $initiator_user_id, $friend_user_id ) {
    $initiator_link = bp_core_get_userlink( $initiator_user_id );
    $friend_link = bp_core_get_userlink( $friend_user_id );
    return sprintf( __( '%1$s and %2$s are now friends', 'buddypress' ), $initiator_link, $friend_link );
}

Then, when fetching the action in the template loop, we'll get the string from the callback function. This'll allow translation into arbitrary languages on the fly. (It's not too different in concept from what you and modemlooper have suggested with storing the gettext chunks in the DB. However, the callback technique accounts for the fact that different chunks will have different numbers of arguments to swap out, and the fact that translations may change over time.)

This is off the top of my head, so the proposal needs lots of fleshing out. (There are issues with backward compatibility - dealing with plugins that don't provide the necessary callbacks - as well as the question of how much stuff should be stored in the DB and how much should be pulled up dynamically, and whether the dynamic fetching should be done inline or at the beginning of the activity loop. Etc etc etc.) But I would like to make it a priority for 2.0.

comment:12 johnjamesjacoby4 months ago

Going the callback route is absolutely the way to go. We can use the Notifications component as a guide, take what we like, nix what we don't, and improve both components with similar approaches.

This has been my long list of like-to-haves since 1.1.

comment:13 mercime3 months ago

  • Cc mercijavier@… added

comment:14 follow-up: boonebgorges7 weeks ago

  • Keywords has-patch 2nd-opinion added

OK, I have a proof-of-concept for this. See 3856.01.patch. Basically:

  • When registering actions with bp_activity_set_action(), include a 'format_callback' param
  • When saving activity items, stash useful concatenated info into a piece of activitymeta called 'action_data'. Stuff like: userlink, grouplink, etc
  • When querying activity items, populate the 'action' property with the function bp_activity_generate_action_string(). That function looks up the format_callback registered for that activity type, and then passes the activity object along to the callback
  • I've implemented two examples of formatting callbacks: bp_activity_format_activity_action_activity_update() and bp_groups_format_activity_action_joined_group()

The technique seems to be working well. No real overhead is added, since activitymeta is already pre-fetched within activity loops.

The 'action_data' technique is my way of caching expensive data. (Remember: caching expensive data was the reason for storing an action string in the first place.) I've based in it part on the way that WP uses '_wp_attachment_metadata' postmeta to store stuff. However, the more time I spend thinking about the callback technique, the more I begin to envision ways that we could avoid most of this action_data caching. For example, it wouldn't add much overhead to fetch display names and user links for all users at the beginning of an activity loop, in the same way that we pre-fetch a bunch of other data. (This becomes even less of a worry now that we've improved our object caching so much.) However, some data - especially the names and URLs of posts from other blogs - will always have to be stored somehow, as it's probably never going to be feasible to grab it on the fly.

It'd be great to get initial feedback on the technique I've sketched here. Thanks.

boonebgorges7 weeks ago

comment:15 modemlooper7 weeks ago

awesome, gonna test tomorrow

comment:16 in reply to: ↑ 14 imath7 weeks ago

Replying to boonebgorges:

It'd be great to get initial feedback on the technique I've sketched here. Thanks.

Hi boonebgorges. First: thanks a lot for the great work :)

I've tested the patch, and the first thing i wanted to check was to join a group and immediately deactivate the group component to see what was happening, because i've seen that the format callback are in the group component.
And this is what i got when i refreshed the activity stream:
https://farm4.staticflickr.com/3703/12857838984_c3ccb25e3c_z.jpg

Something happened 3 minutes ago :)

So i thought about it, at first i've tried a patch to remove the activity that was concerning inactive components. Although i really think, that this is something we should do like the check we're doing in the notifications component thanks to the function bp_notifications_get_registered_components(), i thought about plugins. If we were to do that, then a lot of plugins wouldn't have their activity displayed although they're activated.
So i've tried a second patch using buddypress()->deactivated_components and "array_intersecting" it with a hardcoded array of components that are generating activities. It was working, but i realized i wasn't in the scope of this ticket. I've checked the way activity was displayed in 1.9.x in case a component was inactive, and they are actually displayed :)

So i thought well, having "something happened 3 minutes ago" will avoid having for instance a link to a group that would lead to a 404 :)

So i've tested to create some callbacks from a plugin and it's working really great. Then something seemed weird to me. But i may be wrong about my interpretation of the code. So sorry if i am!!

1) I thought it wasn't necessary to create an activity_meta in case the action_data was an empty array. So i've added a check.

2) then i wondered why in the new bp_activity_add() we have :

$activity->action = isset( $action ) ? $action : bp_activity_generate_action_string( $activity );

Because $action is always set, if not passed in arguments to bp_activity_add() then it defaults to '' so the bp_activity_generate_action_string( $activity ) part is never played and as a result activity_update and joined_group has a blank action in activity table.

3) So i've imagined (may be i was wrong) that you wanted to keep on populating the action field in the activity table by generating the action thanks to the callback function. So i've replaced by a ! empty() check and just above set the var $activity->action_data to $action_data and it created the action in the activity table field.

Just in case i'm not wrong, i'm attaching 3856.02.diff

Finally, i think :
When this very interesting improvement will be implemented, it will surely require a message to plugin developer to enjoy it. So plugin developer will begin to use bp_activity_set_action() to register their activities, may be we can also improve the templates (activity/index, members/single/activity & groups/single/activity) to make this use avoids them to hook the "filter" actions. And why not carry on by only displaying activities for the active components.

Last edited 7 weeks ago by imath (previous) (diff)

imath7 weeks ago

comment:17 follow-up: boonebgorges7 weeks ago

Something happened 3 minutes ago :)

Ah yes. So, what I meant to do (but I see now that I didn't fully implement) was this: When an activity item is created, store the generated action test as a string in the 'action' field. (Like we currently do.) On output, try to generate the action string dynamically. But if this fails, use the literal action string as a fallback. I forgot to do this last part :) I think that's better than trying to remove stuff from the activity stream, at least for now. (It may or may not be a good idea not to show activity items from components that have been deactivated, but I don't think we should make the decision based on technical limitations of this patch.)

I thought it wasn't necessary to create an activity_meta in case the action_data was an empty array. So i've added a check

Excellent. I meant to do this myself.

Because $action is always set, if not passed in arguments to bp_activity_add() then it defaults to so the bp_activity_generate_action_string( $activity ) part is never played and as a result activity_update and joined_group has a blank action in activity table.

For backward compatibility. On the day that 2.0 launches, it's likely that most plugins (like, say, BuddyPress Docs) will not have updated to the new callback method. We still want to support recording a literal 'action' string in the database. The logic in the patch might be incorrect (I need to write some unit tests ;) ) but we do need to do something like this. I think your empty() modification might fix it.

So plugin developer will begin to use bp_activity_set_action() to register their activities, may be we can also improve the templates (activity/index, members/single/activity & groups/single/activity) to make this use avoids them to hook the "filter" actions.

I'm not sure I understand this. Can you give an example?

And why not carry on by only displaying activities for the active components.

This is an interesting idea, but as I suggest above, I think it's a separate issue and we should discuss in a different ticket.

Thanks very much for the feedback! I think this will make a huge difference for international and multi-lingual communities.

comment:18 in reply to: ↑ 17 imath7 weeks ago

Replying to boonebgorges:

I'm not sure I understand this. Can you give an example?

After a night of sleep, it may not be a so good idea for theme backcompat :) My idea was to hook (or create a new template tag) 'bp_activity_filter_options', 'bp_member_activity_filter_options', 'bp_group_activity_filter_options' to populate the activity filters thanks to bp_activity_get_types().

And why not carry on by only displaying activities for the active components.

This is an interesting idea, but as I suggest above, I think it's a separate issue and we should discuss in a different ticket.

Yes these two last thoughts are out of ticket scope, i'll certainly add one for at least the second one

Thanks very much for the feedback! I think this will make a huge difference for international and multi-lingual communities.

You're very welcome, and i agree at 100% :)

comment:19 landwire7 weeks ago

  • Cc sascha@… added

boonebgorges6 weeks ago

comment:20 follow-up: boonebgorges6 weeks ago

imath - Thanks for the feedback. Your comments got me thinking.

My original idea was to store 'action_data' for each activity item, which would then be used at runtime to generate the action strings. This is very inelegant, however. The structure of the action_data is always going to be different. It's often going to mean storing the same content in the activitymeta table over and over and over again (like 'http://example.com/members/boone'). And it reproduces the problem that our current technique has with stale content: when someone changes (eg) their username, the activity action doesn't change.

So, I decided that this was the right time to go for a broader fix. See 3856.03.patch.

Here's my basic strategy: Not only are strings concatenated at runtime, but the data for the strings is also fetched at runtime. The original argument against this was performance. But, using some of the caching improvements that have gone into trunk during this cycle, I've been able to mitigate these performance issues almost completely.

After activity items are pulled up in BP_Activity_Activity::get(), the results are run through a filter 'bp_activity_prefetch_object_data'. Components then use this hook to loop the activity items to find those that "belong" to the component, to identify the associated object, and then to prime the cache for that set of objects. So, for example, the groups component looks over the activity stream, and for each item with 'component=groups', it keeps track of the 'item_id' (which is the group id). It then does a single query to populate the cache with data related to these groups. Then, the _format_activity_action_ callback functions (like bp_groups_format_activity_action_created_group()) use the regular functions like bp_get_group_permalink() to create the necessary action string, with no performance implications.

The one place where this kind of pre-fetching is not feasible is with the blogs component when running Multisite - there's no way to optimize a query that has to switch_to_blog() a dozen times. So, in the case of new_blog, new_blog_post, and new_blog_comment, I've gone with a variation on my original 'action_data' plan. For blog_url and blog_name, which will obviously remain the same between activity items related to the same blog, I'm using data from bp_blogs_get_blogmeta(). For post_url and post_name, I'm creating custom *activitymeta* items. (You'll notice that there are a couple of ancillary mods to the blogs component to make this all possible.) It's not ideal, but it's not any worse than what we currently do, and it performs snappily.

The other complication is maintaining backward compatibility for the old 'activity_action' filters. Where possible, I've kept them intact. In some cases, the filter is expecting an argument that I don't have easy access to - like the $post object after a new blog post. To keep overhead to a minimum in these cases, I check has_filter() before fetching the necessary data. My guess is that very, very few people are using these filters, but this'll keep it working for those who are, and doesn't cause any harm (aside from a few extra lines of code) to those who aren't.

I've written a whole stack of unit tests, as you can see, but this could use some more eyeballs. In particular:

  • Devs: Please let me know what you think of the techniques I've described above.
  • Everyone: Please test with all sorts of different activity items, plugins, etc. It should degrade nicely in cases where components are deactivated, where plugins aren't using the new callbacks, etc, but this could all use some checking.

comment:21 in reply to: ↑ 20 imath6 weeks ago

Replying to boonebgorges:

imath - Thanks for the feedback. Your comments got me thinking.

yw :)

  • Devs: Please let me know what you think of the techniques I've described above.

I need to test it a bit more, but my first impression is first: Great work
& second: i find it "weird" in the activity meta table in case of comments on the same blog post to have post_title and post_url repeated at each comment as data is the same. I understand why it's needed. Just thought watching the table, it was weird :) Too bad there's not something like a parent field to only put this piece of info once in the activity meta for the post.
I'm adding a --no-prefix (3856.03--no-prefix.patch) version of the patch to ease testing ;)

imath6 weeks ago

comment:22 boonebgorges6 weeks ago

Great work

I'm adding a --no-prefix (3856.03--no-prefix.patch) version of the patch to ease testing ;)

Thanks, and thanks :)

i find it "weird" in the activity meta table in case of comments on the same blog post to have post_title and post_url repeated at each comment as data is the same

Yeah. The other technique I toyed with was storing it in blogmeta, to reduce duplication. There are a couple problems there though. First, we'd have to do something kinda funky to key all of this stuff. So, something like bp_blogs_update_blogmeta( $activity->item_id, 'post_url_' . $activity->secondary_item_id, $post_url ). Not a huge deal, but a little weird. Second, this technique wouldn't work for comments, because activity items don't tell us the post_id - item_id=blog_id and secondary_item_id=comment_id. To get the post_id, we have to pull up the comment object, which defeats the purpose. That said, we could do something like this in the case of comments: bp_blogs_update_blogmeta( $activity->item_id, 'post_id', $post_id ), and then when building the string, we can first pull this blogmeta to get the post ID, then lookup the post_url separately using 'post_url_' . $post_id.

In short, we save a little bit of room in the database, and reduce duplication by a little bit, but we introduce a bit of funny machinery. I'm not sure if the tradeoff is worth it. What do you think?

comment:23 imath6 weeks ago

Replying to boonebgorges:

I'm not sure if the tradeoff is worth it. What do you think?

Yes, i agree, it doesn't worth it. Things would have been different, if the activity table could have directly given us the activity id of the corresponding post (like a parent_id field). then the post_url and post_name would have been fetched from the activity_meta of this "parent" activity id. As we don't have this field and i'm not sure it's worth altering the table for only one case (blog comments), the way to deal with blog comments in your patch is better.

comment:24 r-a-y6 weeks ago

I've looked at the patches and they look great!

I have one thing to add.

When an activity loop is made, the userdata (user_login, user_nicename, user_fullname) is already populated, we can use these items directly instead of calling bp_core_get_userlink(), which can lead to additional queries.

Therefore, in 04.patch, I've added a new function called bp_activity_get_userlink() and replaced calls to bp_core_get_userlink() for all activity action callbacks.

Let me know what you think.

Last edited 6 weeks ago by r-a-y (previous) (diff)

r-a-y6 weeks ago

comment:25 boonebgorges6 weeks ago

Thanks for looking, r-a-y!

I'll do some more testing with your patch, but I'm 99% sure that your change is unnecessary. Since r8024, BP_User_Query caches 'bp_core_userdata_' data, which includes user_nicename, etc. And BP_Activity_Activity::get_activity_data(), which runs just after each activity query, calls BP_User_Query for all activity users in the loop. That means that, by the time you get to the activity action callback functions, all the necessary data is already in the cache. Does that seem right?

comment:26 ircbot6 weeks ago

This ticket was mentioned in IRC in #buddypress-dev by boonebgorges. View the logs.

comment:27 r-a-y6 weeks ago

Since r8024, BP_User_Query caches 'bp_core_userdata_' data, which includes user_nicename, etc.

Cool, I must have missed that changeset! That sounds about right. Will do some additional testing to verify.

Update: Boone is right! Disregard 04.patch.

Last edited 6 weeks ago by r-a-y (previous) (diff)

comment:28 boonebgorges6 weeks ago

Thanks, r-a-y!

Note: see #4429, which has been held up because of too-many-queries-in-the-loop considerations that will be solved once this patch is committed.

comment:29 ircbot5 weeks ago

This ticket was mentioned in IRC in #buddypress-dev by boonebgorges. View the logs.

comment:30 boonebgorges5 weeks ago

  • Owner set to boonebgorges
  • Resolution set to fixed
  • Status changed from new to closed

In 8125:

Generate activity actions dynamically, instead of using static values stored in the database

Activity actions - like 'Joe became a registered member' - have always been
stored as static strings in the activity table. This has been a perennial thorn
in the side of BP site administrators. For one thing, static strings cannot be
translated for multilingual audiences (without unreasonable workarounds).
For another, information in these static strings becomes out of date as users
change their display names, groups change their names, and so on.

This changeset makes it possible for activity types to be registered with a
new $format_callback function, passed to bp_activity_set_action(), which BP
will use to generate the action dynamically when building the activity loop.
This makes the activity stream fully localizable, continually up-to-date, and
generally the bomb dot com.

Existing plugins that do not register format_callback functions will continue
to have their actions pulled directly from the database. Likewise, since a copy
of the static string is saved in the activity table, static strings can be
served if the given component is disabled at any point (and thus the callback
is no longer available).

The original argument for caching the action strings was that generating them
inline was too resource intensive; it often requires pulling up display names,
user permalinks, group links, blog post names, and so forth. To address this
problem, each component can now prefetch required data at the beginning of an
activity loop by hooking to bp_activity_prefetch_object_data, and loading any
relevant data into the cache. The Activity, Friends, Groups, and Blogs
component now do this natively. It's likely that this prefetching will have
other performance benefits, as plugin authors will now be able to access user
data etc inline without performance penalties.

The case of the Blogs component is a special one; it's not practical to
prefetch data from multiple blogs at the beginning of a loop, due to the
resources required by switch_to_blog(). For this reason, the format_callback
functions in the blog component cache some relevant data (like the post name)
in blogmeta, where it's readily accessible within the loop.

Fixes #3856

comment:31 boonebgorges5 weeks ago

In 8126:

Use WP Plugin API rather than call_user_func() to apply format_callback to activity actions

This allows plugin authors to modify or override default formatting callbacks
using the standard remove_filter() and add_filter() API functions.

Hat-tip to nacin for the idea

See #3856

Note: See TracTickets for help on using tickets.