Skip to:
Content

Opened 2 years ago

Closed 16 months ago

Last modified 14 months ago

#5128 closed enhancement (fixed)

Move user and group last_activity out of wp_usermeta

Reported by: boonebgorges Owned by:
Milestone: 2.0 Priority: high
Severity: major Version:
Component: Component - Core Keywords: has-patch needs-testing
Cc: sam3dus@…

Description

See https://buddypress.trac.wordpress.org/ticket/4060#comment:3 and ensuing discussion for some background.

BP does lots of sorting and filtering by users' (and to some extent groups', though users are much more problematic and important) 'last_activity' meta value. But this scales very poorly, as the wp_usermeta table is very much not designed for this kind of thing. The changes introduced by BP_User_Query help by removing some of the worst-offending JOINs, but the fundamental problem remains: last_activity data should be stored somewhere where it can be properly indexed and typed, so that queries and sorts are reasonably fast.

My initial proposal is that we move last_activity to the bp_activity table, which is already structured properly. (There are two other possibilities that I might also explore: adding a column to the wp_users table and adding a separate last_activity table.)

Challenges:

  • backward compatibility. Should we sync back to last_activity just for this purpose?
  • If moving into bp_activity, we'll need that component, or at least a minimal subset of it, to be active at all times

I'll work up a proof-of-concept soon, and try to run some benchmarks to show that it's worthwhile. Thoughts and feedback welcome in the meantime.

Attachments (2)

5128.patch (19.4 KB) - added by boonebgorges 17 months ago.
5128.02.patch (23.1 KB) - added by boonebgorges 17 months ago.

Download all attachments as: .zip

Change History (33)

comment:1 @shanebp2 years ago

Great to see this issue is at the fore.
Moving to bp_activity table would be an improvement, but...

There are two other possibilities that I might also explore: adding a column to the wp_users table and adding a separate last_activity table.

Adding a column to wp_users seems like asking for trouble.

I'm glad you're willing to consider adding a separate last_activity table, given your aversion to new tables.
That approach would be very scalable.
And it wouldn't require part of the activity component to be active at all times.
And it could be back-pat by including a script that copies over any existing 'last_activity' meta values to the new table.

comment:2 @shanebp2 years ago

  • Cc sam3dus@… added

comment:3 @dimensionmedia2 years ago

I would second shanebp's concern about the wp_users table modification. I've always taken away that making those modifications a method of last resort. I like the idea of using the bp_activity table.

comment:4 @DJPaul2 years ago

Making a new table for one specific, relatively small, feature seems a bit shortsighted. If this option is taken, the table should be more generic to allow us to use it as general purpose, better-indexed meta table for other things in BuddyPress in the future (if there are any).

comment:5 @r-a-y21 months ago

  • Milestone changed from 1.9 to 2.0

@boonebgorges17 months ago

comment:7 follow-up: @boonebgorges17 months ago

  • Keywords has-patch needs-testing added

5128.patch is a first attempt at moving 'last_activity' out of usermeta. It's a complicated issue, so bear with me while I break down the strategy.

Performance

The change requires a number of changes that make direct comparisons difficult, but here's a summary of the relevant queries (on an installation with about 100,000 activity items and 25,000 users):

  1. Trunk
SELECT umeta_id FROM wp_usermeta WHERE meta_key = 'last_activity' AND user_id = 1 (0.2ms)

SELECT u.user_id as id FROM wp_usermeta u WHERE u.meta_key = 'last_activity' ORDER BY u.meta_value DESC LIMIT 0, 20 (57.3ms)

SELECT COUNT(u.user_id) FROM wp_usermeta u WHERE u.meta_key = 'last_activity' (28.3ms)
  1. With 5128.patch applied:
SELECT id, user_id, date_recorded FROM wp_bp_activity WHERE component = 'members' AND type = 'last_activity' AND user_id IN (1) LIMIT 1 (0.2ms)

SELECT u.user_id as id FROM wp_bp_activity u WHERE u.component = 'members' AND u.type = 'last_activity' ORDER BY u.date_recorded DESC LIMIT 0, 20 (0.2ms)

SELECT COUNT(u.user_id) FROM wp_bp_activity u WHERE u.component = 'members' AND u.type = 'last_activity' (30.4ms)

SELECT id, user_id, date_recorded FROM wp_bp_activity WHERE component = 'members' AND type = 'last_activity' AND user_id IN (1,26138,23108,18770,24279,24090,7478,5470,15216,2957,23229,3037,10118,17063,29675,28554,3776,10632,13248,16357) LIMIT 20 (0.3ms)

Basically, the main query went from about 57ms to .3ms. Roughly two orders of magnitude improvement. This will become more dramatic on larger sites (usermeta's performance will degrade in a greater-than-linear fashion), especially where usermeta is filled with lots of other junk from other plugins.

In short, the performance implications are very significant, and we should move forward with this fix.

Implementation Details

  1. I opted to use wp_bp_activity for storage. If I'd created my own table, it would've had the following columns: id, user_id, date_recorded, with all three columns indexed. The activity table already has these columns, and already has the indexes, so I thought it made sense to use it. Moreover, it might one day be useful to query for last_activity data *alongside other kinds of activity*; storing them in one table makes this possible. The performance implications of using wp_bp_activity, instead of a dedicated table, are very minimal.
  1. In order to use the activity table even when activity is disabled, I do the following:
    1. Register $bp->table_prefix . 'activity' as $bp->members->table_name_last_activity (so we're not dependent on $bp->activity)
    2. Install the activity tables even when the Activity component is disabled
    3. Provide my own custom database methods for querying the data (see BP_Core_User::get_last_activity() etc)

In other words, I'm using the activity *table*, but I'm not using any of the activity component.

  1. I've built a migration tool (see bp-core/bp-core-update.php). For the moment, the tool leaves the usermeta data in place (this makes comparative benchmarking easier).
  1. I've swapped out most of the references to the 'last_activity' usermeta throughout the codebase.

Further questions

There are a few issues I wanted to get feedback on before moving forward.

  1. I haven't altered any of the half-dozen or so methods in BP_Core_User that make direct queries to wp_usermeta for the 'last_activity' data. These methods are no longer used in BuddyPress. On the other hand, they may still be in use in plugins etc, and the proposed changes will probably break them.
  1. Related: I've left the old usermeta data in place. Existing plugins may be relying on it. It may be wise not only to keep the old usermeta data there, but to mirror the new 'last_activity' data in usermeta going forward. That way, plugins that are reading this data in usermeta will continue to be able to do so (though it would be unwise for plugins to *update* this data). I'm torn on this issue. I'm leaning towards mirroring, but throwing big deprecated notices for anyone who attempts to query that data from usermeta.
  1. I haven't touched group or blog 'last_activity'. If we're satisfied with my treatment, I'll port those over too.
  1. I haven't implemented any caching yet. I'll do so once the first pass has been committed to trunk.

Overall, I'm pretty satisfied with this patch - I think it's a huge improvement over what we've got, and I think it covers most backward compatibility/migration cases well - but would like feedback on the general strategy and the points above before committing. Thanks.

Last edited 17 months ago by boonebgorges (previous) (diff)

comment:8 in reply to: ↑ 7 @imath17 months ago

Hi Boone,

Just tested with activity component activated and not : works great !!

Replying to boonebgorges:

  1. ... On the other hand, they may still be in use in plugins etc, and the proposed changes will probably break them.

I'd say : if the usermeta 'last_activity' is not updated (mirror of point 2) anymore, then i think the different methods may need to be altered to use the new way.

  1. ...It may be wise not only to keep the old usermeta data there, but to mirror the new 'last_activity' data in usermeta going forward...

I think we first need a way to be sure to get the 'last activity' information for all members. Because afer applying the patch, the user having the usermeta but not the new activity type won't be listed in the members directory (last_active filter). They will become "never active" and only available in the Alphabetical filter of the members directory. It will need to wait they log in again.

  • I know it would be a little problem for #5367.
  • And after upgrading, the administrator might be surprised to see that all his active members became never active (and are only available in the members directory if the Alphabetical filter is on)

I'm afraid, it may be needed to batch create the new activity type from the usermeta (for instance by having a tool to "repare" the information).

If the information for all members is in the activity table, and plugins are using the functions : bp_get_user_last_activity() and bp_update_user_last_activity(), then it's not needed to mirror the information in usermeta. The problem will only happen if the plugin is using a custom query or (bp_)get_user_meta function.

comment:9 follow-up: @boonebgorges17 months ago

  • Keywords 2nd-opinion added

Thanks very much for having a look, imath.

Because afer applying the patch, the user having the usermeta but not the new activity type won't be listed in the members directory (last_active filter)

If this didn't work for you, it's a bug in the patch. I have a migration routine already in place. See the bp-core-update.php section of the patch: https://buddypress.trac.wordpress.org/attachment/ticket/5128/5128.patch. It should run on 'bp_admin_init' - maybe you never loaded the admin? Load the admin page (as you would if you were upgrading the plugin to 2.0) and see if it works for you.

The problem will only happen if the plugin is using a custom query or (bp_)get_user_meta function.

Correct. This is the case I'm worried about. I've just done a quick search through plugins.svn.wordpress.org, and found the following plugins are making direct calls to usermeta for 'last_activity':

  • bp-group-control
  • bp-default-data
  • welcome-pack
  • activate-users-in-buddypress
  • bp-groups-suggestions
  • hts-display-active-members
  • bp-member-widget
  • buddypress-community-stats
  • buddypress-sitemap-generator
  • bp-ninja

This is actually a larger number than I'd feared. Some of these plugins are outdated, but I'm sure they're still in use by many, and it suggests that there are many unreleased plugins/hacks that are doing something similar. It says to me that we should mirror for a version or two, with a big obnoxious _doing_it_wrong() notice for incorrect usage. The deprecation checks would look something like this:

  1. When calling update_user_meta( $user_id, 'last_activity' ), mirror the data to the activity table, and throw a _doing_it_wrong() message
  2. When calling get_user_meta( $user_id, 'last_activity' ), return the data from the activity data instead of usermeta, and throw a _doing_it_wrong() message
  3. In bp_update_user_last_activity(), set the last_activity usermeta.
  4. Filter 'query', and do a search for {$wpdb->usermeta} and 'last_activity'. If found, throw a _doing_it_wrong() message

Number 4 is worrisome because I don't know what kind of overhead it will add. However, it could be important, because many of the plugins listed above are making direct queries that won't get notices any other way.

We could include this stuff for, say, one or two major versions, and then remove it. This, along with a big publicity push on bpdevel.

comment:10 in reply to: ↑ 9 @imath17 months ago

Replying to boonebgorges:

If this didn't work for you, it's a bug in the patch. I have a migration routine already in place.

Oops!! You're right, i've played with it and forgot to reset the raw_db_version!! that's why the routine wasn't running. So i confirm it's working great.

Number 4 is worrisome because I don't know what kind of overhead it will add. However, it could be important, because many of the plugins listed above are making direct queries that won't get notices any other way.

We could include this stuff for, say, one or two major versions, and then remove it. This, along with a big publicity push on bpdevel.

Aïe!wow! meaning every query would be filtered. In this case maybe 1. and 2. are not necessary as they would finished in the $wpdb->query function. And i understand why you worry about this!

Could we warn plugins author instead, like it was the case for version 1.5 and the use of BP_Component ?

comment:11 follow-up: @boonebgorges17 months ago

So i confirm it's working great.

Cool, thanks! We should also make this routine part of a new Tools page in #5296.

Could we warn plugins author instead, like it was the case for version 1.5 and the use of BP_Component ?

I would like to avoid breaking backward compatibility as much as possible, even if it means jumping through hoops. I don't want another 1.5 on our hands (though, to be fair, the extent of any potential breakage with my proposed changes are far, far less extensive).

However, we could also consider a more moderate approach, where we implement suggestions 1-3 but not 4. A glance at the search results in plugins.svn.wordpress.org show that roughly half of the uses of 'last_activity' would be caught in this way. For those that depend on last_activity, as long as the content is mirrored, the remaining plugins would continue to *work*, they just wouldn't get the _doing_it_wrong() notice. That'd at least buy us more time to notify plugin authors. This seems like a fair trade-off to me.

comment:12 in reply to: ↑ 11 @imath17 months ago

Replying to boonebgorges:

However, we could also consider a more moderate approach, where we implement suggestions 1-3 but not 4.

+1 for the more moderate approach :)

comment:13 @boonebgorges17 months ago

  • Keywords 2nd-opinion removed

Agreed, imath. 5108.02.patch implements 1, 2, and 3 as described in https://buddypress.trac.wordpress.org/ticket/5128#comment:9

A big advantage of this technique is that it's no longer necessary to refactor the BP_Core_User queries that reference last_activity usermeta. They'll continue to work as before, because the data will still be there and will continue to be updated.

@boonebgorges17 months ago

comment:14 @ircbot17 months ago

This ticket was mentioned in IRC in #buddypress-dev by boonebgorges. View the logs.

comment:15 @ircbot17 months ago

This ticket was mentioned in IRC in #buddypress-dev by boonebgorges. View the logs.

comment:16 @johnjamesjacoby17 months ago

All good things in this ticket. In total agreement with Boone's thoughts and approach towards doing it wrong and backpat. Filtering 'query' is somewhat drastic, but PHP arrays are fast, and combined with your other performance improvements, something tells me we're still ahead of the game for 2.0.0.

Patch also applies clean and does what it says on the tin. I'd say keep working towards your 1-4's, and let's consider this good for 2.0.0.

comment:17 @johnjamesjacoby17 months ago

Also, maybe we could avoid doing the new-style queries if the database version hasn't been bumped, to prevent situations like math ran into above?

comment:18 @DJPaul17 months ago

Boone, are you proposing to keep last_activity mirrored in usermeta for perpetuity, or just some number of releases? If the latter, we should create a ticket and roadmap it appropriately so we don't forget.

comment:19 @boonebgorges17 months ago

johnjamesjacoby - Many thanks for the feedback. It's not really possible to avoid doing the queries (what would we do as alternatives?). But it should be easy to run a quick test to see whether the migration has in fact taken place, and if not, then to throw an admin_notice that'll warn the user to run the migration tool manually. (That tool does not exist yet, but see #5296.)

DJPaul - Probably not in perpetuity. I'll open a separate ticket. Thanks for the reminder.

comment:20 @DJPaul17 months ago

I haven't altered any of the half-dozen or so methods in BP_Core_User that make direct queries to wp_usermeta for the 'last_activity' data. These methods are no longer used in BuddyPress.

Not knowing which methods we're talking about, but can any of these be deprecated? Do you still have a list, or something to look at some point in the future?

comment:21 @boonebgorges17 months ago

can any of these be deprecated?

I'll have to check whether they're used anywhere in BuddyPress. If not, then yes, they should be deprecated.

I don't have a list, but I will make one when I get the main item sorted out.

comment:22 @boonebgorges17 months ago

In 7860:

Migrate user 'last_activity' data from usermeta to the activity table

Storing last_activity in usermeta caused severe bottlenecks on sites with
large user bases. The usermeta table has a tendency to get bloated. Its
option_value column is not indexed, and even if it were, it would not be
indexed properly for the kind of chronological sorting that BuddyPress was
using it for.

This changeset refactors all core last_activity user functionality, so that
the data is stored in the wp_bp_activity table (even when the activity
component is disabled).

For backward compatibility with plugins that reference last_activity metadata
entries, all last_activity data is retained in wp_usermeta, and new data will
be mirrored there until further notice.

See #5128

comment:23 @boonebgorges17 months ago

#5394 is a reminder to remove the data at some point in the future.

I'm leaving this ticket open while I work out some of the details of the migration tools, especially in conjunction with #5296.

As for the other kinds of 'last_activity' metadata used in BuddyPress, I think we should refrain from making further changes for now. Blog and group last_activity is still stored in the respective meta tables, but those tables are much, much smaller, and there are generally far fewer groups/blogs than user, so the scaling issues are much less significant. Also, a scan through the plugins.svn.wordpress.org directory shows that many more plugins make direct reference to *group* last activity than to *user* last activity, so a good deal more work would need to be done to ensure backward compatibility and to educate developers.

In the upcoming weeks, I'll write a post for bpdevel about the transition, with recommendations for devs.

comment:24 @dcavins16 months ago

Hi boonebgorges-

I'm noticing that since this change, things like the members list on a group's admin members are not showing all of the members. Is that to be expected until the migration piece is in place?

It seems to be the case that members who've created activity since the switch do appear, so that's what I'm guessing.

Thanks!

comment:25 @boonebgorges16 months ago

dcavins - It's probably the case that the migration tool didn't run correctly. This happens sometimes if you upgrade in a non-standard way (via version control, or by manually uploading the new plugin files). We'll be working on a tool for manually triggering the migration, but in the meantime you can roll back your _bp_db_version in the DB, then visit a wp-admin page to force the upgrade.

comment:26 @dcavins16 months ago

Thanks, that did refresh the members directory. However, the members lists within group admin > members is still not being populated. I've reverted to bp-compat (using twenty twelve) and it still seems to be the case.

It could be unrelated. If I figure anything out, I'll add another comment.

comment:27 @boonebgorges16 months ago

Group Admin > Members lists should not be dependent on user last_activity. However, that code has been modified in the last few weeks in trunk, so it's possible that you've either found a bug, or you're out of sync with the bleeding edge. See eg https://buddypress.trac.wordpress.org/log/trunk/bp-groups

comment:28 @dcavins16 months ago

I'm pretty sure I'm up to date (git svn rebase this morning, using master branch). In looking through the list, r7949 is a recent change that might be related. That gives me a place to look. Thanks for the response.

comment:29 @r-a-y16 months ago

In 8047:

Add object caching when fetching a single user's last activity.

Previously, BP_Core_User::get_last_activity() was not being cached.
This resulted in an unnecessary query being run on every page load due
to the usage of bp_core_record_activity() on the 'wp_head' action.

This commit caches calls to BP_Core_User::get_last_activity() when a
single user ID is passed and resets the cache when the user's last
activity is updated on BP_Core_User::update_last_activity().

See #5128

comment:30 @boonebgorges16 months ago

  • Resolution set to fixed
  • Status changed from new to closed

I believe issues here are resolved. The repair tool was introduced in r8124. Marking fixed.

comment:31 @boonebgorges14 months ago

In 8332:

Overhaul caching for user last_activity values

The last_activity methods in the BP_Core_User class were originally designed
without persistent caching in mind. In r8047, some minimal caching support was
introduced, but it was tacked on and incomplete. In particular, it depended
on an inconsistent structure of return values from the last_activity methods.
See #5128 for more background.

This changeset introduces a number of changes that make last_activity methods
perform more consistently, as well as full support for object caching.

  • BP_Core_User::get_last_activity() always returns a multidimensional array of last_activity arrays, keyed by user_id. This remains true even when last_activity data is being fetched only for a single user.
  • last_activity data is stored in the bp_last_activity cache bucket without the user-id-keyed wrapper.
  • BP_Core_User::get_last_activity() now uses the same caching technique as other BuddyPress and WordPress queries: detect which requested items are not yet in the cache, prime the cache for those items, and then fetch all requested values directly from the cache to build a return value.

Fixes #5590

Props imath, johnjamesjacoby for initial patches

Note: See TracTickets for help on using tickets.