Skip to:
Content

BuddyPress.org

Opened 10 years ago

Closed 10 years ago

#5572 closed defect (bug) (fixed)

last_activity migration function is unnecessarily slow

Reported by: boonebgorges's profile boonebgorges Owned by: boonebgorges's profile boonebgorges
Milestone: 2.0.1 Priority: normal
Severity: normal Version:
Component: Core Keywords: commit has-patch
Cc:

Description

bp_last_activity_migrate() uses a NOT IN clause to prevent duplicates. https://buddypress.trac.wordpress.org/browser/tags/2.0/bp-members/bp-members-functions.php#L998 This was mainly added to help with testing of repeated migrations. It turns out that the clause slows down migrations by a large amount on large installations, because the NOT IN subquery returns so many results.

I'm proposing a change that will alter bp_last_activity_migrate() so that uniqueness is enforced by deleting all existing last_activity data in the activity table before migrating the new content over. This should not cause any problems, because BP mirrors the data (so it could even be run later, via the Tools panel).

In my tests, the query speed is dramatically increased.

Posting here for feedback.

Attachments (1)

5572.patch (3.1 KB) - added by boonebgorges 10 years ago.

Download all attachments as: .zip

Change History (10)

@boonebgorges
10 years ago

This ticket was mentioned in IRC in #buddypress-dev by r-a-y. View the logs.


10 years ago

#2 @r-a-y
10 years ago

So I tried playing around with MySQL stored procedures to make the DELETE query scale better.

Here's what I used in the MySQL command-line:

mysql> delimiter //
mysql> CREATE PROCEDURE bp_delete_user_last_activity()
    -> MODIFIES SQL DATA
    -> BEGIN
    -> REPEAT
    -> DELETE FROM wp_bp_activity
    -> WHERE component = 'members' AND type = 'last_activity'
    -> LIMIT 1000;
    -> UNTIL ROW_COUNT() = 0 END REPEAT;
    -> END //
Query OK, 0 rows affected (0.00 sec)

mysql> delimiter ;
mysql> CALL bp_delete_user_last_activity();
Query OK, 0 rows affected (0.55 sec)

0.55 seconds to purge the last activity on a 5000 user install. Not too shabby. (Btw, wp-cli-buddypress kicks ass!)

Then, I tried to translate this over to using $wpdb->query().

Apparently, $wpdb->query() doesn't seem to like stored procedures. A user on StackExchange encountered a similar problem and worked around this by using mysqli:
https://wordpress.stackexchange.com/questions/127344/wpdb-and-mysql-create-trigger

However, this is probably not something we can use since not all setups will have the mysqli extension available.

#3 @boonebgorges
10 years ago

Thanks, r-a-y! But is the DELETE query that slow all by itself?

#4 @DJPaul
10 years ago

Boone: patch looks OK, but two questions:

  • If a user has last_activity in the activity table but not in usermeta. Does this situation exist in any way in core at the moment? If it did, users affected wouldn't show up in member lists etc until that user's logged in and BP's regenerated the key, which seems acceptable to me as a consequence of this (hopefully) once-off migration.
  • We're deleting the records directly in the database. Do we need to clear any caches?

#5 @boonebgorges
10 years ago

Thanks for the feedback, DJPaul.

If a user has last_activity in the activity table but not in usermeta. Does this situation exist in any way in core at the moment?

No. We always mirror it. It would only come up if a plugin directly updates the usermeta row.

If it did, users affected wouldn't show up in member lists etc until that user's logged in and BP's regenerated the key, which seems acceptable to me as a consequence of this (hopefully) once-off migration.

Correct, and I agree that it's an edge case consequence that is not the end of the world.

We're deleting the records directly in the database. Do we need to clear any caches?

It'd slow things down a lot to delete the caches. I don't think we need to, in any case. The usermeta cache for 'last_activity' is only read if a theme or plugin directly calls get_user_meta( $user_id, 'last_activity' ). BP core does not do that anymore - all last activity data is queried from the activity table. So, it's highly unlikely that anyone will ever see the stale cache data, and in any case it will be purged the next time the user in question logs in.

#6 @DJPaul
10 years ago

  • Keywords commit has-patch added

Sounds good.

#7 @r-a-y
10 years ago

Thanks, r-a-y! But is the DELETE query that slow all by itself?

It's 65% slower than the stored procedure, but I guess the DELETE query is trivial compared to the insertion.

There's also a bug in the patch.

In the DELETE query, action = 'last_activity'" should be type = 'last_activity'"

#8 @boonebgorges
10 years ago

There's also a bug in the patch.

You da man :)

#9 @boonebgorges
10 years ago

  • Owner set to boonebgorges
  • Resolution set to fixed
  • Status changed from new to closed

In 8333:

When migrating last_activity data, clear all data to avoid duplicates

Previously, duplicates were avoided with a NOT IN subquery. But this caused
load problems on large installations, where the subquery would return
many thousands of results.

Fixes #5572

Note: See TracTickets for help on using tickets.