Skip to:
Content

Opened 3 months ago

Closed 3 months ago

Last modified 2 months ago

#5349 closed task (fixed)

Activity query refactor for improved performance

Reported by: boonebgorges Owned by: johnjamesjacoby
Milestone: 2.0 Priority: normal
Severity: normal Version:
Component: Activity Keywords: has-patch needs-testing
Cc: joesell89@…

Description

On large BuddyPress sites, the activity table can grow very large very quickly, and the corresponding queries can be problematically slow. (See #4045, #4060, #5210 for some background.)

With some of the recent improvements we've made to member queries (our previous bugaboo), I think we're well-poised to rethink how activity queries work.

My proposed strategy is to split the query. Our current activity queries fetch entire rows, and includes an awful join against the global users table. When combined with DISTINCT and ORDER BY keywords, this behavior introduces serious scaling problems. By splitting into two queries - one to get a list of matching activity IDs, and another to get all related info - we can improve performance dramatically. See https://core.trac.wordpress.org/ticket/18536 for a similar discussion and solution implemented in WP_Query in WP 3.4.

A very significant side effect of a split query is that it leads naturally to a split strategy with respect to activity caching: caching for complex queries as a separate process from caching single activity items. We currently do basically zero activity caching (I have some ideas that I'll put in another ticket, which I'll link from this one), but the change proposed here will begin the process.

The most serious concern about this change is that we'll probably break plugins that are directly filtering the SQL queries. We can lessen the blow by (a) doing this early in a dev cycle and warning devs about the change, and (b) making sure there's a filter that allows a site to use legacy queries/filters. We've done this for BP_User_Query and I've heard very few complaints.

I'm putting this out for discussion right now. I'm planning to work on some patches and benchmarks for early in the 2.0 cycle. Preliminary feedback welcome at any time.

Attachments (1)

5349.patch (4.8 KB) - added by boonebgorges 3 months ago.

Download all attachments as: .zip

Change History (13)

comment:1 boonebgorges3 months ago

Started a ticket for discussing activity caching at #5349.

comment:2 ircbot3 months ago

This ticket was mentioned in IRC in #buddypress-dev by boonebgorges. View the logs.

comment:3 boonebgorges3 months ago

  • Keywords has-patch needs-testing added

It turned out that this strategy was easier to implement than I'd anticipated, and had a bigger impact on performance than I'd anticipated. See 5349.patch.

First: implementation details.

  1. 'bp_use_legacy_activity_query' filter allows you to continue to use the old query format
  2. To fetch user data, I'm using BP_User_Query instead of a direct query to the users table. The performance differences are minimal. My technique will allow us to implement caching in BP_User_Query and have it trickle down.
  3. Contrary to what I say in the ticket description above, I didn't have to break backward compatibility with any filters. Hooray!
  4. Unit tests are passing.

Second: performance implications. I've got a test database with 153 user accounts and 105558 activity items.

Legacy query:

SELECT DISTINCT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name FROM 
wp_bp_activity a LEFT JOIN wp_users u ON a.user_id = u.ID WHERE a.is_spam = 0 AND 
a.hide_sitewide = 0 AND a.type != 'activity_comment' ORDER BY a.date_recorded DESC LIMIT 0, 20 
(1,096.2ms)

New, split query:

SELECT DISTINCT a.id FROM wp_bp_activity a WHERE a.is_spam = 0 AND a.hide_sitewide = 0 AND 
a.type != 'activity_comment' ORDER BY a.date_recorded DESC LIMIT 0, 20
(0.3ms)

SELECT * FROM wp_bp_activity WHERE id IN 
(105616,105615,105614,105613,105612,105611,105610,105609,105608,105607,105606,105605,
105604,105603,105602,105601,105600,105599,105598,105597) ORDER BY FIELD( id, 
105616,105615,105614,105613,105612,105611,105610,105609,105608,105607,105606,105605,
105604,105603,105602,105601,105600,105599,105598,105597 )
(0.4ms)

SELECT wp_users.ID,wp_users.user_registered,wp_users.user_login,wp_users.user_nicename,
wp_users.display_name,wp_users.user_email FROM wp_users WHERE 1=1 AND wp_users.ID IN 
(119,40,29,23,95,80,22,66,69,44,42,148,52,26,97,63,37,125,58) ORDER BY user_login ASC
 (0.3ms)

As you can see, we've gone from 1 query that takes over a second to 3 queries that take just one millisecond. A 1000x improvement. (And note that this does not take into account any to-be-built caching mechanisms. With proper caching, the second and third queries could be cut down or avoided altogether in many cases.)

Feedback welcome. Since this doesn't break any filters, I don't think there are any backward compatibility concerns, and I think we should commit right away to get some testing done.

boonebgorges3 months ago

comment:4 boonebgorges3 months ago

(Side note - you can use wp-cli-buddypress http://github.com/boonebgorges/wp-cli-buddypress/ to generate activity data and members for testing.)

comment:5 lenasterg3 months ago

Hi.
I just want to confirm improved performance by using the patch in an active site with more than 25000 users and > 150.000 records on bp_activity table.
Tested with Buddypress version 1.8.1 and wp 3.5.1
Thanks

comment:6 johnjamesjacoby3 months ago

  • Owner set to johnjamesjacoby
  • Resolution set to fixed
  • Status changed from new to closed

In 7765:

Split primary Activity query up into separate, more performant queries. Introduces BP_Activity_Activity::get_activity_data() method to help with converting data for template output. Props boonebgorges. Fixes #5349.

comment:7 ircbot3 months ago

This ticket was mentioned in IRC in #buddypress-dev by jjj. View the logs.

comment:8 joesell892 months ago

This is really great. Performs much better and with no issues so far.

I have applied this on a live site with 4,000+ users and ~22,000 rows in the BP_activity table.

WP 3.8.1
BP 1.9.2

Please let me know if there is anything specific you would like me to test.

P.S. It would be really awesome if the same could be done for the slow queries to bp-core-classes.php:434 and bp-activity-classes.php:414

comment:9 joesell892 months ago

  • Cc joesell89@… added

comment:10 boonebgorges2 months ago

joesell89 - Thanks so much for testing and for the feedback!

It would be really awesome if the same could be done for the slow queries to bp-core-classes.php:434 and bp-activity-classes.php:414

These are count queries, so they work differently, and aren't succeptible to the same kinds of fixes. But we will continue to look into improvements in this area too.

comment:11 aaclayton2 months ago

Hey guys, excellent work with this. I should have posted something sooner, but I've been using this changeset on my production site (http://tamrielfoundry.com) for about 2 weeks and I've seen a substantial improvement in query performance. Those old activity queries were crawling, and this is very very much a change for the better:

WP 3.8.1
BP 1.9.2
19.2k users
250,000 activities

Thanks so much for all the excellent work you guys are doing, you rock!

comment:12 boonebgorges2 months ago

Thanks for the feedback, aaclayton!

Note: See TracTickets for help on using tickets.