#5349 closed enhancement (fixed)
Activity query refactor for improved performance
Reported by: | boonebgorges | Owned by: | johnjamesjacoby |
---|---|---|---|
Milestone: | 2.0 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Activity | Keywords: | has-patch needs-testing |
Cc: | joesell89@… |
Description
On large BuddyPress sites, the activity table can grow very large very quickly, and the corresponding queries can be problematically slow. (See #4045, #4060, #5210 for some background.)
With some of the recent improvements we've made to member queries (our previous bugaboo), I think we're well-poised to rethink how activity queries work.
My proposed strategy is to split the query. Our current activity queries fetch entire rows, and includes an awful join against the global users table. When combined with DISTINCT and ORDER BY keywords, this behavior introduces serious scaling problems. By splitting into two queries - one to get a list of matching activity IDs, and another to get all related info - we can improve performance dramatically. See https://core.trac.wordpress.org/ticket/18536 for a similar discussion and solution implemented in WP_Query in WP 3.4.
A very significant side effect of a split query is that it leads naturally to a split strategy with respect to activity caching: caching for complex queries as a separate process from caching single activity items. We currently do basically zero activity caching (I have some ideas that I'll put in another ticket, which I'll link from this one), but the change proposed here will begin the process.
The most serious concern about this change is that we'll probably break plugins that are directly filtering the SQL queries. We can lessen the blow by (a) doing this early in a dev cycle and warning devs about the change, and (b) making sure there's a filter that allows a site to use legacy queries/filters. We've done this for BP_User_Query
and I've heard very few complaints.
I'm putting this out for discussion right now. I'm planning to work on some patches and benchmarks for early in the 2.0 cycle. Preliminary feedback welcome at any time.
Attachments (1)
Change History (15)
This ticket was mentioned in IRC in #buddypress-dev by boonebgorges. View the logs.
11 years ago
#3
@
11 years ago
- Keywords has-patch needs-testing added
It turned out that this strategy was easier to implement than I'd anticipated, and had a bigger impact on performance than I'd anticipated. See 5349.patch.
First: implementation details.
- 'bp_use_legacy_activity_query' filter allows you to continue to use the old query format
- To fetch user data, I'm using
BP_User_Query
instead of a direct query to the users table. The performance differences are minimal. My technique will allow us to implement caching in BP_User_Query and have it trickle down. - Contrary to what I say in the ticket description above, I didn't have to break backward compatibility with any filters. Hooray!
- Unit tests are passing.
Second: performance implications. I've got a test database with 153 user accounts and 105558 activity items.
Legacy query:
SELECT DISTINCT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name FROM wp_bp_activity a LEFT JOIN wp_users u ON a.user_id = u.ID WHERE a.is_spam = 0 AND a.hide_sitewide = 0 AND a.type != 'activity_comment' ORDER BY a.date_recorded DESC LIMIT 0, 20 (1,096.2ms)
New, split query:
SELECT DISTINCT a.id FROM wp_bp_activity a WHERE a.is_spam = 0 AND a.hide_sitewide = 0 AND a.type != 'activity_comment' ORDER BY a.date_recorded DESC LIMIT 0, 20 (0.3ms) SELECT * FROM wp_bp_activity WHERE id IN (105616,105615,105614,105613,105612,105611,105610,105609,105608,105607,105606,105605, 105604,105603,105602,105601,105600,105599,105598,105597) ORDER BY FIELD( id, 105616,105615,105614,105613,105612,105611,105610,105609,105608,105607,105606,105605, 105604,105603,105602,105601,105600,105599,105598,105597 ) (0.4ms) SELECT wp_users.ID,wp_users.user_registered,wp_users.user_login,wp_users.user_nicename, wp_users.display_name,wp_users.user_email FROM wp_users WHERE 1=1 AND wp_users.ID IN (119,40,29,23,95,80,22,66,69,44,42,148,52,26,97,63,37,125,58) ORDER BY user_login ASC (0.3ms)
As you can see, we've gone from 1 query that takes over a second to 3 queries that take just one millisecond. A 1000x improvement. (And note that this does not take into account any to-be-built caching mechanisms. With proper caching, the second and third queries could be cut down or avoided altogether in many cases.)
Feedback welcome. Since this doesn't break any filters, I don't think there are any backward compatibility concerns, and I think we should commit right away to get some testing done.
#4
@
11 years ago
(Side note - you can use wp-cli-buddypress http://github.com/boonebgorges/wp-cli-buddypress/ to generate activity data and members for testing.)
#5
@
11 years ago
Hi.
I just want to confirm improved performance by using the patch in an active site with more than 25000 users and > 150.000 records on bp_activity table.
Tested with Buddypress version 1.8.1 and wp 3.5.1
Thanks
#6
@
11 years ago
- Owner set to johnjamesjacoby
- Resolution set to fixed
- Status changed from new to closed
In 7765:
This ticket was mentioned in IRC in #buddypress-dev by jjj. View the logs.
11 years ago
#8
@
11 years ago
This is really great. Performs much better and with no issues so far.
I have applied this on a live site with 4,000+ users and ~22,000 rows in the BP_activity table.
WP 3.8.1
BP 1.9.2
Please let me know if there is anything specific you would like me to test.
P.S. It would be really awesome if the same could be done for the slow queries to bp-core-classes.php:434 and bp-activity-classes.php:414
#10
@
11 years ago
joesell89 - Thanks so much for testing and for the feedback!
It would be really awesome if the same could be done for the slow queries to bp-core-classes.php:434 and bp-activity-classes.php:414
These are count queries, so they work differently, and aren't succeptible to the same kinds of fixes. But we will continue to look into improvements in this area too.
#11
@
11 years ago
Hey guys, excellent work with this. I should have posted something sooner, but I've been using this changeset on my production site (http://tamrielfoundry.com) for about 2 weeks and I've seen a substantial improvement in query performance. Those old activity queries were crawling, and this is very very much a change for the better:
WP 3.8.1
BP 1.9.2
19.2k users
250,000 activities
Thanks so much for all the excellent work you guys are doing, you rock!
Started a ticket for discussing activity caching at #5349.