#5629 closed enhancement (fixed)
Slow queries with BP_Activity_Activity::get()
Reported by: | Clean-Cole | Owned by: | boonebgorges |
---|---|---|---|
Milestone: | 2.1 | Priority: | normal |
Severity: | normal | Version: | 2.0 |
Component: | Activity | Keywords: | has-patch |
Cc: |
Description
BP_Activity_Activity::get() uses a count query on line 443 that can be avoided when the max
argument is not supplied. The COUNT
query slows down larger BP installs.
Attachments (4)
Change History (18)
#1
@
11 years ago
- Component changed from Core to Activity
- Keywords dev-feedback removed
- Milestone changed from Awaiting Review to 2.1
- Priority changed from normal to lowest
Thanks for the patch. I had some thoughts about this very issue while working on related query improvements in the Activity component. Ideally, I would like to remove the count query in as many activity queries as possible, regardless of the value of max
. Let's take a look at a bit of a broader solution for BP 2.1.
#2
@
11 years ago
Since larger installs will typically have a persistent cache plugin installed I wonder if using WP Object cache is a good alternative or if having a transient object would work better. I will create each and run some tests and post my findings.
#3
@
11 years ago
Clean-Cole - This seems to me to be a good candidate for the transient cache, because it's such a specific bottleneck. Either way, the tricky part here is not caching the value so much as it is caching it in such a way that it's possible to invalidate it correctly. Ideally, a new activity item will only invalidate the cached count values that are relevant. For example, a new activity item "Boone joined group Foo" should invalidate (1) the sitewide activity count, (2) Boone's personal activity counts, (3) the My Groups activity counts for all members of Foo, (4) all counts for Foo that contain joined_group activity items (and possibly more), but you should not have to invalidate *all* cached counts.
This gets very complicated very quickly. So, at a first pass, you might consider playing with one of the following ideas:
- Select a whitelist of counts that get cached. Eg, cache the count for the All filter on the sitewide stream, and maybe the All filters for each individual member, but no other filters. Then you'll have a relatively small list of things to invalidate.
- Flush all count-related caches whenever a new activity item is posted. This is technically the simplest solution, but has some obvious downsides.
Thanks for looking ot this.
#4
@
11 years ago
Hey Boone, I'm with you on that. I think the biggest issue for me trying to come up with a solution is trying to determine the more common use cases of the BP_Activity_Activity::get(). I've started creating a function that will determine if the activity query being run is something that should be cached.
Currently I'm playing around with a foreach statement that checks the following array against the $filter
query parameter. This is less of an actual code solution but more of me thinking out loud. Integer keys in the array are only checked by an isset()
call. String key => value pairs are checked for a matched value. The positive and negative keys are hopefully self explanatory.
$checks = array( 'positive' => array( 0 => 'user_id', 'object' => 'activity' ), 'negative' => array( 'since', 'action', 'primary_id', 'secondary_id' ), );
So I figured any queries using since
, action
, or primary
/secondary
id should be excluded from a transient. So basically an all activity count for a user is the only thing we can really cache reliably? Another issue I considered was that if we save a user's total activity in a transient, the options table could easily get too bloated. Do you think storing a user's total activity count inside BuddyPress user meta is a viable alternative?
I must admit I'm not familiar enough with this query to truly know what should be included or excluded, so any input or thoughts you have would obviously be appreciated.
#5
@
11 years ago
So I figured any queries using since, action, or primary/secondary id should be excluded from a transient. So basically an all activity count for a user is the only thing we can really cache reliably?
I guess I'd also consider group counts ('component' => 'groups', 'item_id' => $group_id ).
? Another issue I considered was that if we save a user's total activity in a transient, the options table could easily get too bloated. Do you think storing a user's total activity count inside BuddyPress user meta is a viable alternative?
I definitely would *not* use usermeta. At least in the case of transients, the WP API is smart enough only to grab the values that it needs when it needs them, and they're stored in memory when a persistent cache is available.
That said, the more I think about this, the more I think that any kind of caching is bound to be problematic. What we're considering here is caching many, many very small values (integer counts). This causes problems on various caching schemas, due to data fragmentetation and internal invalidation logic. On a site with many users, we could be talking about hundreds of thousands of additional values to cache.
Given these considerations, I think the correct thing to do is probably to use wp_cache_
for these values. That way BP doesn't have to make guesses about how the specific caching backend is going to work, and optimization can be left up to site admins.
#6
@
11 years ago
5629.patch is a first pass at removing total counts from activity queries by default.
There are a few main motivations at work here:
- Despite some big improvements in the efficiency of activity queries, COUNT queries are still slow. At scale, it's unlikely that we're going to be able to improve them much more - it's an unavoidable consequence of the sheer size of the tables.
- The 'total' value from the activity query is actually used very sparingly in BuddyPress, and my guess is that we can put a few workarounds in place to remove it altogether for the majority of uses.
A few notes about the patch:
- One place where we do generally use the total count is in
bp_activity_has_more_items()
. My workaround is to do a little trick: if the per_page value is 20, query instead for 21 activity ids. If your results have more than 20 items, thenhas_more_items
is true. I pass that value down through the template class, and use it inbp_activity_has_more_items()
if it's available. - The new
'count_total'
parameter forBP_Activity_Activity::get()
is modeled after the same param inBP_User_Query
. At the moment, if you pass false for activity, no query is done, while any other value will perform a separate COUNT query. But we could implement sometihng likesql_calc_found_rows
if we thought that'd be helpful as well.
The major problem here is noscript support. When javascript is disabled, we put up traditional pagination markup. But to build that pagination markup, we need total counts. The problem is that, when determining the default arguments for bp_has_activities()
, there's no way to know whether javascript is turned off. A couple ideas: put an entirely separate activity loop into <noscript> tags; replace regular noscript pagination with a 'Load More' button that simply loads 'p=2'.
In any case, this change will result in a huge increase in efficiency on many setups, so I think it's worth pursuing (possibly in connection with better count caching, as described above). Thoughts welcome.
#7
@
11 years ago
I'm liking the approach in this patch!
I'm not really worried about <noscript>
. Twitter and Facebook heavily rely on javascript and the "Load More" functionality of those sites also do not work without JS.
I like this approach for <noscript>
:
replace regular noscript pagination with a 'Load More' button that simply loads 'p=2'
We can increment the p
variable infinitely with <noscript>
. Who cares when it actually ends? At least, BP will have some paginated functionality, which will beat Twitter and FB!
#8
@
11 years ago
We can increment the p variable infinitely with <noscript>. Who cares when it actually ends? At least, BP will have some paginated functionality, which will beat Twitter and FB!
Agreed. By p=2
I of course meant p=n+1
. What we lose here is no way to jump around in pagination - like to go back to the first page - but it seems to me that this is not a huge loss, as you note!
I'll work on a revised patch.
This ticket was mentioned in IRC in #buddypress-dev by boonebgorges. View the logs.
11 years ago
#10
@
11 years ago
- Priority changed from lowest to normal
5629.02.patch is an update with the following changes:
- In the
activity-loop.php
template, I've removed the<noscript>
block containing activity pagination (since it no longer works correctly anyway) - Made some minor mods to
BP_Activity_Template
so that the'acpage'
param is available in the template loop global - Switched the href of the Load More link from '#more' (which was a dummy link that did nothing) to
http://example.com/activity/?acpage=x+1
. This provides noscript support for browsing through pages
I think this is a pretty elegant solution, but there are a few backpat concerns. Any theme that has overridden activity-loop.php will not inherit the change. And any bp-default derivative theme will not get the change. In both of these cases, the Load More link won't work (though this is not a regression), and the non-functional <noscript>
pagination block will still be there. My initial thought is that this is an acceptable side-effect of the change - remember that we're only talking about users without Javascript, which will be a small percentage of the users of the small percentage of affected themes. We can publish some documentation about making the necessary manual template changes around the time of 2.1.
Do others have thoughts about this backward compatibility issue?
#11
@
11 years ago
Nice work, Boone!
03.patch:
- Fixes the "Load More" link so it can be used on other activity pages (user activity, group activity). Instead of the activity directory page, I'm using
bp_get_requested_url()
. - Added some phpDoc for the
count_total
parameter. I see that you're using this parameter as a string ('count_query'
) in the activity admin area. If there's a specific reason you're using a string, feel free to amend the phpDoc.
As for the backpat issue, it's very, very minor and is not worth thinking about ;)
#12
@
11 years ago
r-a-y - Many thanks for having another look.
Instead of the activity directory page, I'm using bp_get_requested_url().
Perfect. Good catch.
If there's a specific reason you're using a string, feel free to amend the phpDoc.
For parity with BP_User_Query
. https://buddypress.trac.wordpress.org/browser/tags/2.0.1/bp-core/bp-core-classes.php#L50
I think this is in a good spot. I'm going to commit it and then start working on some information for developers for 2.1.
Patch for Ticket #5629