Skip to:
Content

BuddyPress.org

Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#5629 closed enhancement (fixed)

Slow queries with BP_Activity_Activity::get()

Reported by: clean-cole's profile Clean-Cole Owned by: boonebgorges's profile boonebgorges
Milestone: 2.1 Priority: normal
Severity: normal Version: 2.0
Component: Activity Keywords: has-patch
Cc:

Description

BP_Activity_Activity::get() uses a count query on line 443 that can be avoided when the max argument is not supplied. The COUNT query slows down larger BP installs.

Attachments (4)

bp-activity-classes.diff (1.6 KB) - added by Clean-Cole 11 years ago.
Patch for Ticket #5629
5629.patch (6.9 KB) - added by boonebgorges 11 years ago.
5629.02.patch (10.0 KB) - added by boonebgorges 11 years ago.
5629.03.patch (9.4 KB) - added by r-a-y 11 years ago.

Download all attachments as: .zip

Change History (18)

@Clean-Cole
11 years ago

Patch for Ticket #5629

#1 @boonebgorges
11 years ago

  • Component changed from Core to Activity
  • Keywords dev-feedback removed
  • Milestone changed from Awaiting Review to 2.1
  • Priority changed from normal to lowest

Thanks for the patch. I had some thoughts about this very issue while working on related query improvements in the Activity component. Ideally, I would like to remove the count query in as many activity queries as possible, regardless of the value of max. Let's take a look at a bit of a broader solution for BP 2.1.

#2 @Clean-Cole
11 years ago

Since larger installs will typically have a persistent cache plugin installed I wonder if using WP Object cache is a good alternative or if having a transient object would work better. I will create each and run some tests and post my findings.

#3 @boonebgorges
11 years ago

Clean-Cole - This seems to me to be a good candidate for the transient cache, because it's such a specific bottleneck. Either way, the tricky part here is not caching the value so much as it is caching it in such a way that it's possible to invalidate it correctly. Ideally, a new activity item will only invalidate the cached count values that are relevant. For example, a new activity item "Boone joined group Foo" should invalidate (1) the sitewide activity count, (2) Boone's personal activity counts, (3) the My Groups activity counts for all members of Foo, (4) all counts for Foo that contain joined_group activity items (and possibly more), but you should not have to invalidate *all* cached counts.

This gets very complicated very quickly. So, at a first pass, you might consider playing with one of the following ideas:

  1. Select a whitelist of counts that get cached. Eg, cache the count for the All filter on the sitewide stream, and maybe the All filters for each individual member, but no other filters. Then you'll have a relatively small list of things to invalidate.
  2. Flush all count-related caches whenever a new activity item is posted. This is technically the simplest solution, but has some obvious downsides.

Thanks for looking ot this.

#4 @Clean-Cole
11 years ago

Hey Boone, I'm with you on that. I think the biggest issue for me trying to come up with a solution is trying to determine the more common use cases of the BP_Activity_Activity::get(). I've started creating a function that will determine if the activity query being run is something that should be cached.

Currently I'm playing around with a foreach statement that checks the following array against the $filter query parameter. This is less of an actual code solution but more of me thinking out loud. Integer keys in the array are only checked by an isset() call. String key => value pairs are checked for a matched value. The positive and negative keys are hopefully self explanatory.

$checks = array(
	'positive' => array( 0 => 'user_id', 'object' => 'activity' ),
	'negative' => array( 'since', 'action', 'primary_id', 'secondary_id' ),
);

So I figured any queries using since, action, or primary/secondary id should be excluded from a transient. So basically an all activity count for a user is the only thing we can really cache reliably? Another issue I considered was that if we save a user's total activity in a transient, the options table could easily get too bloated. Do you think storing a user's total activity count inside BuddyPress user meta is a viable alternative?

I must admit I'm not familiar enough with this query to truly know what should be included or excluded, so any input or thoughts you have would obviously be appreciated.

#5 @boonebgorges
11 years ago

So I figured any queries using since, action, or primary/secondary id should be excluded from a transient. So basically an all activity count for a user is the only thing we can really cache reliably?

I guess I'd also consider group counts ('component' => 'groups', 'item_id' => $group_id ).

? Another issue I considered was that if we save a user's total activity in a transient, the options table could easily get too bloated. Do you think storing a user's total activity count inside BuddyPress user meta is a viable alternative?

I definitely would *not* use usermeta. At least in the case of transients, the WP API is smart enough only to grab the values that it needs when it needs them, and they're stored in memory when a persistent cache is available.

That said, the more I think about this, the more I think that any kind of caching is bound to be problematic. What we're considering here is caching many, many very small values (integer counts). This causes problems on various caching schemas, due to data fragmentetation and internal invalidation logic. On a site with many users, we could be talking about hundreds of thousands of additional values to cache.

Given these considerations, I think the correct thing to do is probably to use wp_cache_ for these values. That way BP doesn't have to make guesses about how the specific caching backend is going to work, and optimization can be left up to site admins.

#6 @boonebgorges
11 years ago

5629.patch is a first pass at removing total counts from activity queries by default.

There are a few main motivations at work here:

  1. Despite some big improvements in the efficiency of activity queries, COUNT queries are still slow. At scale, it's unlikely that we're going to be able to improve them much more - it's an unavoidable consequence of the sheer size of the tables.
  1. The 'total' value from the activity query is actually used very sparingly in BuddyPress, and my guess is that we can put a few workarounds in place to remove it altogether for the majority of uses.

A few notes about the patch:

  1. One place where we do generally use the total count is in bp_activity_has_more_items(). My workaround is to do a little trick: if the per_page value is 20, query instead for 21 activity ids. If your results have more than 20 items, then has_more_items is true. I pass that value down through the template class, and use it in bp_activity_has_more_items() if it's available.
  2. The new 'count_total' parameter for BP_Activity_Activity::get() is modeled after the same param in BP_User_Query. At the moment, if you pass false for activity, no query is done, while any other value will perform a separate COUNT query. But we could implement sometihng like sql_calc_found_rows if we thought that'd be helpful as well.

The major problem here is noscript support. When javascript is disabled, we put up traditional pagination markup. But to build that pagination markup, we need total counts. The problem is that, when determining the default arguments for bp_has_activities(), there's no way to know whether javascript is turned off. A couple ideas: put an entirely separate activity loop into <noscript> tags; replace regular noscript pagination with a 'Load More' button that simply loads 'p=2'.

In any case, this change will result in a huge increase in efficiency on many setups, so I think it's worth pursuing (possibly in connection with better count caching, as described above). Thoughts welcome.

@boonebgorges
11 years ago

#7 @r-a-y
11 years ago

I'm liking the approach in this patch!

I'm not really worried about <noscript>. Twitter and Facebook heavily rely on javascript and the "Load More" functionality of those sites also do not work without JS.

I like this approach for <noscript>:

replace regular noscript pagination with a 'Load More' button that simply loads 'p=2'

We can increment the p variable infinitely with <noscript>. Who cares when it actually ends? At least, BP will have some paginated functionality, which will beat Twitter and FB!

Last edited 11 years ago by r-a-y (previous) (diff)

#8 @boonebgorges
11 years ago

We can increment the p variable infinitely with <noscript>. Who cares when it actually ends? At least, BP will have some paginated functionality, which will beat Twitter and FB!

Agreed. By p=2 I of course meant p=n+1. What we lose here is no way to jump around in pagination - like to go back to the first page - but it seems to me that this is not a huge loss, as you note!

I'll work on a revised patch.

This ticket was mentioned in IRC in #buddypress-dev by boonebgorges. View the logs.


11 years ago

#10 @boonebgorges
11 years ago

  • Priority changed from lowest to normal

5629.02.patch is an update with the following changes:

  • In the activity-loop.php template, I've removed the <noscript> block containing activity pagination (since it no longer works correctly anyway)
  • Made some minor mods to BP_Activity_Template so that the 'acpage' param is available in the template loop global
  • Switched the href of the Load More link from '#more' (which was a dummy link that did nothing) to http://example.com/activity/?acpage=x+1. This provides noscript support for browsing through pages

I think this is a pretty elegant solution, but there are a few backpat concerns. Any theme that has overridden activity-loop.php will not inherit the change. And any bp-default derivative theme will not get the change. In both of these cases, the Load More link won't work (though this is not a regression), and the non-functional <noscript> pagination block will still be there. My initial thought is that this is an acceptable side-effect of the change - remember that we're only talking about users without Javascript, which will be a small percentage of the users of the small percentage of affected themes. We can publish some documentation about making the necessary manual template changes around the time of 2.1.

Do others have thoughts about this backward compatibility issue?

#11 @r-a-y
11 years ago

Nice work, Boone!

03.patch:

  • Fixes the "Load More" link so it can be used on other activity pages (user activity, group activity). Instead of the activity directory page, I'm using bp_get_requested_url().
  • Added some phpDoc for the count_total parameter. I see that you're using this parameter as a string ('count_query') in the activity admin area. If there's a specific reason you're using a string, feel free to amend the phpDoc.

As for the backpat issue, it's very, very minor and is not worth thinking about ;)

@r-a-y
11 years ago

#12 @boonebgorges
11 years ago

r-a-y - Many thanks for having another look.

Instead of the activity directory page, I'm using bp_get_requested_url().

Perfect. Good catch.

If there's a specific reason you're using a string, feel free to amend the phpDoc.

For parity with BP_User_Query. https://buddypress.trac.wordpress.org/browser/tags/2.0.1/bp-core/bp-core-classes.php#L50

I think this is in a good spot. I'm going to commit it and then start working on some information for developers for 2.1.

#13 @boonebgorges
11 years ago

  • Owner set to boonebgorges
  • Resolution set to fixed
  • Status changed from new to closed

In 8491:

Disable COUNT queries by default in bp_activity_get() stack

The COUNT query performed to get the total activity count has proven to be a
performance bottleneck on large sites. Moreover, the way that activity items
are displayed on the front end (using the Load More link) means that the total
count is not actually used by BuddyPress in most cases. To reduce query
overhead, we introduce a 'count_total' parameter to the bp_activity_get()
stack, and set it to false by default.

For backward compatibility, a few additional changes are introduced:

  • In the activity-loop.php template, the <noscript> pagination markup is removed. In its place, the Load More link is refactored so that it loads the next available page of activity items.
  • The mechanism used to determine whether there are more activity items to show (bp_activity_has_more_items()) has been refined; when no COUNT query takes place, we query for the $per_page value + 1 to infer whether more items are available.

Fixes #5629

Props boonebgorges, r-a-y

#14 @boonebgorges
11 years ago

I've marked this ticket as resolved, as COUNT query is disabled by default as of r8491. If we still want to explore techniques for caching the COUNT when it does take place, let's do it in a separate ticket. Thanks for your help, all.

Note: See TracTickets for help on using tickets.