#6327 closed defect (bug) (fixed)
Improved caching for group membership
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 2.6 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Groups | Keywords: | |
Cc: |
Description
There are two broad scenarios where you may want to query for group memberships: (a) when you want to know which members a group has, and (b) when you want to know which groups a member belongs to. The query and caching strategies for the two scenarios are different, and in this ticket I'm talking only about (b).
There are many weird, ad hoc ways that we query for user-group memberships in BP. groups_is_user_member()
etc do direct SQL queries. bp_has_groups()
with a 'user_id' param does a JOIN against the membership table. groups_get_user_groups()
does another type of direct SQL query. groups_total_groups_for_user()
does yet another SQL query. And so forth.
With the exception of groups_total_groups_for_user()
, none of these queries are cached. I've run into this problem recently when building a plugin that pulls up a list of groups and does membership checks for the groups, sometimes for different users at different times on the page. The obvious answer - groups_is_user_member()
- results in a separate SQL query for each call, which can add up to hundreds on a single page load.
After thinking about this for a while, I'd like to recommend the following strategy. Let's do maximal caching of group memberships, on a per-user basis. That is, we'll cache the results of SELECT * FROM $bp->groups->table_name_members WHERE user_id = 12345
. Then provide a utility function for getting this information - filtering it based on is_mod
, is_confirmed
, sorting - all of which will be done in PHP, on the cached data, rather than in a SQL query. Then, start to refactor a bunch of these functions to use the central utility function.
I'm working on patch for this and will post it soon. In the meantime, thoughts are welcome, and I'll use this ticket to commit some preliminary unit tests.
Attachments (5)
Change History (29)
#3
@
10 years ago
Some of this makes me worry a bit about how the new wrapper function would be used, but that's probably just due to not fully comprehending what you have in mind. I look forward to seeing your patch.
#4
@
10 years ago
- Keywords has-patch added
6327.patch demonstrates what I have in mind. It does the following:
- Introduces
bp_get_user_groups()
. This function fetches data about a user's group memberships in a cache-friendly way, and supports some parameters for filtering the return results. - Does the necessary cache invalidation. I added a few new hooks to
BP_Groups_Member
to support this. - Refactors the
groups_is_user_*()
functions to use the new function. I also added a couple of missing functions:groups_is_user_invited()
,groups_is_user_banned()
. These could be rewritten as a wrapper forgroups_is_user_status()
or something like that. - Refactors
BP_Groups_Group::get_group_extras()
to use the new function. In this case, we save not only some database overhead, but we also save a bunch of ugly complexity :) - Unit tests for the lot.
This all works very well, and cuts down on database queries significantly, especially when extending with certain types of plugins.
A couple of implementation decisions I made in the patch, which others might want to comment on:
- Cache granularity. I've chosen to cache an array of group membership objects in
'user_groups_' . $user_id
(bucket 'bp_groups'). It would be slightly more complicated, though maybe better in the long run (in terms of reusable cache objects), to cache only the membership IDs inuser_groups_$user_id
, and then to cache membership objects individually. This is something that could be changed later on, of course. - The syntax of
bp_get_user_groups()
works like this: theis_*
parameters have default boolean values, and you can passnull
to any of them to disable the filters (ie, to return results that have eithertrue
orfalse
for these fields). We don't really use this syntax anywhere else (we and WP tend to prefer a string like'any'
), but I thought I'd throw it out there, just for laughs. - I've only chosen a couple of places (
groups_is_user_*()
andget_group_extras()
) to use the new system, though there are a couple other places where it could doubtless be used as well.
Feedback welcome.
This ticket was mentioned in Slack in #buddypress by boone. View the logs.
10 years ago
#6
@
10 years ago
Looks pretty good. I have not looked at the unit tests (yet).
The new actions in class-bp-groups-member.php ought to be committed separately (as I'm sure you know/were going to do).
I agree caching IDs and the objects individually is probably better, and it feels like it's worth the time investment to get it right:
- I think it would match what we do in other parts of BP/WP? which seems to work well?
- Storing "many" objects in one cache item could end up with a very large item in not unrealistic edge cases. Caching backends don't usually handle these well, or keep them cached for long due to the size.
I don't quite understand enough why, for this, we would want to select everything and do the sorting in PHP, and everywhere else, do it across a few SQL queries. Would the "cache IDs and individual objects" approach mean changing this and doing in SQL?
Not too bothered about argument type in bp_get_user_groups
but would conservatively prefer something more consistent with our other functions, unless there's an angle I'm not seeing.
#8
@
10 years ago
Thank you very much for the feedback, DJPaul.
The new actions in class-bp-groups-member.php ought to be committed separately (as I'm sure you know/were going to do).
Done. r9666.
I agree caching IDs and the objects individually is probably better, and it feels like it's worth the time investment to get it right.
On reflection, I agree. 6327.2.patch makes the necessary changes. I've made further changes to the caching schema to match BP/WP's general caching strategies even better (see the new global cache groups: 'bp_groups_memberships_for_user' (caches arrays of membership IDs on a per-user basis), and 'bp_groups_memberships' (caches membership objects on a per-membership-ID basis).
I don't quite understand enough why, for this, we would want to select everything and do the sorting in PHP, and everywhere else, do it across a few SQL queries. Would the "cache IDs and individual objects" approach mean changing this and doing in SQL?
Doing it in a few SQL queries is much easier than getting the caching right :) Now that 2.patch implements a split cache strategy (cache the IDs, then cache each individual object), we could theoretically choose not to cache IDs at all, but that would mitigate most of the benefit of the caching. So, if we are going to cache the ID query, we have to consider the fact that there are different ways that the items might be requested, based on the parameters. There are two general strategies for this:
- Cache the maximal query, and then do sorting/filtering in PHP. That's what I'm doing in this patch.
- Generate a cache key based on the parameters, and then cache the IDs for that combination of parameters. This is closer to what WP does in, say,
get_terms()
: https://core.trac.wordpress.org/browser/tags/4.1.1/src/wp-includes/taxonomy.php#L1672.
(b) is nice because it offloads all of the work to MySQL, which is faster at filtering/sorting than PHP, especially when the number of items to sort/filter becomes very large.
However, (b) pollutes the cache in a pretty severe way. Every possible combination of parameter values will result in a separate cache entry. In the case of bp_get_user_groups()
, there are 7 parameters, each of which has at least 2 possible values. So that's more than 27 possible cache entries *for each user_id*. In practice, of course, there won't be nearly this amount of cache pollution, but it's still significant. Perhaps more important is that strategy (b) means that changing a parameter even slightly results in a cache miss, which mitigates a good deal of the benefit of caching in the first place.
Strategy (a), on the other hand, results in a single cache key for each user, and guarantees more cache hits. The downside is reproducing SQL-type logic for filtering/sorting in PHP, but given the fairly small number of paramaters and the fairly small data sets we're working with here (people are generally not members of more than a few dozen groups), I think it's a good tradeoff.
#10
@
10 years ago
- Milestone changed from 2.3 to 2.4
dcavins and I have had a number of discussions about this. Given that invitations are going to be refactored and differently stored in 2.4, it's not going to make sense for bp_get_user_groups()
to fetch invitations and pending memberships - it will be kind of a backward compatibility nightmare. So let's hold off on this until the other refactoring is done.
This ticket was mentioned in Slack in #buddypress by boone. View the logs.
9 years ago
#16
@
9 years ago
@boonebgorges This is excellent; the gains on group directories and activity streams are impressive (on my test site, I trimmed a page load from 655Q to 155Q).
I think this should go in sooner rather than later if you're still comfortable with it, because it won't make the invitations work any harder. The new invitations info + BP_Groups_Member
interaction is where I've gotten into sticky situations (https://buddypress.trac.wordpress.org/ticket/6210), and the changes proposed in this ticket would be easier to work with/no increase in difficulty over the existing code. I think, anyway.
I've refreshed 6327.2.patch and will attach it.
This ticket was mentioned in Slack in #buddypress by mercime. View the logs.
9 years ago
#18
@
9 years ago
- Milestone changed from Future Release to 2.6
6327.3.patch fixes the patch so that the tests patch. There was an error in one of the tests (_invite_delete
), and some others were failing because of an oversight in invalidation. In order to streamline invalidation, I introduced a new action hook bp_group_member_before_delete_invite
. @dcavins - Is the presence of this hook going to mess with the migration to a new Invitations system? (Since presumably invitations won't be a property of "group members" anymore.) If so, I'll roll back this change and just do some parameter juggling - ugly, but not a big deal.
Approach still looks good, and I think we should go with it.
#19
@
9 years ago
Hi @boonebgorges- Thanks for taking another look at this. The patch looks good to me.
The new action hook before an invitation is deleted is totally good--I introduced something similar in my invitations work, and also use it to invalidate a cache.
In 6327.3., I'm not seeing the changes to the file testcases/groups/functions/bpGetUserGroups.php
, so maybe that file didn't get staged before diffing?
Thanks! This'll be a great improvement!
#21
@
9 years ago
- Owner set to boonebgorges
- Resolution set to fixed
- Status changed from new to closed
In 10795:
#22
@
9 years ago
In 6327.3., I'm not seeing the changes to the file testcases/groups/functions/bpGetUserGroups.php, so maybe that file didn't get staged before diffing?
Yup, I forgot :-D
Thanks for your help!
In 9655: