Skip to:
Content

Opened 3 years ago

Closed 19 months ago

Last modified 18 months ago

#6327 closed defect (bug) (fixed)

Improved caching for group membership

Reported by: boonebgorges Owned by: boonebgorges
Milestone: 2.6 Priority: normal
Severity: normal Version:
Component: Groups Keywords:
Cc:

Description

There are two broad scenarios where you may want to query for group memberships: (a) when you want to know which members a group has, and (b) when you want to know which groups a member belongs to. The query and caching strategies for the two scenarios are different, and in this ticket I'm talking only about (b).

There are many weird, ad hoc ways that we query for user-group memberships in BP. groups_is_user_member() etc do direct SQL queries. bp_has_groups() with a 'user_id' param does a JOIN against the membership table. groups_get_user_groups() does another type of direct SQL query. groups_total_groups_for_user() does yet another SQL query. And so forth.

With the exception of groups_total_groups_for_user(), none of these queries are cached. I've run into this problem recently when building a plugin that pulls up a list of groups and does membership checks for the groups, sometimes for different users at different times on the page. The obvious answer - groups_is_user_member() - results in a separate SQL query for each call, which can add up to hundreds on a single page load.

After thinking about this for a while, I'd like to recommend the following strategy. Let's do maximal caching of group memberships, on a per-user basis. That is, we'll cache the results of SELECT * FROM $bp->groups->table_name_members WHERE user_id = 12345. Then provide a utility function for getting this information - filtering it based on is_mod, is_confirmed, sorting - all of which will be done in PHP, on the cached data, rather than in a SQL query. Then, start to refactor a bunch of these functions to use the central utility function.

I'm working on patch for this and will post it soon. In the meantime, thoughts are welcome, and I'll use this ticket to commit some preliminary unit tests.

Attachments (5)

6327.patch (33.1 KB) - added by boonebgorges 3 years ago.
6327.2.patch (35.5 KB) - added by boonebgorges 3 years ago.
6327.2.refreshed.patch (35.4 KB) - added by dcavins 19 months ago.
Refreshed version of 6327.2.patch.
6327.3.patch (20.2 KB) - added by boonebgorges 19 months ago.
6327.description.diff (1014 bytes) - added by dcavins 18 months ago.
Adds usage examples to long description.

Download all attachments as: .zip

Change History (29)

#1 @boonebgorges
3 years ago

In 9655:

Allow the creation of group mods in BP_UnitTestCase::add_user_to_group().

See #6327.

#2 @boonebgorges
3 years ago

In 9656:

Unit tests for groups_is_user_*() functions.

See #6327.

#3 @DJPaul
3 years ago

Some of this makes me worry a bit about how the new wrapper function would be used, but that's probably just due to not fully comprehending what you have in mind. I look forward to seeing your patch.

@boonebgorges
3 years ago

#4 @boonebgorges
3 years ago

  • Keywords has-patch added

6327.patch demonstrates what I have in mind. It does the following:

  • Introduces bp_get_user_groups(). This function fetches data about a user's group memberships in a cache-friendly way, and supports some parameters for filtering the return results.
  • Does the necessary cache invalidation. I added a few new hooks to BP_Groups_Member to support this.
  • Refactors the groups_is_user_*() functions to use the new function. I also added a couple of missing functions: groups_is_user_invited(), groups_is_user_banned(). These could be rewritten as a wrapper for groups_is_user_status() or something like that.
  • Refactors BP_Groups_Group::get_group_extras() to use the new function. In this case, we save not only some database overhead, but we also save a bunch of ugly complexity :)
  • Unit tests for the lot.

This all works very well, and cuts down on database queries significantly, especially when extending with certain types of plugins.

A couple of implementation decisions I made in the patch, which others might want to comment on:

  1. Cache granularity. I've chosen to cache an array of group membership objects in 'user_groups_' . $user_id (bucket 'bp_groups'). It would be slightly more complicated, though maybe better in the long run (in terms of reusable cache objects), to cache only the membership IDs in user_groups_$user_id, and then to cache membership objects individually. This is something that could be changed later on, of course.
  2. The syntax of bp_get_user_groups() works like this: the is_* parameters have default boolean values, and you can pass null to any of them to disable the filters (ie, to return results that have either true or false for these fields). We don't really use this syntax anywhere else (we and WP tend to prefer a string like 'any'), but I thought I'd throw it out there, just for laughs.
  3. I've only chosen a couple of places (groups_is_user_*() and get_group_extras()) to use the new system, though there are a couple other places where it could doubtless be used as well.

Feedback welcome.

This ticket was mentioned in Slack in #buddypress by boone. View the logs.


3 years ago

#6 @DJPaul
3 years ago

Looks pretty good. I have not looked at the unit tests (yet).

The new actions in class-bp-groups-member.php ought to be committed separately (as I'm sure you know/were going to do).

I agree caching IDs and the objects individually is probably better, and it feels like it's worth the time investment to get it right:

  • I think it would match what we do in other parts of BP/WP? which seems to work well?
  • Storing "many" objects in one cache item could end up with a very large item in not unrealistic edge cases. Caching backends don't usually handle these well, or keep them cached for long due to the size.

I don't quite understand enough why, for this, we would want to select everything and do the sorting in PHP, and everywhere else, do it across a few SQL queries. Would the "cache IDs and individual objects" approach mean changing this and doing in SQL?

Not too bothered about argument type in bp_get_user_groups but would conservatively prefer something more consistent with our other functions, unless there's an angle I'm not seeing.

#7 @boonebgorges
3 years ago

In 9666:

Introduce 'before'/'after' hooks in BP_Groups_Member delete() and remove() methods.

These hooks parallel the save() hooks, and provide a better mechanism for
cache invalidation.

See #6327.

#8 @boonebgorges
3 years ago

Thank you very much for the feedback, DJPaul.

The new actions in class-bp-groups-member.php ought to be committed separately (as I'm sure you know/were going to do).

Done. r9666.

I agree caching IDs and the objects individually is probably better, and it feels like it's worth the time investment to get it right.

On reflection, I agree. 6327.2.patch makes the necessary changes. I've made further changes to the caching schema to match BP/WP's general caching strategies even better (see the new global cache groups: 'bp_groups_memberships_for_user' (caches arrays of membership IDs on a per-user basis), and 'bp_groups_memberships' (caches membership objects on a per-membership-ID basis).

I don't quite understand enough why, for this, we would want to select everything and do the sorting in PHP, and everywhere else, do it across a few SQL queries. Would the "cache IDs and individual objects" approach mean changing this and doing in SQL?

Doing it in a few SQL queries is much easier than getting the caching right :) Now that 2.patch implements a split cache strategy (cache the IDs, then cache each individual object), we could theoretically choose not to cache IDs at all, but that would mitigate most of the benefit of the caching. So, if we are going to cache the ID query, we have to consider the fact that there are different ways that the items might be requested, based on the parameters. There are two general strategies for this:

  1. Cache the maximal query, and then do sorting/filtering in PHP. That's what I'm doing in this patch.
  2. Generate a cache key based on the parameters, and then cache the IDs for that combination of parameters. This is closer to what WP does in, say, get_terms(): https://core.trac.wordpress.org/browser/tags/4.1.1/src/wp-includes/taxonomy.php#L1672.

(b) is nice because it offloads all of the work to MySQL, which is faster at filtering/sorting than PHP, especially when the number of items to sort/filter becomes very large.

However, (b) pollutes the cache in a pretty severe way. Every possible combination of parameter values will result in a separate cache entry. In the case of bp_get_user_groups(), there are 7 parameters, each of which has at least 2 possible values. So that's more than 27 possible cache entries *for each user_id*. In practice, of course, there won't be nearly this amount of cache pollution, but it's still significant. Perhaps more important is that strategy (b) means that changing a parameter even slightly results in a cache miss, which mitigates a good deal of the benefit of caching in the first place.

Strategy (a), on the other hand, results in a single cache key for each user, and guarantees more cache hits. The downside is reproducing SQL-type logic for filtering/sorting in PHP, but given the fairly small number of paramaters and the fairly small data sets we're working with here (people are generally not members of more than a few dozen groups), I think it's a good tradeoff.

@boonebgorges
3 years ago

#9 @DJPaul
3 years ago

Thanks for explaining the reasoning behind how you are caching things.

#10 @boonebgorges
3 years ago

  • Milestone changed from 2.3 to 2.4

dcavins and I have had a number of discussions about this. Given that invitations are going to be refactored and differently stored in 2.4, it's not going to make sense for bp_get_user_groups() to fetch invitations and pending memberships - it will be kind of a backward compatibility nightmare. So let's hold off on this until the other refactoring is done.

#11 @DJPaul
2 years ago

This doesn't look touched, so 2.5.

#12 @DJPaul
2 years ago

  • Milestone changed from 2.4 to 2.5

#13 @r-a-y
22 months ago

  • Milestone changed from 2.5 to 2.6

This ticket was mentioned in Slack in #buddypress by boone. View the logs.


21 months ago

#15 @DJPaul
20 months ago

  • Keywords has-patch removed
  • Milestone changed from 2.6 to Future Release

#16 @dcavins
19 months ago

@boonebgorges This is excellent; the gains on group directories and activity streams are impressive (on my test site, I trimmed a page load from 655Q to 155Q).

I think this should go in sooner rather than later if you're still comfortable with it, because it won't make the invitations work any harder. The new invitations info + BP_Groups_Member interaction is where I've gotten into sticky situations (https://buddypress.trac.wordpress.org/ticket/6210), and the changes proposed in this ticket would be easier to work with/no increase in difficulty over the existing code. I think, anyway.

I've refreshed 6327.2.patch and will attach it.

@dcavins
19 months ago

Refreshed version of 6327.2.patch.

This ticket was mentioned in Slack in #buddypress by mercime. View the logs.


19 months ago

#18 @boonebgorges
19 months ago

  • Milestone changed from Future Release to 2.6

6327.3.patch fixes the patch so that the tests patch. There was an error in one of the tests (_invite_delete), and some others were failing because of an oversight in invalidation. In order to streamline invalidation, I introduced a new action hook bp_group_member_before_delete_invite. @dcavins - Is the presence of this hook going to mess with the migration to a new Invitations system? (Since presumably invitations won't be a property of "group members" anymore.) If so, I'll roll back this change and just do some parameter juggling - ugly, but not a big deal.

Approach still looks good, and I think we should go with it.

#19 @dcavins
19 months ago

Hi @boonebgorges- Thanks for taking another look at this. The patch looks good to me.

The new action hook before an invitation is deleted is totally good--I introduced something similar in my invitations work, and also use it to invalidate a cache.

In 6327.3., I'm not seeing the changes to the file testcases/groups/functions/bpGetUserGroups.php, so maybe that file didn't get staged before diffing?

Thanks! This'll be a great improvement!

#20 @boonebgorges
19 months ago

In 10794:

Introduce caching for group memberships.

The new system works like this: The bp_groups_memberships_for_user cache
group stores arrays of membership IDs for individual users. The
bp_groups_memberships cache group stores data about individual memberships.
The new function bp_get_user_groups() populates a user's group memberships
from these caches, and filters them as requested in the function parameters.
Then, the various groups_is_user_*() functions use bp_get_user_groups()
instead of direct, uncached database queries to fetch their data.

In addition, the get_group_extras() method of BP_Groups_Group can now be
greatly simplified, since all necessary pre-fetching of current-user group
memberships happens via the bp_get_user_groups() cache.

Props boonebgorges, dcavins.
See #6327.

#21 @boonebgorges
19 months ago

  • Owner set to boonebgorges
  • Resolution set to fixed
  • Status changed from new to closed

In 10795:

Correct @return documentation for groups_is_user_*() functions.

They have always said bool. That has always been false.
Fixes #6327.

#22 @boonebgorges
19 months ago

In 6327.3., I'm not seeing the changes to the file testcases/groups/functions/bpGetUserGroups.php, so maybe that file didn't get staged before diffing?

Yup, I forgot :-D

Thanks for your help!

This ticket was mentioned in Slack in #buddypress by dcavins. View the logs.


19 months ago

@dcavins
18 months ago

Adds usage examples to long description.

#24 @dcavins
18 months ago

In 10885:

Groups: Add usage examples to bp_get_user_groups().

Add common usage examples to the long description of the new function
bp_get_user_groups().

See #6327.

Note: See TracTickets for help on using tickets.