Skip to:
Content

Opened 16 months ago

Closed 3 months ago

#7442 closed enhancement (fixed)

Use JOIN rather than Subquery on user search

Reported by: brandonliles Owned by: DJPaul
Milestone: 3.0 Priority: normal
Severity: normal Version:
Component: Core Keywords: has-patch dev-feedback early
Cc:

Description

The user search relies on two subqueries. This is a known weakness of MySQL performance. Using joins will improve performance.

Attachments (3)

bp-search-use-joins.patch (2.2 KB) - added by brandonliles 16 months ago.
bp-search-use-joins2.patch (2.0 KB) - added by brandonliles 15 months ago.
bp-search-optimize.patch (2.3 KB) - added by brandonliles 14 months ago.
Subquery Materialization Approach

Download all attachments as: .zip

Change History (17)

#1 @r-a-y
16 months ago

Thanks for creating this ticket and for the patch, @brandonliles!

However, the patch is breaking multiple unit tests.

If you are able to, can you set up WordPress for PHP unit testing?
https://make.wordpress.org/core/handbook/testing/automated-testing/phpunit/

Once you have done so, navigate to the buddypress directory and run phpunit in a command prompt. Then, you will be able to see what testcases are breaking.

If you need further help in setting up PHPUnit, let us know.

#2 @brandonliles
15 months ago

Thanks for the feedback @r-a-y just uploaded a second version of the patch that doesn't break stuff :)

#3 @hnla
14 months ago

  • Keywords has-patch dev-feedback added
  • Milestone changed from Awaiting Review to 2.9

@brandonliles
14 months ago

Subquery Materialization Approach

#4 @brandonliles
14 months ago

After seeing the query still perform poorly on a site I took another look, although joins can improve performance, MySQL still tends to look at too many rows in the bp_activity table. Determining the matching user_ids first and plugging those values into the query is a major performance improvement.

New patch bp-search-optimize.patch added.

Last edited 14 months ago by brandonliles (previous) (diff)

This ticket was mentioned in Slack in #buddypress by hnla. View the logs.


14 months ago

#6 @r-a-y
14 months ago

Thanks for the new patch, @brandonliles.

bp-search-optimize.patch passes existing unit tests. The rationale behind the "query for IDs first" approach makes sense to me.

Looks good to me, but needs sign off from a lead dev. We might need some new unit tests to cover this change though.

Version 0, edited 14 months ago by r-a-y (next)

This ticket was mentioned in Slack in #buddypress by hnla. View the logs.


13 months ago

This ticket was mentioned in Slack in #buddypress by r-a-y. View the logs.


12 months ago

#9 @hnla
11 months ago

  • Keywords early added
  • Milestone changed from 2.9 to 3.0

#10 @boonebgorges
9 months ago

@brandonliles Thanks for the ticket and for your patches!

For some background on your original suggestion: We intentionally converted these (and other) user-related queries to subqueries a number of years ago. BuddyPress must remain compatible with configurations where the wp_users table does not live in the same database - or even the same server - as wp_bp_xprofile_data. See eg https://buddypress.trac.wordpress.org/ticket/4060#comment:25

Separating the subquery into a separate $wpdb->get_col() seems like an OK move to me. It'd be even better if we could use this opportunity to wrap a layer of caching around the search. Do you happen to have any benchmarks related to the performance improvements you've mentioned?

I think we have sufficient test coverage for this kind of change. I can't think of a situation where the suggested change would break something that wouldn't be detected by the tests we've already got.

#11 @DJPaul
4 months ago

Before I move this out the 3.0 milestone for inactivity, this path actually looks pretty close, and there's previous agreement from two of the team. It is very unfortunate there are no benchmarks, but if this is meant to be an improvement, we can add it to 3.0 but will be unable to promote any kind of performance number for the specific improvement.

As far as the patch looks, we need to wrap the new query parts in $wpdb->prepare(), I think.

@boonebgorges Is the new query not being cached a blocker? If so, how long should it be cached for, and when should the cache be cleared? It seems to me at a glance that we'd have to clear said cache every time the users table is written to -- which we can't do, when that table is running as a WP global table. Which makes me question the benefit of caching it.

Last edited 4 months ago by DJPaul (previous) (diff)

#12 @boonebgorges
4 months ago

No, caching is not a blocker.

It seems to me at a glance that we'd have to clear said cache every time the users table is written to -- which we can't do, when that table is running as a WP global table.

The general strategy here in WP is to hash the cache key with an incrementor, and then to expire caches by bumping the incrementor, which would happen on events like wp_insert_user() and wp_update_user(). But, while WP already does this for term and comment queries, and for some post queries, it doesn't have a last_changed incrementor for 'users'. See https://core.trac.wordpress.org/ticket/40613. I don't see a strong argument that BP should be handling this, so I agree that caching is not feasible at the moment.

#13 @DJPaul
3 months ago

  • Owner set to DJPaul
  • Status changed from new to assigned

OK, next week I'll drop the patch in (probably without the caching addition, due to time).

#14 @djpaul
3 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 11886:

Members, search: split user matching into seperate query.

Removing the subquery decreases the amount of rows MySQL needs to look at in the bp_activity table.
In the future, if we or WordPress add a user cache incrementor, we could cache this query.

Fixes #7442

Props brandonliles, r-a-y, boonebgorges

Note: See TracTickets for help on using tickets.