Skip to:
Content

BuddyPress.org

Opened 13 years ago

Closed 12 years ago

#4060 closed enhancement (fixed)

Improve performance of BP_Core_User::get_users()

Reported by: shanebp's profile shanebp Owned by: boonebgorges's profile boonebgorges
Milestone: 1.7 Priority: highest
Severity: critical Version: 1.2.9
Component: Members Keywords: needs-testing reporter-feedback
Cc: matt_buddypress-trac@…, georgemamadashvili@…

Description

Apologies if this shouldn't be a ticket...

We have ~25k users.

We’re still on BP Version 1.2.9 – (working on upgrading to 1.3)

I’m wondering if the issue below is addressed by 1.3 and/or if there is something we can do on 1.2.9

Here is an example of a slow query:
SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, u.display_name, u.user_email , um.meta_value as last_activity FROM wp_users u LEFT JOIN wp_usermeta um ON um.user_id = u.ID WHERE u.user_status = 0 AND um.meta_key = 'last_activity' ORDER BY um.meta_value

I think this is coming from bp-core-classes -> function get_users

We’ve increased :
query_cache_size to 128M
query_cache_limit: to 2M
join_buffer_size to 2M

But we still have problems during high traffic.

Any suggestions on optimizing that query or some other approach ?

Attachments (10)

slow_snip.txt (9.0 KB) - added by shanebp 13 years ago.
last_activity.txt (28.9 KB) - added by shanebp 13 years ago.
PerconaQueryOptimizations.pdf (113.7 KB) - added by TheBeardedOne 12 years ago.
Query Optimizations that was done by Percona via consulting contract
4060.01.patch (15.9 KB) - added by boonebgorges 12 years ago.
4060.02.patch (19.8 KB) - added by johnjamesjacoby 12 years ago.
4060.03.patch (25.4 KB) - added by boonebgorges 12 years ago.
4060-fix-search.001.patch (920 bytes) - added by DJPaul 12 years ago.
4060-fix-online.001.patch (1.1 KB) - added by DJPaul 12 years ago.
4060-fix-friends.patch (608 bytes) - added by DJPaul 12 years ago.
4060-fix-friends.002.patch (746 bytes) - added by DJPaul 12 years ago.

Download all attachments as: .zip

Change History (63)

#1 @DJPaul
13 years ago

  • Component changed from Core to Members
  • Milestone changed from Awaiting Review to 1.6

Moving to 1.6 for review along with #4041, #4045.

Last edited 13 years ago by DJPaul (previous) (diff)

@shanebp
13 years ago

#2 @shanebp
13 years ago

Thx DJP.

Attached is a slow-log snip showing the mentioned query and some of the queries that stack up behind it, based on timestamp.

#3 @shanebp
13 years ago

I'm no expert on queries, but perhaps this would help.

The SELECT DISTINCT u.ID... does a join on wp_usermeta looking for 'last_activity'.
For our site, wp_usermeta is over 1M rows.

Due to the structure of wp_usermeta, 'last_activity' cannot be indexed, correct ?

Then maybe 'last_activity' should be in a separate table for BP so that it can be indexed. Especially because you do an Order By on 'last_activity'.

Or harnessing memcache directly ?

[btw - in the initial entry I refer to BP 1.3, I meant BP 1.5]

Last edited 13 years ago by shanebp (previous) (diff)

#4 @shanebp
13 years ago

The more I look at this, the more I'm convinced that BP will _never_ scale while member last_activity is stored in wp_usermeta.

Moving last_activity into wp_users makes the most sense but would go against WP and make updating a mess. [ Any chance of convincing WP to make that change? lol ]

So I'm going to try changing update_user_meta to write to a separate table and point Select statements to that table. Still a bunch of work.

Any suggestions about this would be most welcome.

Groups->last_activity is not an issue even though we have ~600 groups.

Last edited 13 years ago by shanebp (previous) (diff)

#5 @boonebgorges
13 years ago

shanebp - Thanks for doing research. Would love to hear more.

Keep in mind that we already have an activity transaction table: wp_bp_activity. I know that there are already some indexing/speed issues with that table (see eg #4045), but if we could solve some of them, it would be nice to use a table that we already have. In this case, it would only add as many rows to the table as there are members, as you only need to store the *last* activity for a given user. (I imagine this would be done with a new activity type - user_last_activity, group_last_activity, etc.) Then, instead of joining against wp_bp_activity in BP_Core_User, we could consider doing a subquery.

In any case, it would be nice to do some benchmarks with, say, a million test users (current setup vs standalone table vs additional activity type in wp_bp_activity).

#6 @shanebp
13 years ago

boonegorges - seems like we're having a bit of an apples/oranges discussion.

For us, the main issue isn't joining on wp_bp_activity - it's the left join on wp_usermeta in BP_Core_User -> get_users();
For example, look at the Rows_examined in this snip from todays slow log:

# Query_time: 8.225675  Lock_time: 0.000083 Rows_sent: 28  Rows_examined: 74233
SET timestamp=1331480109;
SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, u.display_name, u.user_email , um.meta_value as last_activity FROM wp_users u LEFT JOIN wp_usermeta um ON um.user_id = u.ID WHERE u.user_status = 0 AND um.meta_key = 'last_activity' ORDER BY um.meta_value DESC LIMIT 0, 28;

And other simple queries start stacking up behind it.

Adding only as many rows as there are members is exactly the problem with using wp_usermeta to store 'last_activity' time-stamp.

A table only holding only last_activity [id, user_id, last_activity]
I think would be faster, very indexable and last_activity could be a date-time column instead of longtext - as used in wp_usermeta. And it would only have the same number of rows as wp_users.

I'm not sure I have the skills to do the benchmarking you mentioned - but it would be great to see it.

I cringe at the thought of adding a last_activity table. But we don't have much choice.
We switched from MyISAM to InnoDB, doubled the size of our db server, optimized our searches, etc. The result is that our page loads have stayed steady - but that's all, so it's not enough.

We have ~25k very active users; the average online number is 400, peaking ~1000.

Anyhow, thanks for your input.

Last edited 13 years ago by shanebp (previous) (diff)

#7 @boonebgorges
13 years ago

seems like we're having a bit of an apples/oranges discussion.

I don't think so. I'm just saying that we may be able to manage without creating a new table. The problem with the current last_activity implementation is not the join itself - it's the fact that (as you note) we ORDER BY meta_value, and the meta_value column is not indexed. The case is different with wp_bp_activity - we have a date_recorded column that *is* indexed. So putting the last_activity in that table would, in theory, be much faster than the current implementation in wp_usermeta.

The only reason I bring up the issue of #4045 is because my logic here depends on the assumption that your wp_bp_activity table is already in good shape. Obviously, moving from the wp_usermeta implementation to the wp_bp_activity implementation wouldn't do much good if your activity table queries are slow too.

This should be a fairly straightforward thing to test - we wouldn't really need to make any changes in BP for the benchmarking, just create the tables and try some different queries. I'm working on some related issues at the moment, so I will try to find some time to do it myself.

In the meantime, if you find that the wp_usermeta thing really is a bottleneck, then you could create your own last_activity table, and filter the BP_Core_User queries to join against your own table instead of wp_usermeta. Obviously this is a bit more work than fixing it in BP itself, but it may take some time to get it sorted out in BP.

#8 @shanebp
13 years ago

Ah - thanks for the explanation. Much appreciated.

I get the logic of your idea re wp_bp_activity and will look into it.

And if that idea leads to me submitting a ticket like #4045 in a few months... argh.

Last edited 13 years ago by shanebp (previous) (diff)

#9 @boonebgorges
13 years ago

  • Keywords 1.7-early 2nd-opinion added
  • Milestone changed from 1.6 to Future Release
  • Severity changed from normal to critical
  • Summary changed from slow queries in bp-core-classes to Improve performance of BP_Core_User::get_users()
  • Type changed from defect (bug) to enhancement

Hi everyone. I've had a chance to do a bit of experimentation with my wp_bp_activity suggestion. It doesn't work out of the box, but if I do a bit of query restructuring, it works really beautifully. Here are some sample stats, using a database of about 1,000,000 activity items and 235,000 users.

*CURRENT BP TRUNK*

mysql> SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, 
u.display_name, u.user_email , um.meta_value as last_activity FROM wp_users u LEFT JOIN 
wp_usermeta um ON um.user_id = u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 
AND um.meta_key = 'last_activity' ORDER BY um.meta_value DESC LIMIT 0, 20;
------------+------------------------------------------------+---------------------+
20 rows in set (11.31 sec)
mysql> SELECT COUNT(DISTINCT u.ID) FROM wp_users u LEFT JOIN wp_usermeta um ON um.user_id =
 u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 AND um.meta_key = 
'last_activity' ORDER BY um.meta_value DESC;
+----------------------+
| COUNT(DISTINCT u.ID) |
+----------------------+
|               237511 |
+----------------------+
1 row in set (4.00 sec)

*MOVED LAST_ACTIVITY TO WP_BP_ACTIVITY, WITH SIMILAR QUERY STRUCTURE*

I wrote a script that rewrote last_activity usermeta to the activity table, with type = 'last_activity' and user_id = $user_id. Then I changed the JOIN so it was against activity instead of usermeta, but left the basic query structure the same.

mysql> SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, 
u.display_name, u.user_email , a.date_recorded as last_activity FROM wp_users u LEFT JOIN 
wp_bp_activity a ON a.user_id = u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0
 AND a.type = 'last_activity' ORDER BY a.date_recorded DESC LIMIT 0, 20;
------------+------------------------------------------------+---------------------+
20 rows in set (12.22 sec)
mysql> SELECT COUNT(DISTINCT u.ID) FROM wp_users u LEFT JOIN wp_bp_activity a ON a.user_id =
 u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 AND a.type = 'last_activity'
 ORDER BY a.date_recorded DESC;
+----------------------+
| COUNT(DISTINCT u.ID) |
+----------------------+
|               237511 |
+----------------------+
1 row in set (4.07 sec)

As you can see, there is not any benefit from this move in itself, because the way in which the items are joined prevents MySQL from taking advantage of the date_recorded index.

*MOVED LAST_ACTIVITY TO WP_BP_ACTIVITY, AND CHANGED QUERY TO A SUBQUERY*

In order to take advantage of the date_recorded index, I tested with a subquery, with extremely promising results. Because the data is structured differently, is requires three queries instead of two (one to get the user IDs, one to get the user IDs, another to get the total count, and a third (this is the new one) to get the missing data out of wp_users that used to come from the join.

mysql> select distinct user_id from wp_bp_activity where user_id not in (select ID from 
wp_users where spam != 0 OR deleted != 0 OR user_status != 0) and type = 'last_activity' 
order by date_recorded desc limit 20;
+---------+
20 rows in set (0.01 sec)
mysql> select count(distinct user_id) from wp_bp_activity where user_id not in 
(select ID from wp_users where spam != 0 OR deleted != 0 OR user_status != 0) 
and type = 'last_activity';
+-------------------------+
| count(distinct user_id) |
+-------------------------+
|                  237511 |
+-------------------------+
1 row in set (4.15 sec)
mysql> SELECT ID, user_registered, user_nicename, user_login, display_name, 
user_email FROM wp_users WHERE ID IN (1,156563,31438,18959,115038,95468,
81552,217731,180316,50888,129014,214293,80522,16344,31461,9046 ,1035,
51964,176472,127686);
------------+------------------------------------------------+
20 rows in set (0.00 sec)

Obviously, this is a dramatic improvement. Just for fun, I decided to try a similar subquery structure using the current last_activity data in wp_usermeta:

mysql> select distinct user_id from wp_usermeta where user_id not in 
(select ID from wp_users where spam != 0 OR deleted != 0 OR user_status 
!= 0) and meta_key = 'last_activity' order by meta_value desc limit 20;
+---------+
20 rows in set (6.09 sec)

6.09 seconds is much faster than the 11-12 seconds for a join, but we are still forced to use filesort to do the DESC sort due to the lack of an index on wp_usermeta.meta_value, which is the real source of pain.

====

TAKEAWAYS:

Moving to subqueries for member queries has the potential for huge benefits, so I think we should pursue it. And I think that moving last_activity to the activity table is hugely beneficial as well, so we should also pursue that. (For group last_activity too, though usually groups are not such a pressure point since there are generally fewer of them.) But there are a couple of caveats that will make this a non-trivial task:

  • The query methods will need to be rewritten for the three-query structure mentioned above, rather than the current two-query structure. This is straightfoward.
  • We are currently using the very same query structure for all different "types" of user queries: last active, alphabetical, newest registered, online. 'last active' is the default, and 'online' is really a subset of 'last active', which is why I've focused on it above. Though this needs testing, I have a feeling that 'alphabetical' and 'newest registered' will need to be structured quite differently. 'newest registered' is pretty easy - wp_users has an index on user_registered, so we query that and then use a subquery or a second query to get last active data. There's a good chance that we're never going to be able to optimize 'alphabetical' without some major modifications, because we store that data in wp_users.display_name and wp_bp_xprofile_data.data, neither of which are indexed. Regardless, it will not work well with the wp_bp_activity subquery method.
  • Backward compatibility. Do we keep 'last_activity' data mirrored in wp_usermeta? Generally mirroring data is bad, but (1) it's nice to leave it for plugins that depend on it (though it would be problematic for plugins that *modify* it, and (2) moving to the activity table will add a small amount of overhead for the loggedin/displayed user, since usermeta is loaded in the get_userdata() call, while the activity data will take a separate query. On balance, I think it's best to just move it, and to urge plugin authors to use the existing bp_core_record_activity() and bp_core_get_last_activity() functions, which need to be refactored of course.
  • The COUNT queries are equally bad in all the scenarios above. There's no good way around this. So we should probably stop doing a live query for total member counts, store it as an option, and then bust and recalculate when new members join/become active or when accounts are deleted.
  • Backward compatibility: We have filters on the sql queries, which will more or less break if the queries are radically restructured. I think it makes sense to remove the filters and add new ones; see https://core.trac.wordpress.org/ticket/18536#comment:14 and surrounding discussion for WP reasoning on a similar issue
  • Moving last_activity to the activity tables is problematic because the Activity component is optional. If it's disabled, we don't have the tables, and we don't have the functions that write to the tables. So if we go this way, we either have to make Activity required, or maintain two different sets of query logic - one that uses Activity, and one that doesn't (which would, I guess, be the crappy version we have now). I'd rather go with making Activity required, but that will take a bunch of other kinds of work. An alternative approach is to store last_activity in a standalone table structured just like wp_bp_activity when Activity is disabled, which would still enable Activity to be turned off. The problem there is what would happen if someone decided to switch Activity on and off. Or we could just use a totally separate table, as originally suggested by shanebp, but that seems like a waste to me - we already have an activity table structure in place.

====

I'm moving this ticket to 1.7-early, because IMO it is going to require a large amount of work and especially user/plugin testing.

Last edited 13 years ago by boonebgorges (previous) (diff)

#10 @shanebp
13 years ago

Boone - thanks for all the work and info.

Do you think it's feasible to try and implement your query restructuring on a 1.2.9 install?

Last edited 13 years ago by shanebp (previous) (diff)

#11 @boonebgorges
13 years ago

Do you think it's feasible to try and implement your query restructuring on a 1.2.9 install?

Theoretically yes, but the problem is that the restructuring is not actually built yet :) I messed around a bit in the query class but I didn't actually write the logic to build the subqueries. I wrote them manually to do the benchmarks you see above. Writing that logic is going to take some time. Some changes will also need to be built elsewhere into BP, specifically wherever user activity is recorded and fetched. If you have some time to start a patch, it would be most welcome. Otherwise it will have to wait until I get some serious time to sit down with it.

On the bright side, these queries have not significantly changed between 1.2.9 and 1.5.x, so there shouldn't be a problem manually applying the patch. (However, your best bet is to get the heck off of 1.2.9! ;) )

#12 @shanebp
13 years ago

Boone - so your main reason for not moving last_activity to a new table is due to your aversion to creating a new table ? Granted wp_bp_activity already has a date_recorded column that *is* indexed, but wp_bp_activity can also get very big.

btw - I couldn't figure out how to implement your ideas above, so I decided to try the new table for last_activity approach. And got a hack working for last_activity, but it breaks too many other things, like search, due to the need for restructuring query logic - a task beyond me.

I'd be happy to track down wherever user activity is recorded and fetched elsewhere in BP.
I could deliver a list rather than a patch.

Last edited 13 years ago by shanebp (previous) (diff)

#13 @boonebgorges
13 years ago

Boone - so your main reason for not moving last_activity to a new table is due to your aversion to creating a new table ? Granted wp_bp_activity already has a date_recorded column that *is* indexed, but wp_bp_activity can also get very big.

More or less. All things being equal, we should avoid creating new tables. That's especially true in this case, since we literally already have a table devoted to recording *activity* items. It's true that size is a concern, but keep in mind that only a single last_activity item would be recorded at any time for a given member. So in a community with, say, 10,000 users, an "average" size activity table may have 500,000 rows (50 per user), but my proposal would only add 10,000 rows. That said, there is no question that activity query performance must be improved before my proposal can be seriously considered; see #4045.

I'd be happy to track down wherever user activity is recorded and fetched elsewhere in BP.

I could deliver a list rather than a patch.

Any little bit is helpful. Thanks :)

#14 @shanebp
13 years ago

Search result for 'last_activity' in BP 1.5.4 attached.

Please let me know if you want more info, different format, etc.

#15 @boonebgorges
13 years ago

Thanks, shanebp. This is helpful.

#16 @shanebp
12 years ago

Just a note - we recently added a tempfs mount (512MB)to our mysql server.
Maybe this is standard.
Anyhow, while it didn't 'solve' our issues with queries in question, it has definitely mitigated those issues.

Last edited 12 years ago by shanebp (previous) (diff)

#17 @shanebp
12 years ago

Boone -

The mysql guys at RackSpace looked at some of our queries - one of which started this ticket.
I thought I'd share their findings here so you and anyone else could comment on their suggestions.

Report:
I see two queries that could cause concurrency bottlenecks; both of these are showing up in the processlist many times per minute:

Query 1:

 SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name
  FROM wp_bp_activity a USE INDEX (date_recorded) LEFT JOIN wp_users u 
ON a.user_id = u.ID WHERE a.user_id IN ( 28221 ) AND a.component 
IN ( 'album' ) AND a.item_id IN ( 47715 ) AND a.type != 'activity_comment'
 ORDER BY a.date_recorded DESC LIMIT 0, 20

This query performs a 0.7 second filesort and I see this query run many times. Removing the USE INDEX (date_recorded) portion of this query brought the total run time down to around 1 millisecond. I recommend removing this portion of the query from your codebase.

Query 2:

SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login,
 u.display_name, u.user_email , um.meta_value as last_activity FROM wp_users u 
LEFT JOIN wp_usermeta um ON um.user_id = u.ID WHERE u.user_status = 0 
AND um.meta_key = 'last_activity' ORDER BY um.meta_value DESC LIMIT 0, 28


This query takes approximately 0.6 seconds to run -- 0.5 seconds to create a temporary table for the join and 0.1s to perform the ORDER BY um.meta_value DESC operation. Rewriting this query to limit the size of the order by improved the running time to approximately 0.17 seconds. I recommend using the following query in place of the above:

SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, 
u.display_name, u.user_email , um.meta_value as last_activity
    FROM wp_users u
    INNER JOIN (SELECT user_id, meta_value FROM wp_usermeta WHERE meta_key = 
'last_activity' ORDER BY meta_value DESC LIMIT 0, 28) AS um
        ON u.ID = um.user_id;
Last edited 12 years ago by shanebp (previous) (diff)

#18 follow-up: @boonebgorges
12 years ago

shanebp - This is extraordinarily helpful. Thanks so much for passing it along.

The first query is related to activity index hints, covered in #4045. I have copied that part of your comment there, along with a question for you.

The second tip will probably become irrelevant when we do some of the planned refactoring. We'll no longer be joining wp_users against wp_usermeta in order to get the information; we'll be using properly indexed tables instead. No join + indexes means that there will be no need for temp tables.

Thanks again for your persistence on this. Your help has been crucial in tracking down the problematic queries.

#19 in reply to: ↑ 18 @shanebp
12 years ago

boone -

The second tip will probably become irrelevant...

But in the meantime - any reason not to use it ?

#20 @boonebgorges
12 years ago

But in the meantime - any reason not to use it ?

It's not wise at this point in the dev cycle to change this query structure, because it would break plugins that attempt to filter the query directly. You could filter the query yourself to accomplish what's being suggested above, of course.

#21 @TheBeardedOne
12 years ago

  • Cc matt_buddypress-trac@… added

@TheBeardedOne
12 years ago

Query Optimizations that was done by Percona via consulting contract

#22 @TheBeardedOne
12 years ago

I've attached a PDF of query optimizations that were done by Percona.

#23 @boonebgorges
12 years ago

  • Keywords 1.7-early removed
  • Milestone changed from Future Release to 1.7
  • Priority changed from normal to highest

TheBeardedOne - Thanks so much for sharing this document with us. It's extremely helpful.

This ticket is my personal #1 priority for this dev cycle. I have the beginnings of a patch in my local repo, which I'll share in the upcoming week or two.

#24 @boonebgorges
12 years ago

I've just closed #3127 in favor of this ticket. See that ticket for some interesting discussion of WP_User_Query, which will play a role in the patch I have to share with all of you soon.

#25 @boonebgorges
12 years ago

  • Keywords has-patch needs-testing dev-feedback added

4060.01.patch is a first rough draft of overhauling our user queries.

First, a brief summary of the issues at hand. Our current user queries run through BP_Core_User::get_users(). The queries built by this method do not scale very well at all, due to a couple of fundamental issues:

  • Directional joins against global tables. The query constructor always joins (at least) wp_users against wp_usermeta, and almost always joins against some BP tables too
  • Directional joins against improperly indexed tables. wp_usermeta does not have an index on meta_value (which is used for last_activity - every single query - as well as a number of other query types)
  • SELECT * and other greedy queries. With very large datasets, queries like SELECT * and other multiple-column SELECT statements increase the likelihood that temp tables will be required.
  • Unflexible COUNT(*) queries. At a certain point, there's nothing you can really do to optimize these kinds of queries. Our current user query setup prevents admins from easily overriding them (with, eg, SQL_CALC_FOUND_ROWS and FOUND_ROWS()) or caching them or skipping them altogether

My proposed solution 4060.01 introduces BP_User_Query and deprecates BP_Core_User::get_users(). This solution eliminates some of the issues above, mitigates some of those that it can't eliminate outright, and has a couple of added benefits. I'll outline the general strategy and its benefits, and then pose some questions for feedback.

BP_User_Query is intentionally modeled on WP_User_Query, with similar method and property names, and a similar structure. This should make it easier for WP devs to approach. It improves our user queries in a couple of key ways. First and foremost, it splits our big multiple JOIN into several steps:
1) A custom query for getting matching IDs only. This parallels the split_the_query option recently introduced in WP_Query, for similar reasons: when your initial WHERE query (which does all the heavy lifting of filtering) only loads IDs, you are loading far less information into memory than if you do SELECT(*) or something like that, with the result that you have to resort far less to temporary tables.
2) Once we have the resulting user IDs, we use WP_User_Query to get some of the WP core data that BP needs (user_email, etc)
3) Finally, we use the user IDs to run another set of queries, to fetch the BP-specific data corresponding to 'populate_extras': friend counts, is_friend, last_activity, etc. Much of this data comes from wp_usermeta, and was previously fetched in separate standalone queries; I was able to refactor it to roughly halve the number of queries required for populate_extras.

Some sample preliminary results, on a dataset with ~113K rows in wp_users and ~1.35M rows in wp_usermeta:

TYPE: 'newest'

BP_Core_User::get_users

  • Main query: 3,050ms
  • Count query: 2,372ms

BP_User_Query

  • Main query: 1,577ms
  • Count query: 185ms

TYPE: 'alphabetical' (xprofile disabled or BP-WP profile sync enabled)

BP_Core_User::get_users

  • Main query: 3,464ms
  • Count query: 2,491ms

BP_User_Query

  • Main query: 88ms
  • Count query: 68ms

TYPE: 'alphabetical' (xprofile enabled and BP-WP profile sync disabled)

BP_Core_User::get_users

  • Main query: 3,464ms
  • Count query: 2,491ms

BP_User_Query

  • Main query: 509ms
  • Count query: 151ms

As you add more filters to the query (like search_terms), they get slower, of course. But the speed improvements are roughly linear to what you see above. As you can see, avoiding unnecessary JOINs results in huge improvements. Moreover, the restructuring means that BP tables are never joined against WP tables during user queries (except in one or two filter cases that I haven't gotten around to rewriting yet - see meta_key/meta_value).

A couple of questions and directions for further development:

  • 'last_activity' - The current bottleneck of the whole setup is the 'last_activity' usermeta value. We use this to order and filter many of our user queries. But storing this data in wp_usermeta means that there's no index, so that our queries are quite slow (see my 'newest' numbers above). Moving this to a properly indexed transactional table would speed things up enormously. wp_bp_activity is the obvious choice, since it's built for this purpose. But such a move would mean that the BP Activity component could not be shut off (or would have to be split into two parts, so that the core couldn't be shut off), and it would also cause backward compatibility issues. For BP 1.7, I propose that we do *not* pursue any change along these lines - one thing at a time - but that we revisit for a future release.
  • Backward compatibility - I have chosen to leave BP_Core_User::get_users() untouched, and to introduce BP_User_Query as a new class. That means that anyone continuing to use BP_Core_User::get_users() directly will see a deprecated notice, but their code will still work. The point at which I'm choosing to swap out the new class is in functions like bp_core_get_users(). (There are 6-8 other places throughout BP where a similar change will have to be made; I haven't done it yet, because I want to settle on a strategy first.) For the vast majority of uses, this changeover will be totally seamless. However, anyone who is directly filtering the get_users() sql queries ('bp_core_get_paged_users_sql', 'bp_core_get_total_users_sql') will be affected by the move to BP_User_Query, because these filters disappear. (There's simply no way to salvage them, given that the entire query structure is changed.) My proposed backpat strategy, as you can see in the patch, is to provide a 'bp_use_legacy_user_queries' filter, which takes the FUNCTION as a parameter, so that developers can opt to fall back on the old method in selected cases. What do you think of this strategy?
  • Ongoing status of BP_Core_User - If we're no longer using BP_Core_User::get_users(), the only thing that BP_Core_User is really used for is instantiating a *single* user object. This, IMO, is a good thing - it more closely parallels the relationship between WP_User and WP_User_Query. However, as it stands, there is currently (both in the current codebase and my own patch) some reduplication of efforts. For example, it doesn't make sense for there to be two different ways for 'populate_extras' to be run, one for multiple users and one for single users. This makes for multiple points of failure. In the future it would be nice to abstract this somewhat. Perhaps BP_Core_User could be refactored somewhat so that it can be instantiated either for single user IDs (the current setup) or for multiple ones; then, instead of doing populate_extras and WP_User_Query business inside of BP_User_Query, the latter class would instead use BP_Core_User to call up the additional data required, once it'd come up with a list of user ids. This is similar to how WP_User_Query handles all_with_meta queries - it uses WP_User internally. This is probably not required for 1.7, but is a good direction for future parity with WP.

Whew. Sorry for all the words - I've thought a lot about this, and want to make sure it's done right, as it's a major step toward scaling BP. Feedback welcome.

#26 @cnorris23
12 years ago

I'm a fan of the plan and the patch. I'm hoping to find some more time this dev cycle (and on) to star helping out more.

#27 @Mamaduka
12 years ago

  • Cc georgemamadashvili@… added

#28 @johnjamesjacoby
12 years ago

Huge wins overall. Outstanding work.

Attached patch v2:

  • Cleans up whitespace, comments, indentation, etc... (as if you expected anything else)
  • Removes test cases for commented out bugs (I tend to agree with your assessment here)
  • Decouples bp_is_active() checks from populate_extras() method, and moves them into the xprofile and friends components
  • Adds some @todo's for other places bp_is_active() should be decoupled.
  • Testing this on a small dataset, the queries are faster but it results in more of them (probably because of duplicate xprofile primary name field ID querying).
  • I'm okay with the 'last_activity' behavior changing; it's something that comes up enough as unexpected where even though it's 'working as designed' it's still a poor design.

Overall, this is the direction we should go. I'm comfortable committing this in early, and iterating on it over time. Getting this in early allows for testing this with other user related queries to look for optimizations; friends, group members, activity streams, etc...

#29 @DJPaul
12 years ago

Looks good. Great work, Boone.

#30 @boonebgorges
12 years ago

Thanks for the cleanup, johnjamesjacoby.

I agree that getting rid of the bp_is_active() checks would be nice to get rid of. Most of them could probably be dropped filtering the query_vars before assembling the SQL - this will allow components to modify the 'include' and 'exclude' parameters using whatever queries they wish. Maybe we can look more closely at this once the base patch is committed - it's an optional enhancement for now, more for architectural elegance than any functional benefit.

the queries are faster but it results in more of them (probably because of duplicate xprofile primary name field ID querying).

Yes, and the fact that many JOINs were replaced with single table queries. This is case where more queries are *better*, because they're so much faster, and will be so much more cacheable by SQL cache.

I'm okay with the 'last_activity' behavior changing; it's something that comes up enough as unexpected where even though it's 'working as designed' it's still a poor design.

This is going to raise enormous problems for backward compatibility. I'll open a separate enhancement ticket where we can talk about it. I don't want that issue to hold up these improvements.

===

4060.03 fixes a couple of things:

  • Pulls 'meta_key'/'meta_value' queries into a separate query, to avoid all joins against wp_usermeta and wp_users (woo hoo!)
  • Implements the previously missing 'random' and 'popular' sort types
  • Switches the queries fired by the Members widget to use populate_extras, so that they get all the data they need. This change is required by some of the changes in the way the queries are built (we're no longer scooping up all metadata on the first query)

IMO this is getting pretty close to being ready to rock.

#31 @DennisSmolek
12 years ago

Boone,

I'm running a fairly large install(26k in wp_users, 1.9M in user_meta, 686k in bp_xprofile)

What kind of info would you like me to test against. Honestly I've never done much MySQL testing so let me know what would help.

We query users quite often by the admin staff. Our users cant query each other. I was using only wp_usermeta but the default search couldnt find them so I duplicate that data in bp_xprofile for the search tool.

-D

#32 @boonebgorges
12 years ago

DennisSmolek - Thanks for jumping in here. The best way to test would be to install trunk + this patch in a test environment (with a mirror of your live site's data) and do some simply mysql profiling. Maybe the easiest way to do this is to install the Debug Bar plugin, and then put define('SAVEQUERIES',true); in your wp-config file. Then, on each pageload, you'll be able to see which queries were run, and how long they took. Use add_filter( 'bp_use_legacy_user_query', '__return_true' ); as a toggle to turn on/off the new query class, so you can get side-by-side results.

I wouldn't recommend putting this on your live site at the moment, as it's not been fully tested.

#33 @boonebgorges
12 years ago

(In [6314]) Introduces BP_User_Query for improved member query performance

  • Introduces BP_User_Query class
  • Deprecates use of BP_Core_User::get_users() in bp_core_get_users(), the main member query function in BuddyPress
  • Introduces bp_use_legacy_user_query filter, to allow plugins to use legacy query method during transition
  • Expands user query functionality to support 'user_ids' parameter, which can be used to skip the user_id query altogether, for maximum flexibility in plugins and themes

Props boonebgorges, johnjamesjacoby.

See #4060

#34 follow-up: @DJPaul
12 years ago

I was about to patch #4347, but realised it was in the deprecated get_users() function. I went looking for its new home, but the "online" query isn't built the same. Boone, was this intentional for performance reasons or is it an oversight?

#35 in reply to: ↑ 34 @boonebgorges
12 years ago

Replying to DJPaul:

I was about to patch #4347, but realised it was in the deprecated get_users() function. I went looking for its new home, but the "online" query isn't built the same. Boone, was this intentional for performance reasons or is it an oversight?

Looks like it was an oversight. If you'd like to add it, please do, otherwise I will when I get a few minutes.

#36 @DJPaul
12 years ago

Member search is broken with the new BP_User_Query. 4060-fix-search.001.patch fixes part of the problem (invalid function argument).

The remaining problem is tougher as if it can't find any users matching the search details, it carries on with the user query as though no search term was passed. What should happen is that if there are no users matching the search, we should bail out and return zero matches. This is consistent with pre-BP_User_Query behaviour.

#37 @boonebgorges
12 years ago

  • Keywords 2nd-opinion has-patch dev-feedback removed

Thanks for the patch, DJPaul. You're correct about the case where no users are found via search. I'll commit a fix in a moment.

#38 @boonebgorges
12 years ago

(In [6470]) Modifications to BP_User_Query to make member searches work correctly

  • Don't pass invalid function parameter when doing user_id search. Props DJPaul
  • Introduce $no_results property (borrowed from WP_Tax_Query)
  • When a member search returns no results, ensure that the user query also returns no results

See #4060

#39 @DJPaul
12 years ago

  • Keywords has-patch added

4060-fix-online.001.patch tries to fix the "online" query. It works in my testing but as I'm not familiar with BP_User_Query, putting the patch up here for review.

#40 @boonebgorges
12 years ago

DJPaul - This looks good to me.

I know you were probably just porting over from the old version, but we might consider doing the DATE_ADD operation on the UTC_TIMESTAMP() instead of the meta_value, since that way we only have to do it once instead of on every returned row. Or maybe this is better handled in a different ticket. Anyway, you should go ahead and commit your change.

#41 @DJPaul
12 years ago

  • Keywords has-patch removed

Good suggestion. About to commit.

#42 @djpaul
12 years ago

(In [6471]) Add support for 'online' type searches to BP_User_Query.
See #4060

#43 @DJPaul
12 years ago

  • Keywords has-patch added

4060-fix-friends.patch​ fixes a problem where the queries are incorrect for the member>friends>friendships and member>friends>request screens, resulting in no member list for either screen. This is recreatable with BP-Default or the Legacy template pack.

I am not sure if 4060-fix-friends.patch​ is the best way to fix this, so any testing and suggestions are very welcome.

#44 @DJPaul
12 years ago

4060-fix-friends.002.patch fixes further issues; looks like a NOT IN was copypasta from a previous clause.

#45 @djpaul
12 years ago

(In [6488]) Correct SQL for BP_User_Query for the Member>Friends/Friend Request screens. See #4060

#46 @DJPaul
12 years ago

Friends>Requests query is still incorrect.

BP-Default/theme compat. use bp_has_members('include=1,2,3,..') in the Requests template. BP_User_Query, because we're doing this on a user page, picks up the 'user' parameter which is automatically set somewhere, and appends the user's existing friends, too (which is pointless for the request page). The SQL looks like this:

`array(5) {

select?=>
string(42) "SELECT DISTINCT u.ID as id FROM wp_users u"
where?=>
array(2) {

[0]=>
string(17) "u.ID IN (26,24,9)"
[1]=>
string(24) "u.ID IN (8,25,12,3,2,23)"

}
orderby?=>
string(23) "ORDER BY u.display_name"
order?=>
string(3) "ASC"
limit?=>
string(11) "LIMIT 0, 20"

}`

I am assuming WP_User_Query joins the WHERE conditions together, so we end up with an impossible-to-match query.

#47 @johnjamesjacoby
12 years ago

  • Keywords reporter-feedback added; has-patch removed

What do we have left to do here to close this for 1.7?

#49 @boonebgorges
12 years ago

I'd been waiting to address some of the last pressing issues until I had a framework for writing some unit tests, but this hasn't happened. I'll have a look at the remaining issues for 1.7 within the next few days.

#50 @johnjamesjacoby
12 years ago

  • Owner set to boonebgorges
  • Status changed from new to assigned

#51 @boonebgorges
12 years ago

DJPaul - Good find. This is avoided in BP_Core_User::get_users() by only doing the friend filter when nothing is passed to the 'include' parameter: https://buddypress.trac.wordpress.org/browser/tags/1.6.2/bp-core/bp-core-classes.php#L266 I'm going to replicate the same check in BP_User_Query.

#52 @boonebgorges
12 years ago

(In [6638]) In BP_User_Query, only parse the 'user_id' param if 'include' is empty

Friend request queries work by passing a list of requests to the 'include'
parameter of bp_has_members(). This conflicts necessarily with doing a
'user_id' check in BP_User_Query, because 'user_id' limits results to the
passed user's friends (which are, by definition, different from his friend
requests). Thus, when an 'include' param is passed, the 'user_id' check should
be skipped.

This replicates the same logic in BP_Core_User::get_users().

See #4060

#53 @boonebgorges
12 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

It looks like the hard work has been done here. Thanks to all for your help conceptualizing and testing BP_User_Query. Please open new tickets for specific issues.

Note: See TracTickets for help on using tickets.