Opened 13 years ago
Closed 12 years ago
#4060 closed enhancement (fixed)
Improve performance of BP_Core_User::get_users()
Reported by: | shanebp | Owned by: | boonebgorges |
---|---|---|---|
Milestone: | 1.7 | Priority: | highest |
Severity: | critical | Version: | 1.2.9 |
Component: | Members | Keywords: | needs-testing reporter-feedback |
Cc: | matt_buddypress-trac@…, georgemamadashvili@… |
Description
Apologies if this shouldn't be a ticket...
We have ~25k users.
We’re still on BP Version 1.2.9 – (working on upgrading to 1.3)
I’m wondering if the issue below is addressed by 1.3 and/or if there is something we can do on 1.2.9
Here is an example of a slow query:
SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, u.display_name, u.user_email , um.meta_value as last_activity FROM wp_users u LEFT JOIN wp_usermeta um ON um.user_id = u.ID WHERE u.user_status = 0 AND um.meta_key = 'last_activity' ORDER BY um.meta_value
I think this is coming from bp-core-classes -> function get_users
We’ve increased :
query_cache_size to 128M
query_cache_limit: to 2M
join_buffer_size to 2M
But we still have problems during high traffic.
Any suggestions on optimizing that query or some other approach ?
Attachments (10)
Change History (63)
#1
@
13 years ago
- Component changed from Core to Members
- Milestone changed from Awaiting Review to 1.6
#2
@
13 years ago
Thx DJP.
Attached is a slow-log snip showing the mentioned query and some of the queries that stack up behind it, based on timestamp.
#3
@
13 years ago
I'm no expert on queries, but perhaps this would help.
The SELECT DISTINCT u.ID... does a join on wp_usermeta looking for 'last_activity'.
For our site, wp_usermeta is over 1M rows.
Due to the structure of wp_usermeta, 'last_activity' cannot be indexed, correct ?
Then maybe 'last_activity' should be in a separate table for BP so that it can be indexed. Especially because you do an Order By on 'last_activity'.
Or harnessing memcache directly ?
[btw - in the initial entry I refer to BP 1.3, I meant BP 1.5]
#4
@
13 years ago
The more I look at this, the more I'm convinced that BP will _never_ scale while member last_activity is stored in wp_usermeta.
Moving last_activity into wp_users makes the most sense but would go against WP and make updating a mess. [ Any chance of convincing WP to make that change? lol ]
So I'm going to try changing update_user_meta to write to a separate table and point Select statements to that table. Still a bunch of work.
Any suggestions about this would be most welcome.
Groups->last_activity is not an issue even though we have ~600 groups.
#5
@
13 years ago
shanebp - Thanks for doing research. Would love to hear more.
Keep in mind that we already have an activity transaction table: wp_bp_activity. I know that there are already some indexing/speed issues with that table (see eg #4045), but if we could solve some of them, it would be nice to use a table that we already have. In this case, it would only add as many rows to the table as there are members, as you only need to store the *last* activity for a given user. (I imagine this would be done with a new activity type - user_last_activity, group_last_activity, etc.) Then, instead of joining against wp_bp_activity in BP_Core_User, we could consider doing a subquery.
In any case, it would be nice to do some benchmarks with, say, a million test users (current setup vs standalone table vs additional activity type in wp_bp_activity).
#6
@
13 years ago
boonegorges - seems like we're having a bit of an apples/oranges discussion.
For us, the main issue isn't joining on wp_bp_activity - it's the left join on wp_usermeta in BP_Core_User -> get_users();
For example, look at the Rows_examined in this snip from todays slow log:
# Query_time: 8.225675 Lock_time: 0.000083 Rows_sent: 28 Rows_examined: 74233 SET timestamp=1331480109; SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, u.display_name, u.user_email , um.meta_value as last_activity FROM wp_users u LEFT JOIN wp_usermeta um ON um.user_id = u.ID WHERE u.user_status = 0 AND um.meta_key = 'last_activity' ORDER BY um.meta_value DESC LIMIT 0, 28;
And other simple queries start stacking up behind it.
Adding only as many rows as there are members is exactly the problem with using wp_usermeta to store 'last_activity' time-stamp.
A table only holding only last_activity [id, user_id, last_activity]
I think would be faster, very indexable and last_activity could be a date-time column instead of longtext - as used in wp_usermeta. And it would only have the same number of rows as wp_users.
I'm not sure I have the skills to do the benchmarking you mentioned - but it would be great to see it.
I cringe at the thought of adding a last_activity table. But we don't have much choice.
We switched from MyISAM to InnoDB, doubled the size of our db server, optimized our searches, etc. The result is that our page loads have stayed steady - but that's all, so it's not enough.
We have ~25k very active users; the average online number is 400, peaking ~1000.
Anyhow, thanks for your input.
#7
@
13 years ago
seems like we're having a bit of an apples/oranges discussion.
I don't think so. I'm just saying that we may be able to manage without creating a new table. The problem with the current last_activity implementation is not the join itself - it's the fact that (as you note) we ORDER BY meta_value, and the meta_value column is not indexed. The case is different with wp_bp_activity - we have a date_recorded column that *is* indexed. So putting the last_activity in that table would, in theory, be much faster than the current implementation in wp_usermeta.
The only reason I bring up the issue of #4045 is because my logic here depends on the assumption that your wp_bp_activity table is already in good shape. Obviously, moving from the wp_usermeta implementation to the wp_bp_activity implementation wouldn't do much good if your activity table queries are slow too.
This should be a fairly straightforward thing to test - we wouldn't really need to make any changes in BP for the benchmarking, just create the tables and try some different queries. I'm working on some related issues at the moment, so I will try to find some time to do it myself.
In the meantime, if you find that the wp_usermeta thing really is a bottleneck, then you could create your own last_activity table, and filter the BP_Core_User queries to join against your own table instead of wp_usermeta. Obviously this is a bit more work than fixing it in BP itself, but it may take some time to get it sorted out in BP.
#8
@
13 years ago
Ah - thanks for the explanation. Much appreciated.
I get the logic of your idea re wp_bp_activity and will look into it.
And if that idea leads to me submitting a ticket like #4045 in a few months... argh.
#9
@
13 years ago
- Keywords 1.7-early 2nd-opinion added
- Milestone changed from 1.6 to Future Release
- Severity changed from normal to critical
- Summary changed from slow queries in bp-core-classes to Improve performance of BP_Core_User::get_users()
- Type changed from defect (bug) to enhancement
Hi everyone. I've had a chance to do a bit of experimentation with my wp_bp_activity suggestion. It doesn't work out of the box, but if I do a bit of query restructuring, it works really beautifully. Here are some sample stats, using a database of about 1,000,000 activity items and 235,000 users.
*CURRENT BP TRUNK*
mysql> SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, u.display_name, u.user_email , um.meta_value as last_activity FROM wp_users u LEFT JOIN wp_usermeta um ON um.user_id = u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 AND um.meta_key = 'last_activity' ORDER BY um.meta_value DESC LIMIT 0, 20; ------------+------------------------------------------------+---------------------+ 20 rows in set (11.31 sec)
mysql> SELECT COUNT(DISTINCT u.ID) FROM wp_users u LEFT JOIN wp_usermeta um ON um.user_id = u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 AND um.meta_key = 'last_activity' ORDER BY um.meta_value DESC; +----------------------+ | COUNT(DISTINCT u.ID) | +----------------------+ | 237511 | +----------------------+ 1 row in set (4.00 sec)
*MOVED LAST_ACTIVITY TO WP_BP_ACTIVITY, WITH SIMILAR QUERY STRUCTURE*
I wrote a script that rewrote last_activity usermeta to the activity table, with type = 'last_activity' and user_id = $user_id. Then I changed the JOIN so it was against activity instead of usermeta, but left the basic query structure the same.
mysql> SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, u.display_name, u.user_email , a.date_recorded as last_activity FROM wp_users u LEFT JOIN wp_bp_activity a ON a.user_id = u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 AND a.type = 'last_activity' ORDER BY a.date_recorded DESC LIMIT 0, 20; ------------+------------------------------------------------+---------------------+ 20 rows in set (12.22 sec)
mysql> SELECT COUNT(DISTINCT u.ID) FROM wp_users u LEFT JOIN wp_bp_activity a ON a.user_id = u.ID WHERE u.spam = 0 AND u.deleted = 0 AND u.user_status = 0 AND a.type = 'last_activity' ORDER BY a.date_recorded DESC; +----------------------+ | COUNT(DISTINCT u.ID) | +----------------------+ | 237511 | +----------------------+ 1 row in set (4.07 sec)
As you can see, there is not any benefit from this move in itself, because the way in which the items are joined prevents MySQL from taking advantage of the date_recorded index.
*MOVED LAST_ACTIVITY TO WP_BP_ACTIVITY, AND CHANGED QUERY TO A SUBQUERY*
In order to take advantage of the date_recorded index, I tested with a subquery, with extremely promising results. Because the data is structured differently, is requires three queries instead of two (one to get the user IDs, one to get the user IDs, another to get the total count, and a third (this is the new one) to get the missing data out of wp_users that used to come from the join.
mysql> select distinct user_id from wp_bp_activity where user_id not in (select ID from wp_users where spam != 0 OR deleted != 0 OR user_status != 0) and type = 'last_activity' order by date_recorded desc limit 20; +---------+ 20 rows in set (0.01 sec)
mysql> select count(distinct user_id) from wp_bp_activity where user_id not in (select ID from wp_users where spam != 0 OR deleted != 0 OR user_status != 0) and type = 'last_activity'; +-------------------------+ | count(distinct user_id) | +-------------------------+ | 237511 | +-------------------------+ 1 row in set (4.15 sec)
mysql> SELECT ID, user_registered, user_nicename, user_login, display_name, user_email FROM wp_users WHERE ID IN (1,156563,31438,18959,115038,95468, 81552,217731,180316,50888,129014,214293,80522,16344,31461,9046 ,1035, 51964,176472,127686); ------------+------------------------------------------------+ 20 rows in set (0.00 sec)
Obviously, this is a dramatic improvement. Just for fun, I decided to try a similar subquery structure using the current last_activity data in wp_usermeta:
mysql> select distinct user_id from wp_usermeta where user_id not in (select ID from wp_users where spam != 0 OR deleted != 0 OR user_status != 0) and meta_key = 'last_activity' order by meta_value desc limit 20; +---------+ 20 rows in set (6.09 sec)
6.09 seconds is much faster than the 11-12 seconds for a join, but we are still forced to use filesort to do the DESC sort due to the lack of an index on wp_usermeta.meta_value, which is the real source of pain.
====
TAKEAWAYS:
Moving to subqueries for member queries has the potential for huge benefits, so I think we should pursue it. And I think that moving last_activity to the activity table is hugely beneficial as well, so we should also pursue that. (For group last_activity too, though usually groups are not such a pressure point since there are generally fewer of them.) But there are a couple of caveats that will make this a non-trivial task:
- The query methods will need to be rewritten for the three-query structure mentioned above, rather than the current two-query structure. This is straightfoward.
- We are currently using the very same query structure for all different "types" of user queries: last active, alphabetical, newest registered, online. 'last active' is the default, and 'online' is really a subset of 'last active', which is why I've focused on it above. Though this needs testing, I have a feeling that 'alphabetical' and 'newest registered' will need to be structured quite differently. 'newest registered' is pretty easy - wp_users has an index on user_registered, so we query that and then use a subquery or a second query to get last active data. There's a good chance that we're never going to be able to optimize 'alphabetical' without some major modifications, because we store that data in wp_users.display_name and wp_bp_xprofile_data.data, neither of which are indexed. Regardless, it will not work well with the wp_bp_activity subquery method.
- Backward compatibility. Do we keep 'last_activity' data mirrored in wp_usermeta? Generally mirroring data is bad, but (1) it's nice to leave it for plugins that depend on it (though it would be problematic for plugins that *modify* it, and (2) moving to the activity table will add a small amount of overhead for the loggedin/displayed user, since usermeta is loaded in the get_userdata() call, while the activity data will take a separate query. On balance, I think it's best to just move it, and to urge plugin authors to use the existing bp_core_record_activity() and bp_core_get_last_activity() functions, which need to be refactored of course.
- The COUNT queries are equally bad in all the scenarios above. There's no good way around this. So we should probably stop doing a live query for total member counts, store it as an option, and then bust and recalculate when new members join/become active or when accounts are deleted.
- Backward compatibility: We have filters on the sql queries, which will more or less break if the queries are radically restructured. I think it makes sense to remove the filters and add new ones; see https://core.trac.wordpress.org/ticket/18536#comment:14 and surrounding discussion for WP reasoning on a similar issue
- Moving last_activity to the activity tables is problematic because the Activity component is optional. If it's disabled, we don't have the tables, and we don't have the functions that write to the tables. So if we go this way, we either have to make Activity required, or maintain two different sets of query logic - one that uses Activity, and one that doesn't (which would, I guess, be the crappy version we have now). I'd rather go with making Activity required, but that will take a bunch of other kinds of work. An alternative approach is to store last_activity in a standalone table structured just like wp_bp_activity when Activity is disabled, which would still enable Activity to be turned off. The problem there is what would happen if someone decided to switch Activity on and off. Or we could just use a totally separate table, as originally suggested by shanebp, but that seems like a waste to me - we already have an activity table structure in place.
====
I'm moving this ticket to 1.7-early, because IMO it is going to require a large amount of work and especially user/plugin testing.
#10
@
13 years ago
Boone - thanks for all the work and info.
Do you think it's feasible to try and implement your query restructuring on a 1.2.9 install?
#11
@
13 years ago
Do you think it's feasible to try and implement your query restructuring on a 1.2.9 install?
Theoretically yes, but the problem is that the restructuring is not actually built yet :) I messed around a bit in the query class but I didn't actually write the logic to build the subqueries. I wrote them manually to do the benchmarks you see above. Writing that logic is going to take some time. Some changes will also need to be built elsewhere into BP, specifically wherever user activity is recorded and fetched. If you have some time to start a patch, it would be most welcome. Otherwise it will have to wait until I get some serious time to sit down with it.
On the bright side, these queries have not significantly changed between 1.2.9 and 1.5.x, so there shouldn't be a problem manually applying the patch. (However, your best bet is to get the heck off of 1.2.9! ;) )
#12
@
13 years ago
Boone - so your main reason for not moving last_activity to a new table is due to your aversion to creating a new table ? Granted wp_bp_activity already has a date_recorded column that *is* indexed, but wp_bp_activity can also get very big.
btw - I couldn't figure out how to implement your ideas above, so I decided to try the new table for last_activity approach. And got a hack working for last_activity, but it breaks too many other things, like search, due to the need for restructuring query logic - a task beyond me.
I'd be happy to track down wherever user activity is recorded and fetched elsewhere in BP.
I could deliver a list rather than a patch.
#13
@
13 years ago
Boone - so your main reason for not moving last_activity to a new table is due to your aversion to creating a new table ? Granted wp_bp_activity already has a date_recorded column that *is* indexed, but wp_bp_activity can also get very big.
More or less. All things being equal, we should avoid creating new tables. That's especially true in this case, since we literally already have a table devoted to recording *activity* items. It's true that size is a concern, but keep in mind that only a single last_activity item would be recorded at any time for a given member. So in a community with, say, 10,000 users, an "average" size activity table may have 500,000 rows (50 per user), but my proposal would only add 10,000 rows. That said, there is no question that activity query performance must be improved before my proposal can be seriously considered; see #4045.
I'd be happy to track down wherever user activity is recorded and fetched elsewhere in BP.
I could deliver a list rather than a patch.
Any little bit is helpful. Thanks :)
#14
@
13 years ago
Search result for 'last_activity' in BP 1.5.4 attached.
Please let me know if you want more info, different format, etc.
#16
@
12 years ago
Just a note - we recently added a tempfs mount (512MB)to our mysql server.
Maybe this is standard.
Anyhow, while it didn't 'solve' our issues with queries in question, it has definitely mitigated those issues.
#17
@
12 years ago
Boone -
The mysql guys at RackSpace looked at some of our queries - one of which started this ticket.
I thought I'd share their findings here so you and anyone else could comment on their suggestions.
Report:
I see two queries that could cause concurrency bottlenecks; both of these are showing up in the processlist many times per minute:
Query 1:
SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name FROM wp_bp_activity a USE INDEX (date_recorded) LEFT JOIN wp_users u ON a.user_id = u.ID WHERE a.user_id IN ( 28221 ) AND a.component IN ( 'album' ) AND a.item_id IN ( 47715 ) AND a.type != 'activity_comment' ORDER BY a.date_recorded DESC LIMIT 0, 20
This query performs a 0.7 second filesort and I see this query run many times. Removing the USE INDEX (date_recorded) portion of this query brought the total run time down to around 1 millisecond. I recommend removing this portion of the query from your codebase.
Query 2:
SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, u.display_name, u.user_email , um.meta_value as last_activity FROM wp_users u LEFT JOIN wp_usermeta um ON um.user_id = u.ID WHERE u.user_status = 0 AND um.meta_key = 'last_activity' ORDER BY um.meta_value DESC LIMIT 0, 28
This query takes approximately 0.6 seconds to run -- 0.5 seconds to create a temporary table for the join and 0.1s to perform the ORDER BY um.meta_value DESC operation. Rewriting this query to limit the size of the order by improved the running time to approximately 0.17 seconds. I recommend using the following query in place of the above:
SELECT DISTINCT u.ID as id, u.user_registered, u.user_nicename, u.user_login, u.display_name, u.user_email , um.meta_value as last_activity FROM wp_users u INNER JOIN (SELECT user_id, meta_value FROM wp_usermeta WHERE meta_key = 'last_activity' ORDER BY meta_value DESC LIMIT 0, 28) AS um ON u.ID = um.user_id;
#18
follow-up:
↓ 19
@
12 years ago
shanebp - This is extraordinarily helpful. Thanks so much for passing it along.
The first query is related to activity index hints, covered in #4045. I have copied that part of your comment there, along with a question for you.
The second tip will probably become irrelevant when we do some of the planned refactoring. We'll no longer be joining wp_users against wp_usermeta in order to get the information; we'll be using properly indexed tables instead. No join + indexes means that there will be no need for temp tables.
Thanks again for your persistence on this. Your help has been crucial in tracking down the problematic queries.
#19
in reply to:
↑ 18
@
12 years ago
boone -
The second tip will probably become irrelevant...
But in the meantime - any reason not to use it ?
#20
@
12 years ago
But in the meantime - any reason not to use it ?
It's not wise at this point in the dev cycle to change this query structure, because it would break plugins that attempt to filter the query directly. You could filter the query yourself to accomplish what's being suggested above, of course.
#23
@
12 years ago
- Keywords 1.7-early removed
- Milestone changed from Future Release to 1.7
- Priority changed from normal to highest
TheBeardedOne - Thanks so much for sharing this document with us. It's extremely helpful.
This ticket is my personal #1 priority for this dev cycle. I have the beginnings of a patch in my local repo, which I'll share in the upcoming week or two.
#24
@
12 years ago
I've just closed #3127 in favor of this ticket. See that ticket for some interesting discussion of WP_User_Query, which will play a role in the patch I have to share with all of you soon.
#25
@
12 years ago
- Keywords has-patch needs-testing dev-feedback added
4060.01.patch is a first rough draft of overhauling our user queries.
First, a brief summary of the issues at hand. Our current user queries run through BP_Core_User::get_users(). The queries built by this method do not scale very well at all, due to a couple of fundamental issues:
- Directional joins against global tables. The query constructor always joins (at least) wp_users against wp_usermeta, and almost always joins against some BP tables too
- Directional joins against improperly indexed tables. wp_usermeta does not have an index on meta_value (which is used for last_activity - every single query - as well as a number of other query types)
- SELECT * and other greedy queries. With very large datasets, queries like SELECT * and other multiple-column SELECT statements increase the likelihood that temp tables will be required.
- Unflexible COUNT(*) queries. At a certain point, there's nothing you can really do to optimize these kinds of queries. Our current user query setup prevents admins from easily overriding them (with, eg, SQL_CALC_FOUND_ROWS and FOUND_ROWS()) or caching them or skipping them altogether
My proposed solution 4060.01 introduces BP_User_Query and deprecates BP_Core_User::get_users(). This solution eliminates some of the issues above, mitigates some of those that it can't eliminate outright, and has a couple of added benefits. I'll outline the general strategy and its benefits, and then pose some questions for feedback.
BP_User_Query is intentionally modeled on WP_User_Query, with similar method and property names, and a similar structure. This should make it easier for WP devs to approach. It improves our user queries in a couple of key ways. First and foremost, it splits our big multiple JOIN into several steps:
1) A custom query for getting matching IDs only. This parallels the split_the_query option recently introduced in WP_Query, for similar reasons: when your initial WHERE query (which does all the heavy lifting of filtering) only loads IDs, you are loading far less information into memory than if you do SELECT(*) or something like that, with the result that you have to resort far less to temporary tables.
2) Once we have the resulting user IDs, we use WP_User_Query to get some of the WP core data that BP needs (user_email, etc)
3) Finally, we use the user IDs to run another set of queries, to fetch the BP-specific data corresponding to 'populate_extras': friend counts, is_friend, last_activity, etc. Much of this data comes from wp_usermeta, and was previously fetched in separate standalone queries; I was able to refactor it to roughly halve the number of queries required for populate_extras.
Some sample preliminary results, on a dataset with ~113K rows in wp_users and ~1.35M rows in wp_usermeta:
TYPE: 'newest'
BP_Core_User::get_users
- Main query: 3,050ms
- Count query: 2,372ms
BP_User_Query
- Main query: 1,577ms
- Count query: 185ms
TYPE: 'alphabetical' (xprofile disabled or BP-WP profile sync enabled)
BP_Core_User::get_users
- Main query: 3,464ms
- Count query: 2,491ms
BP_User_Query
- Main query: 88ms
- Count query: 68ms
TYPE: 'alphabetical' (xprofile enabled and BP-WP profile sync disabled)
BP_Core_User::get_users
- Main query: 3,464ms
- Count query: 2,491ms
BP_User_Query
- Main query: 509ms
- Count query: 151ms
As you add more filters to the query (like search_terms), they get slower, of course. But the speed improvements are roughly linear to what you see above. As you can see, avoiding unnecessary JOINs results in huge improvements. Moreover, the restructuring means that BP tables are never joined against WP tables during user queries (except in one or two filter cases that I haven't gotten around to rewriting yet - see meta_key/meta_value).
A couple of questions and directions for further development:
- 'last_activity' - The current bottleneck of the whole setup is the 'last_activity' usermeta value. We use this to order and filter many of our user queries. But storing this data in wp_usermeta means that there's no index, so that our queries are quite slow (see my 'newest' numbers above). Moving this to a properly indexed transactional table would speed things up enormously. wp_bp_activity is the obvious choice, since it's built for this purpose. But such a move would mean that the BP Activity component could not be shut off (or would have to be split into two parts, so that the core couldn't be shut off), and it would also cause backward compatibility issues. For BP 1.7, I propose that we do *not* pursue any change along these lines - one thing at a time - but that we revisit for a future release.
- Backward compatibility - I have chosen to leave BP_Core_User::get_users() untouched, and to introduce BP_User_Query as a new class. That means that anyone continuing to use BP_Core_User::get_users() directly will see a deprecated notice, but their code will still work. The point at which I'm choosing to swap out the new class is in functions like bp_core_get_users(). (There are 6-8 other places throughout BP where a similar change will have to be made; I haven't done it yet, because I want to settle on a strategy first.) For the vast majority of uses, this changeover will be totally seamless. However, anyone who is directly filtering the get_users() sql queries ('bp_core_get_paged_users_sql', 'bp_core_get_total_users_sql') will be affected by the move to BP_User_Query, because these filters disappear. (There's simply no way to salvage them, given that the entire query structure is changed.) My proposed backpat strategy, as you can see in the patch, is to provide a 'bp_use_legacy_user_queries' filter, which takes the FUNCTION as a parameter, so that developers can opt to fall back on the old method in selected cases. What do you think of this strategy?
- Ongoing status of BP_Core_User - If we're no longer using BP_Core_User::get_users(), the only thing that BP_Core_User is really used for is instantiating a *single* user object. This, IMO, is a good thing - it more closely parallels the relationship between WP_User and WP_User_Query. However, as it stands, there is currently (both in the current codebase and my own patch) some reduplication of efforts. For example, it doesn't make sense for there to be two different ways for 'populate_extras' to be run, one for multiple users and one for single users. This makes for multiple points of failure. In the future it would be nice to abstract this somewhat. Perhaps BP_Core_User could be refactored somewhat so that it can be instantiated either for single user IDs (the current setup) or for multiple ones; then, instead of doing populate_extras and WP_User_Query business inside of BP_User_Query, the latter class would instead use BP_Core_User to call up the additional data required, once it'd come up with a list of user ids. This is similar to how WP_User_Query handles all_with_meta queries - it uses WP_User internally. This is probably not required for 1.7, but is a good direction for future parity with WP.
Whew. Sorry for all the words - I've thought a lot about this, and want to make sure it's done right, as it's a major step toward scaling BP. Feedback welcome.
#26
@
12 years ago
I'm a fan of the plan and the patch. I'm hoping to find some more time this dev cycle (and on) to star helping out more.
#28
@
12 years ago
Huge wins overall. Outstanding work.
Attached patch v2:
- Cleans up whitespace, comments, indentation, etc... (as if you expected anything else)
- Removes test cases for commented out bugs (I tend to agree with your assessment here)
- Decouples bp_is_active() checks from populate_extras() method, and moves them into the xprofile and friends components
- Adds some @todo's for other places bp_is_active() should be decoupled.
- Testing this on a small dataset, the queries are faster but it results in more of them (probably because of duplicate xprofile primary name field ID querying).
- I'm okay with the 'last_activity' behavior changing; it's something that comes up enough as unexpected where even though it's 'working as designed' it's still a poor design.
Overall, this is the direction we should go. I'm comfortable committing this in early, and iterating on it over time. Getting this in early allows for testing this with other user related queries to look for optimizations; friends, group members, activity streams, etc...
#30
@
12 years ago
Thanks for the cleanup, johnjamesjacoby.
I agree that getting rid of the bp_is_active() checks would be nice to get rid of. Most of them could probably be dropped filtering the query_vars before assembling the SQL - this will allow components to modify the 'include' and 'exclude' parameters using whatever queries they wish. Maybe we can look more closely at this once the base patch is committed - it's an optional enhancement for now, more for architectural elegance than any functional benefit.
the queries are faster but it results in more of them (probably because of duplicate xprofile primary name field ID querying).
Yes, and the fact that many JOINs were replaced with single table queries. This is case where more queries are *better*, because they're so much faster, and will be so much more cacheable by SQL cache.
I'm okay with the 'last_activity' behavior changing; it's something that comes up enough as unexpected where even though it's 'working as designed' it's still a poor design.
This is going to raise enormous problems for backward compatibility. I'll open a separate enhancement ticket where we can talk about it. I don't want that issue to hold up these improvements.
===
4060.03 fixes a couple of things:
- Pulls 'meta_key'/'meta_value' queries into a separate query, to avoid all joins against wp_usermeta and wp_users (woo hoo!)
- Implements the previously missing 'random' and 'popular' sort types
- Switches the queries fired by the Members widget to use populate_extras, so that they get all the data they need. This change is required by some of the changes in the way the queries are built (we're no longer scooping up all metadata on the first query)
IMO this is getting pretty close to being ready to rock.
#31
@
12 years ago
Boone,
I'm running a fairly large install(26k in wp_users, 1.9M in user_meta, 686k in bp_xprofile)
What kind of info would you like me to test against. Honestly I've never done much MySQL testing so let me know what would help.
We query users quite often by the admin staff. Our users cant query each other. I was using only wp_usermeta but the default search couldnt find them so I duplicate that data in bp_xprofile for the search tool.
-D
#32
@
12 years ago
DennisSmolek - Thanks for jumping in here. The best way to test would be to install trunk + this patch in a test environment (with a mirror of your live site's data) and do some simply mysql profiling. Maybe the easiest way to do this is to install the Debug Bar plugin, and then put define('SAVEQUERIES',true);
in your wp-config file. Then, on each pageload, you'll be able to see which queries were run, and how long they took. Use add_filter( 'bp_use_legacy_user_query', '__return_true' );
as a toggle to turn on/off the new query class, so you can get side-by-side results.
I wouldn't recommend putting this on your live site at the moment, as it's not been fully tested.
#33
@
12 years ago
(In [6314]) Introduces BP_User_Query for improved member query performance
- Introduces BP_User_Query class
- Deprecates use of BP_Core_User::get_users() in bp_core_get_users(), the main member query function in BuddyPress
- Introduces bp_use_legacy_user_query filter, to allow plugins to use legacy query method during transition
- Expands user query functionality to support 'user_ids' parameter, which can be used to skip the user_id query altogether, for maximum flexibility in plugins and themes
Props boonebgorges, johnjamesjacoby.
See #4060
#34
follow-up:
↓ 35
@
12 years ago
I was about to patch #4347, but realised it was in the deprecated get_users() function. I went looking for its new home, but the "online" query isn't built the same. Boone, was this intentional for performance reasons or is it an oversight?
#35
in reply to:
↑ 34
@
12 years ago
Replying to DJPaul:
I was about to patch #4347, but realised it was in the deprecated get_users() function. I went looking for its new home, but the "online" query isn't built the same. Boone, was this intentional for performance reasons or is it an oversight?
Looks like it was an oversight. If you'd like to add it, please do, otherwise I will when I get a few minutes.
#36
@
12 years ago
Member search is broken with the new BP_User_Query. 4060-fix-search.001.patch fixes part of the problem (invalid function argument).
The remaining problem is tougher as if it can't find any users matching the search details, it carries on with the user query as though no search term was passed. What should happen is that if there are no users matching the search, we should bail out and return zero matches. This is consistent with pre-BP_User_Query behaviour.
#37
@
12 years ago
- Keywords 2nd-opinion has-patch dev-feedback removed
Thanks for the patch, DJPaul. You're correct about the case where no users are found via search. I'll commit a fix in a moment.
#38
@
12 years ago
(In [6470]) Modifications to BP_User_Query to make member searches work correctly
- Don't pass invalid function parameter when doing user_id search. Props DJPaul
- Introduce $no_results property (borrowed from WP_Tax_Query)
- When a member search returns no results, ensure that the user query also returns no results
See #4060
#39
@
12 years ago
- Keywords has-patch added
4060-fix-online.001.patch tries to fix the "online" query. It works in my testing but as I'm not familiar with BP_User_Query, putting the patch up here for review.
#40
@
12 years ago
DJPaul - This looks good to me.
I know you were probably just porting over from the old version, but we might consider doing the DATE_ADD
operation on the UTC_TIMESTAMP()
instead of the meta_value
, since that way we only have to do it once instead of on every returned row. Or maybe this is better handled in a different ticket. Anyway, you should go ahead and commit your change.
#43
@
12 years ago
- Keywords has-patch added
4060-fix-friends.patch fixes a problem where the queries are incorrect for the member>friends>friendships and member>friends>request screens, resulting in no member list for either screen. This is recreatable with BP-Default or the Legacy template pack.
I am not sure if 4060-fix-friends.patch is the best way to fix this, so any testing and suggestions are very welcome.
#44
@
12 years ago
4060-fix-friends.002.patch fixes further issues; looks like a NOT IN was copypasta from a previous clause.
#46
@
12 years ago
Friends>Requests query is still incorrect.
BP-Default/theme compat. use bp_has_members('include=1,2,3,..') in the Requests template. BP_User_Query, because we're doing this on a user page, picks up the 'user' parameter which is automatically set somewhere, and appends the user's existing friends, too (which is pointless for the request page). The SQL looks like this:
`array(5) {
select?=>
string(42) "SELECT DISTINCT u.ID as id FROM wp_users u"
where?=>
array(2) {
[0]=>
string(17) "u.ID IN (26,24,9)"
[1]=>
string(24) "u.ID IN (8,25,12,3,2,23)"
}
orderby?=>
string(23) "ORDER BY u.display_name"
order?=>
string(3) "ASC"
limit?=>
string(11) "LIMIT 0, 20"
}`
I am assuming WP_User_Query joins the WHERE conditions together, so we end up with an impossible-to-match query.
#47
@
12 years ago
- Keywords reporter-feedback added; has-patch removed
What do we have left to do here to close this for 1.7?
#49
@
12 years ago
I'd been waiting to address some of the last pressing issues until I had a framework for writing some unit tests, but this hasn't happened. I'll have a look at the remaining issues for 1.7 within the next few days.
#51
@
12 years ago
DJPaul - Good find. This is avoided in BP_Core_User::get_users()
by only doing the friend filter when nothing is passed to the 'include'
parameter: https://buddypress.trac.wordpress.org/browser/tags/1.6.2/bp-core/bp-core-classes.php#L266 I'm going to replicate the same check in BP_User_Query
.
#52
@
12 years ago
(In [6638]) In BP_User_Query, only parse the 'user_id' param if 'include' is empty
Friend request queries work by passing a list of requests to the 'include'
parameter of bp_has_members(). This conflicts necessarily with doing a
'user_id' check in BP_User_Query, because 'user_id' limits results to the
passed user's friends (which are, by definition, different from his friend
requests). Thus, when an 'include' param is passed, the 'user_id' check should
be skipped.
This replicates the same logic in BP_Core_User::get_users().
See #4060
Moving to 1.6 for review along with #4041, #4045.