Skip to:
Content

BuddyPress.org

Opened 10 years ago

Closed 4 years ago

#4041 closed enhancement (maybelater)

Use SELECT FOUND_ROWS() for 'total' queries instead of a second query

Reported by: boonebgorges Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Performance Keywords: trac-tidy-2018
Cc: DJPaul

Description

Throughout BP, we do two queries for content loops: one that is paginated (LIMIT) and one that does COUNT(*) for a total count. The second one is unnecessary; we should be using SELECT FOUND_ROWS() instead, to avoid a second, potentially costly, query.

Note that this will cause backpat problems in the case of filters on _total_ queries. I recommend the following course of action:

  • Remove the _total_ filters altogether
  • Mark the _paged_ filters as deprecated (in the inline docs)
  • Add new filters that make no reference to _paged_ or _total_ since it will no longer be relevant.

Attachments (1)

bp-activity-classes.php.patch (915 bytes) - added by webraket 10 years ago.

Download all attachments as: .zip

Change History (17)

#1 @johnjamesjacoby
10 years ago

  • Component changed from Core to All Components

Fine by me.

#2 @DJPaul
10 years ago

  • Owner set to DJPaul
  • Status changed from new to assigned

#3 @webraket
10 years ago

I recently experienced bad results using SELECT FOUND_ROWS().

SQL_CALC_FOUND_ROWS seems to trigger a filesort when ORDER BY is used.
This behavior should be fixed in mysql > 5.6.2, http://bugs.mysql.com/bug.php?id=18454

For now I think it's better/faster to just keep the 2nd query.
The ORDER BY and SORT parts should be removed however, since it's slow(er) and useless.

#4 @boonebgorges
10 years ago

webraket - Thanks for the pointer. WordPress uses SELECT FOUND_ROWS() in WP_Query - have you found similar performance issues?

I would like to benchmark this before moving forward with any broad changes. Someone should write a script that will write a few million records into the activities table (I will do it this weekend if I get a chance) and then test webraket's patch for the activity query against SELECT FOUND_ROWS(). If they are close or the same, we should go with SELECT FOUND_ROWS() (all things being equal, it's better to do what WP does, better to hand work off to the DB, better to have fewer queries). But if we find that webraket's results are reproducible, then we should probably think about keeping what we currently have in place, at least for components (like Activity) where the table is liable to get very, very, very large.

#5 @webraket
10 years ago

When I used SQL_CALC_FOUND_ROWS, that query ended up multiple times in slow-query.log
The avg. server-load spiked around 4.0

# Query_time: 20.422221  Lock_time: 0.000089 Rows_sent: 20  Rows_examined: 1836712
SELECT SQL_CALC_FOUND_ROWS a.*, u.user_email, u.user_nicename, u.user_login, u.display_name  FROM wp_bp_activity a LEFT JOIN wp_users u ON a.user_id = u.ID WHERE a.hide_sitewide = 0 AND a.type != 'activity_comment' ORDER BY a.date_recorded DESC LIMIT 0, 20;

# Query_time: 13.388473  Lock_time: 0.013897 Rows_sent: 20  Rows_examined: 1836778
SELECT SQL_CALC_FOUND_ROWS a.*, u.user_email, u.user_nicename, u.user_login, u.display_name  FROM wp_bp_activity a LEFT JOIN wp_users u ON a.user_id = u.ID WHERE a.hide_sitewide = 0 AND a.type != 'activity_comment' ORDER BY a.date_recorded DESC LIMIT 0, 20;

1.8million rows examined seems way too much for 600.000 activity items I currently have.
Maybe SQL_CALC_FOUND_ROWS performs the wp_user JOIN for al activity items it needs to count?

Phpmyadmin shows "using where;using filesort;" with EXPLAIN SELECT...
Once I remove (SQL_CALC_FOUND_ROWS) or (ORDER BY a.date_recorded DESC) it only shows "using where;"

I went back to the count() query, resulting in server-load around 0.5 - 1.0 ;)
It's still not optimal however:

# Query_time: 15.334062  Lock_time: 0.000034 Rows_sent: 1  Rows_examined: 613581
SELECT count(a.id) FROM wp_bp_activity a WHERE a.hide_sitewide = 0 AND a.type != 'activity_comment';

PS: I also see the WP_QUERY using SQL_CALC_FOUND_ROWS in slow-query.log
Many Wordpress users reported problems with it:
http://core.trac.wordpress.org/ticket/10469
http://core.trac.wordpress.org/ticket/10964

# Query_time: 10.427559  Lock_time: 0.106090 Rows_sent: 10  Rows_examined: 90
SELECT SQL_CALC_FOUND_ROWS  wp_12958_posts.* FROM wp_12958_posts  WHERE 1=1  AND wp_12958_posts.post_type = 'post' AND (wp_12958_posts.post_status = 'publish')  ORDER BY wp_12958_posts.post_date DESC LIMIT 0, 10;

# Query_time: 12.887792  Lock_time: 0.020614 Rows_sent: 10  Rows_examined: 12
SELECT SQL_CALC_FOUND_ROWS  wp_23379_posts.* FROM wp_23379_posts  WHERE 1=1  AND wp_23379_posts.post_type = 'post' AND (wp_23379_posts.post_status = 'publish')  ORDER BY wp_23379_posts.post_date DESC LIMIT 0, 10;

NOTE: These test results come directly from a busy production server.
Someone should test it in a local environment to ensure query's are not locking up each other to get the best measurements possible.

#6 @boonebgorges
10 years ago

webraket - Yikes, those are quite the numbers. THanks very much for sharing.

#7 @cnorris23
10 years ago

The ticket with an almost complete solution has ended up at #WP18536. The solution was to split things into two queries. It's not complete as jjj reported a bug, but scribu has a patch up to solve that.

#8 @boonebgorges
10 years ago

  • Milestone changed from 1.6 to Future Release

Thanks, cnorris23. I had seen that ticket but had not put two and two together as regards this one.

I've done a bit of testing on my own setups with massive data sets and have found results similar to what webraket has suggested.

In light of this, I want to put this off until the next dev cycle, at which point we can revisit and see if there's something from WP's (finished) version that we can riff off of in our own queries.

#9 @DJPaul
9 years ago

  • Owner DJPaul deleted

#10 @johnjamesjacoby
7 years ago

  • Component changed from Component - Any/All to API - WordPress

#11 @DJPaul
6 years ago

  • Component changed from API - WordPress to Core

#12 @r-a-y
6 years ago

  • Component changed from Core to Performance
  • Type changed from defect (bug) to enhancement

The second one is unnecessary; we should be using SELECT FOUND_ROWS() instead, to avoid a second, potentially costly, query.

WP already does a count of sorts with $wpdb->num_rows for the last query:
https://codex.wordpress.org/Class_Reference/wpdb#Class_Variables

If this works, this sounds like a good performance enhancement!

#13 @DJPaul
5 years ago

  • Milestone changed from Future Release to 2.7

Tempted to see if we can do this for 2.7.

#14 @boonebgorges
5 years ago

  • Milestone changed from 2.7 to Future Release

#15 @DJPaul
4 years ago

  • Keywords trac-tidy-2018 added

We're closing this ticket because it has not received any contribution or comments for at least two years. We have decided that it is better to close tickets that are good ideas, which have not gotten (or are unlikely to get) contributions, rather than keep things open indefinitely. This will help us share a more realistic roadmap for BuddyPress with you.

Everyone very much appreciates the time and effort that you spent sharing your idea with us. On behalf of the entire BuddyPress team, thank you.

If you feel strongly that this enhancement should still be added to BuddyPress, and you are able to contribute effort towards it, we encourage you to re-open the ticket, or start a discussion about it in our Slack channel. Please consider that time has proven that good ideas without contributions do not get built.

For more information, see https://bpdevel.wordpress.com/2018/01/21/our-awaiting-contributions-milestone-contains/
or find us on Slack, in the #buddypress channel: https://make.wordpress.org/chat/

#16 @DJPaul
4 years ago

  • Milestone Awaiting Contributions deleted
  • Resolution set to maybelater
  • Status changed from assigned to closed
Note: See TracTickets for help on using tickets.