BuddyPress.org

Opened 12 years ago

Closed 10 years ago

#4045 closed defect (bug) (duplicate)

Activity Feed queries fail with millions of rows

Reported by: zacechola
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 1.5.4
Component: Activity
Keywords: needs-patch
Cc: matt_buddypress-trac@…

Description

We have an activity feed that contains millions of entries that are stored with the newest entry at the end of the table.

When BP tries to load the activity feed it selects the entire activity table and then reverse sorts it.

This requires that the table be written out to disk.

On a site as busy as ours, this results in a large number of queries all doing the same thing, which brings MySQL down.

We've tested selecting smaller date ranges and then reverse sorting those, which has helped, but it's not ideal.

Attachments (7)

4045.01.diff (2.0 KB) - added by boonebgorges 12 years ago.
activity_feed.txt (110.2 KB) - added by zacechola 12 years ago.
processlist
Screen Shot 2012-03-13 at 10.39.30 AM.png (25.8 KB) - added by zacechola 12 years ago.
mysql
4045.02.diff (2.2 KB) - added by boonebgorges 12 years ago.
4045.03.diff (9.5 KB) - added by boonebgorges 12 years ago.
explain_shanebp.gif (32.1 KB) - added by shanebp 12 years ago.
PerconaQueryOptimizations.pdf (113.7 KB) - added by TheBeardedOne 12 years ago.
Query optimizations done by Percona via a consulting contract


Change History (57)

#1 @johnjamesjacoby
12 years ago

  • Keywords reporter-feedback added
  • Milestone changed from Awaiting Review to 1.6

When you say feed, do you mean RSS feed?

#2 @zacechola
12 years ago

No. I mean activity. Either on the activity page itself or on the user profiles.

As the site grows, the current solution doesn't seem to scale very well because it tries to pull the entire table, reverse sort it, and then apply any filters. When this table is huge, there need to be some checks in place to limit the size of the query.

#3 @zacechola
12 years ago

  • Keywords reporter-feedback removed

#4 @shanebp
12 years ago

We also have problems with
SELECT a.*, u.user_email,...
in
bp-activity-classes.php

The only temp solution was to stop recording things like 'friendship_created' and prune all prior entries.

#5 @boonebgorges
12 years ago

Hi gang. I am looking into this at the moment, and hope to have more details soon. In the meantime, I wanted to share a tool I wrote that allows you to create huge amounts of dummy data for a dev installation: https://github.com/boonebgorges/bp-mega-populate (Those of you who already have lots of data to play with don't need this, I suppose.)

#6 @boonebgorges
12 years ago

  • Keywords 2nd-opinion has-patch added

Hi all. I have done a bunch of digging and experimentation, and I have made some progress.

First, I have found that the real problem arises with the count() queries that get the total_activities values. The paged query, by contrast, indexes properly on date_recorded; I found that it scales pretty well up to several million rows. The total_activities query, on the other hand, is really dumb about the way it selects an appropriate index.

I've written a small patch that takes the following approach: it defines a preferred order of indexes, and then provides an index hint accordingly. So, for example, on the main Activity directory, the query optimizer is deciding to index by hide_sitewide, which hardly cuts down the results at all; with my patch, it prefers type (as in type != 'activity_comment'). Likewise, on single group pages, it prefers item_id, and on single user pages it prefers user_id.
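
In rough sketch form, the index-preference idea amounts to something like the following (variable and key names here are illustrative, not the actual patch code; see 4045.01.diff for the real thing):

```php
// Illustrative sketch: pick the most selective index for the current
// query from a preferred order, then splice the hint in after FROM.
$index_hint_sql = '';
if ( ! empty( $filter['user_id'] ) ) {
	// Single user pages: user_id narrows the result set best.
	$index_hint_sql = 'USE INDEX (user_id)';
} elseif ( ! empty( $filter['item_id'] ) ) {
	// Single group pages: prefer item_id.
	$index_hint_sql = 'USE INDEX (item_id)';
} else {
	// Main directory: type (as in type != 'activity_comment') beats
	// hide_sitewide, which hardly cuts down the results at all.
	$index_hint_sql = 'USE INDEX (type)';
}

// The hint slots in right after the FROM clause of the built query.
$sql = "{$select_sql} {$from_sql} {$index_hint_sql} {$where_sql} {$sort_sql} {$pag_sql}";
```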

zacechola and shanebp, I'm especially interested to hear if you have any luck with the patch (or suggestions for other courses of action). If this route seems promising, I think there are even more progressive changes that we can make; but this would make for a nice start.

#7 @zacechola
12 years ago

Excellent. We'll give it a shot this week and see how it goes.

#8 @zacechola
12 years ago

Ok. We've got some real world testing of the patch now. First, it's a big improvement, but we're still running into issues whenever the queries aren't in cache. Specifically, we think this query is causing some problems:

SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name  
FROM [redacted]_bp_activity a LEFT JOIN [redacted]_users u ON a.user_id = u.ID 
WHERE a.type IN ( 'new_blog_post' ) AND a.hide_sitewide = 0 
AND a.type != 'activity_comment' 
ORDER BY a.date_recorded 
DESC LIMIT 0, 2

Reverse sorting the date_recorded field requires that the entire contents of the table be written out to a tmp table and then read back *prior* to the LIMIT being applied.

I'm attaching processlist logs and some MySQL info. There is swap space shown as used, but the disk wait time is 0%, indicating that swap space isn't actively being used. It's most likely caused by ntp or some other caching service on the system (known issues).

@zacechola
12 years ago

processlist

#9 @boonebgorges
12 years ago

Thanks for the preliminary feedback, zacechola. Glad to hear that the indexing trick helps.

I'm going to dig more into the ORDER BY date_recorded issue later today (hopefully), but first I was hoping to get some more info from you:

  • Would you mind running an EXPLAIN query on the problematic query? I'm a bit confused as to why it's loading the entire table into memory when it should be using the date_recorded index
  • Do you find similar problems for all activity queries, or is it specifically those with a.type IN ( 'new_blog_post' )? Is there anything special to know about the distribution of different 'type' values in your activity database (like, is it mostly new_blog_post, or is it a pretty good distribution of many types)? SHOW INDEX FROM [redacted]_bp_activity would be helpful, so I can see cardinality, etc.

#10 @zacechola
12 years ago

All the info is provided below, but essentially it takes about 7 seconds to build the count(*) and 12 seconds to run the other query. So, if both are being called during a sudden burst of update activity on the site, the query cache busts and we run into problems with the slow query.

So, basically, as long as the activity is evenly spread out, we do OK with the patch, but as soon as we see a spike in updates, even if small, we're in trouble.

Anyway, looking a bit closer, it doesn't appear that the sort is actually being written to disk, but the sorting process does take a long time. If the amount of data to be sorted were pared down, it would help.

Explain results, bp_activity

mysql> 
EXPLAIN SELECT count(*) 
FROM [redacted]_bp_activity a 
USE INDEX (type) WHERE a.type IN ( 'new_blog_post' ) 
AND a.hide_sitewide = 0 
AND a.type != 'activity_comment'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ref
possible_keys: type
          key: type
      key_len: 227
          ref: const
         rows: 2105025
        Extra: Using where
1 row in set (0.00 sec)

Query results

mysql> 
SELECT count(*) 
FROM [redacted]_bp_activity a USE INDEX (type) 
WHERE a.type IN ( 'new_blog_post' ) 
AND a.hide_sitewide = 0 
AND a.type != 'activity_comment'\G
*************************** 1. row ***************************
count(*): 2696717
1 row in set (7.14 sec)

Explain Results, user

mysql> 
EXPLAIN SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name  
FROM [redacted]_bp_activity a 
LEFT JOIN [redacted]_users u ON a.user_id = u.ID 
WHERE a.type IN ( 'new_blog_post' ) 
AND a.hide_sitewide = 0 
AND a.type != 'activity_comment' 
ORDER BY a.date_recorded 
DESC LIMIT 0, 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: ref
possible_keys: type,hide_sitewide
          key: hide_sitewide
      key_len: 2
          ref: const
         rows: 1919104
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: [redacted].a.user_id
         rows: 1
        Extra: 
2 rows in set (0.00 sec)

Query Results

mysql> 
SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name  
FROM [redacted]_bp_activity a 
LEFT JOIN [redacted]_users u ON a.user_id = u.ID 
WHERE a.type IN ( 'new_blog_post' ) AND a.hide_sitewide = 0 
AND a.type != 'activity_comment' 
ORDER BY a.date_recorded 
DESC LIMIT 0, 2\G
*************************** 1. row ***************************
[redacted]

2 rows in set (12.84 sec)

Activity indexes:

mysql> show index from [redacted]_bp_activity\G
*************************** 1. row ***************************
       Table: [redacted]_bp_activity
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 1
 Column_name: id
   Collation: A
 Cardinality: 3340434
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment: 
*************************** 2. row ***************************
       Table: [redacted]_bp_activity
  Non_unique: 1
    Key_name: date_recorded
Seq_in_index: 1
 Column_name: date_recorded
   Collation: A
 Cardinality: 3340434
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment: 
*************************** 3. row ***************************
       Table: [redacted]_bp_activity
  Non_unique: 1
    Key_name: user_id
Seq_in_index: 1
 Column_name: user_id
   Collation: A
 Cardinality: 10153
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment: 
*************************** 4. row ***************************
       Table: [redacted]_bp_activity
  Non_unique: 1
    Key_name: item_id
Seq_in_index: 1
 Column_name: item_id
   Collation: A
 Cardinality: 14
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment: 
*************************** 5. row ***************************
       Table: [redacted]_bp_activity
  Non_unique: 1
    Key_name: secondary_item_id
Seq_in_index: 1
 Column_name: secondary_item_id
   Collation: A
 Cardinality: 1113478
    Sub_part: NULL
      Packed: NULL
        Null: YES
  Index_type: BTREE
     Comment: 
*************************** 6. row ***************************
       Table: [redacted]_bp_activity
  Non_unique: 1
    Key_name: component
Seq_in_index: 1
 Column_name: component
   Collation: A
 Cardinality: 14
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment: 
*************************** 7. row ***************************
       Table: [redacted]_bp_activity
  Non_unique: 1
    Key_name: type
Seq_in_index: 1
 Column_name: type
   Collation: A
 Cardinality: 14
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment: 
*************************** 8. row ***************************
       Table: [redacted]_bp_activity
  Non_unique: 1
    Key_name: mptt_left
Seq_in_index: 1
 Column_name: mptt_left
   Collation: A
 Cardinality: 14
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment: 
*************************** 9. row ***************************
       Table: [redacted]_bp_activity
  Non_unique: 1
    Key_name: mptt_right
Seq_in_index: 1
 Column_name: mptt_right
   Collation: A
 Cardinality: 14
    Sub_part: NULL
      Packed: NULL
        Null: 
  Index_type: BTREE
     Comment: 
*************************** 10. row ***************************
       Table: [redacted]_bp_activity
  Non_unique: 1
    Key_name: hide_sitewide
Seq_in_index: 1
 Column_name: hide_sitewide
   Collation: A
 Cardinality: 14
    Sub_part: NULL
      Packed: NULL
        Null: YES
  Index_type: BTREE
     Comment: 
10 rows in set (0.23 sec)

Process list:

*************************** 4. row ***************************
     Id: 18687045
   User: root
   Host: localhost
     db: [redacted]
Command: Query
   Time: 4
  State: Sorting result
   Info: SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name  
FROM [redacted]_bp_activity a 
LEFT JOIN [redacted]_users u ON a.user_id = u.ID 
WHERE a.type IN ( 'new_blog_post' ) 
AND a.hide_sitewide = 0 
AND a.type != 'activity_comment' 
ORDER BY a.date_recorded 
DESC LIMIT 0, 2

PRODUCES 2 rows in set (15.92 sec)

#11 @shanebp
12 years ago

Not sure that this is useful info...

I used your populate tool on a dev install of BP 1.5.3.1, ending up with 1M rows.

After installing your patch, calling .com/activity/ was 30-50% faster for page load.

I could not get an entry in the slow query log.

Wish I could try it on our production server, but we're still on BP 1.2.9

#12 @boonebgorges
12 years ago

zacechola - Thanks so much for that info. It's really helpful. That said, I am a bit confused. Your SHOW INDEX results show an index on date_recorded, yet the query optimizer is not even identifying date_recorded as one of the possible_keys. That's why the paged query is going so slowly.

I must admit I'm in a bit over my head here - I'm not sure why MySQL would be acting like this. But please do try 4045.02.diff, which is pretty much the same thing as 4045.01.diff, except that in the _paged_ query, USE INDEX (date_recorded) is specified. (Note that the patch had to be redone a bit to apply to some changes to trunk from this afternoon. Not sure if it will apply cleanly to a 1.5.4 install, but in any case it's easy enough to apply manually.) I am not sure whether MySQL will obey, given that it doesn't seem to recognize it as a possible key, but it's worth a try. (You might also try FORCE INDEX (date_recorded).) You could also consider running ANALYZE TABLE on your activity table, to see if it jogs MySQL's memory.

I'm going to do some more analysis to see why we are joining the users table here. At a glance, it doesn't look necessary. I don't think it's at the root of your extreme slowness, but every little bit helps.

shanebp - Thanks for the data point :) If you're feeling brave, you could apply the changes to your 1.2.9 installation manually - the logic in the ::get() method has not hugely changed. You'd need to add the index-selection block above line 131 in this file https://buddypress.trac.wordpress.org/browser/tags/1.2.9/bp-activity/bp-activity-classes.php#L123, and then change line 131 so that it includes {$index_hint_sql} right after {$from_sql}. You can also apply the date_recorded index hint manually, to line 95: https://buddypress.trac.wordpress.org/browser/tags/1.2.9/bp-activity/bp-activity-classes.php#L93. See 4045.02.diff for a guide (and BE CAREFUL! :) )

#13 @cnorris23
12 years ago

FWIW, this is pretty much the exact same issue as addressed in #WP18536. The solution there was to use two queries. One get_col query like

SELECT a.id
FROM bp_activity a 
WHERE a.type IN ( 'new_blog_post' ) AND a.hide_sitewide = 0 
AND a.type != 'activity_comment' 
ORDER BY a.date_recorded 
DESC LIMIT 0, 2

to get just the ids. From there we would run a second query to fill in the objects, using an IN clause based on the ids we retrieved. They also threw in some clever caching.
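
A hedged sketch of what that two-query pattern could look like with $wpdb (this is not the actual #WP18536 code; table names, WHERE clauses, and the limit are illustrative):

```php
// Illustrative two-query pattern: fetch matching ids with the cheap,
// index-friendly query, then fill the full rows via a primary-key IN().
global $wpdb;

$ids = $wpdb->get_col(
	"SELECT a.id
	 FROM {$wpdb->prefix}bp_activity a
	 WHERE a.hide_sitewide = 0 AND a.type != 'activity_comment'
	 ORDER BY a.date_recorded DESC
	 LIMIT 0, 20"
);

$activities = array();
if ( ! empty( $ids ) ) {
	// Sanitize and inline the ids; a PK lookup avoids the filesort.
	$in = implode( ',', array_map( 'intval', $ids ) );

	$activities = $wpdb->get_results(
		"SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name
		 FROM {$wpdb->prefix}bp_activity a
		 LEFT JOIN {$wpdb->users} u ON a.user_id = u.ID
		 WHERE a.id IN ({$in})
		 ORDER BY a.date_recorded DESC"
	);
}
```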

#14 @boonebgorges
12 years ago

Thanks for drawing the connection, cnorris23. See also #4060, where I make a two-query suggestion for the members query.

We would need some benchmarking to see if the two-query solution would help in the case of activity. In my tests, it's *not* the LIMIT query that is causing performance problems - they run reasonably fast when using the correct indexes - but the COUNT query. Doing a separate get_col() query wouldn't help much with that. (Though I would be happy to be shown wrong.)

#15 @mym6
12 years ago

Ok, I've been working with zacechola to make this work. It turns out that you should tell it which index to *ignore* rather than which index to use. This ensures that the proper index is always picked.

SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name  
FROM [redacted]_bp_activity a IGNORE INDEX (type,hide_sitewide) 
LEFT JOIN [redacted]_users u ON a.user_id = u.ID 
WHERE a.hide_sitewide = 0 AND a.type != 'activity_comment' 
ORDER BY a.date_recorded DESC 
LIMIT 0, 20\G

Will produce results virtually immediately.

Explain on this will show

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
         type: index
possible_keys: NULL
          key: date_recorded
      key_len: 8
          ref: NULL
         rows: 3829415
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: [redacted] a.user_id
         rows: 1
        Extra: 
2 rows in set (0.00 sec)

This doesn't fix the count(a.id) query though.

#16 @boonebgorges
12 years ago

mym6 - Thanks for the results. I'm glad to have it confirmed that this is, in fact, an index problem.

This all still doesn't explain why (1) the optimizer is making such a large error to begin with; (2) date_recorded is not showing up in your possible_keys; and (3) why USE INDEX is not enough to make MySQL use the index. Did you try FORCE INDEX in place of USE INDEX in my original patch?

This doesn't fix the count(a.id) query though.

What do you mean by "this" and what do you mean by "fix"? Do you mean that you tried IGNORE INDEX on your COUNT query, and it's still slow? (zacechola said above that my original query hint patch helped a lot, though maybe I misinterpreted https://buddypress.trac.wordpress.org/ticket/4045#comment:8) The following query should, for optimal speed, be indexed by type:

SELECT count(*) 
FROM [redacted]_bp_activity a
WHERE a.type IN ( 'new_blog_post' ) 
AND a.hide_sitewide = 0 
AND a.type != 'activity_comment'

Are you saying that indexing by type doesn't actually make the query faster than letting the optimizer work, or that using IGNORE INDEX doesn't improve over USE INDEX?

In any case, the COUNT(*) query is never going to run as fast as the LIMIT query. It's going to need a different kind of optimization, and probably some selective caching by BP itself. I'm going to work on that next.

#17 @mym6
12 years ago

I did not use FORCE INDEX; that might also have worked, I suppose. And yes, forcing the index on the count query doesn't help.

The optimizer is an odd thing. I tried doing a query where I asked for the MAX(date_recorded) and then asked for 50 days back from there, sorting the result set and then doing the limit. It was very quick up to about 60 days back, then it would start using the wrong index. See:

EXPLAIN SELECT SQL_NO_CACHE a.*, u.user_email, u.user_nicename, u.user_login, u.display_name    
FROM [redacted]_bp_activity a   
LEFT JOIN [redacted]_users u ON a.user_id = u.ID   
WHERE a.type IN ( 'new_blog_post' )    
  AND a.hide_sitewide = 0     
  AND a.type != 'activity_comment'    
  AND a.date_recorded      
  BETWEEN  DATE_SUB((SELECT MAX(a.date_recorded)  
                     FROM [redacted]_bp_activity a), 
           INTERVAL 60 DAY)      
    AND (SELECT MAX(a.date_recorded)               
         FROM [redacted]_bp_activity a)       
ORDER BY a.date_recorded,a.type  
DESC LIMIT 0, 20\G

Would result in

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: range
possible_keys: date_recorded,type,hide_sitewide
          key: date_recorded
      key_len: 8
          ref: NULL
         rows: 489780
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: [redacted].a.user_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 3
  select_type: SUBQUERY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
*************************** 4. row ***************************
           id: 2
  select_type: SUBQUERY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
4 rows in set (0.00 sec)

But the following query:

EXPLAIN SELECT SQL_NO_CACHE a.*, u.user_email, u.user_nicename, u.user_login, u.display_name    
FROM [redacted]_bp_activity a   
LEFT JOIN [redacted]_users u ON a.user_id = u.ID   
WHERE a.type IN ( 'new_blog_post' )    
  AND a.hide_sitewide = 0     
  AND a.type != 'activity_comment'    
  AND a.date_recorded      
  BETWEEN  DATE_SUB((SELECT MAX(a.date_recorded)  
                     FROM [redacted]_bp_activity a), 
           INTERVAL 70 DAY)      
    AND (SELECT MAX(a.date_recorded)               
         FROM [redacted]_bp_activity a)       
ORDER BY a.date_recorded,a.type  
DESC LIMIT 0, 20\G

Would result in this

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: a
         type: ref
possible_keys: date_recorded,type,hide_sitewide
          key: type
      key_len: 227
          ref: const
         rows: 1914714
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: u
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: [redacted].a.user_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 3
  select_type: SUBQUERY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
*************************** 4. row ***************************
           id: 2
  select_type: SUBQUERY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Select tables optimized away
4 rows in set (0.01 sec)

#18 @boonebgorges
12 years ago

That is really weird.

I hope that others with large installations and activity-query slowness can chime in, so that we can narrow down whether this is a peculiarity in your system, or if this is somehow BP's fault. I'm wary of implementing anything as extreme as FORCE INDEX or IGNORE INDEX in BP if it's not necessary to do so.

#19 @mym6
12 years ago

I agree. I'm interested to hear whether this is something I've set up wrong on our database servers or what.

#20 @boonebgorges
12 years ago

Another solution we might explore (in addition to, not in place of, some of the suggestions above) is to introduce a count_total toggle to bp_has_activities(), which would allow very large sites to opt out of doing the total_activities query altogether. Since activity pagination works (in bp-default) with an AJAX 'Load More' button, we don't need exact counts.

#21 @boonebgorges
12 years ago

(In [5940]) Provide index hints for queries in BP_Activity_Activity::get(). See #4045.

#22 @boonebgorges
12 years ago

I've put the index hints (4045.02) in BP trunk. Keeping this ticket open as there are several other avenues I would like to pursue here.

#23 @cnorris23
12 years ago

I did some testing of different scenarios over the weekend, and here's what I found.

Test database was about 1,060,000 activity rows.

Test servers
1) local xampp
2) standard shared hosting
3) low-mid range VPS

  • All queries run with SQL_NO_CACHE

Indexes added the most benefit, especially on the count query. Indexing alone cut the count query's time in half across all testing environments. Using USE INDEX (PRIMARY,date_recorded) rather than just USE INDEX (date_recorded) produced even better results, although the most benefit came on the count query.

I also tested selecting ids first, then running a second query to fill in the objects based on those ids. The purpose of this would be to prevent the nasty filesort seen by mym6. All my test servers had plenty of memory, so this was difficult to test. The shared server offered me no ability to change settings. The VPS server was a production server, so I didn't want to do much there. I eventually tricked MySQL on my xampp server into thinking it was memory starved. I was able to reproduce the filesort, and the two-query solution fixed it. However, across all environments, this was slower than the single query we have now, even with USE INDEX. But by slower, I mean it was consistently, across all platforms, only .5-.8 milliseconds (.0005-.0008 seconds) slower.

As much as I'd like to use the SQL_CALC_FOUND_ROWS / SELECT FOUND_ROWS() combo, it produced varying results, the vast majority of them being 2-3 times slower than the count query we have now.

@boonebgorges I think making the count query optional, or caching it in a transient would be a good solution. While researching, this was a solution thrown out multiple times. As you said, the counts don't really matter, especially when you reach thousands, or in the case of this ticket, millions, of rows.

#24 @boonebgorges
12 years ago

cnorris23 - Thanks so much for the feedback. It's good to know that your testing more or less duplicated what I found in my own.

Agreed about SELECT FOUND_ROWS(). My research showed that the performance of this function (much like COUNT() itself) is a general limitation in MySQL. In the end, actually, I found that there may just be a theoretical limit to how fast these COUNTs can be in general, because in the end there's no way to do them (with multiple WHERE clauses) without doing some sort of table scan. I even did some testing where I broke every WHERE clause into a subquery, in the hopes that it would take fuller advantage of indexing, but the performance differences seem to be pretty negligible.

Making the count optional is pretty straightforward. The problem there is that you almost have to default to count=true, or you'll break existing implementations; and that means that people won't really experience the benefits unless they modify their theme files. I'm considering maybe introducing a BP corollary to wp_is_large_network(), which would detect (based on a simple activity row count, which is very fast in MyISAM) whether you have a huge number of items in your activity table, and then default to count=false accordingly (with relevant filters in place to override, of course). Anyway, this needs some experimentation. Would be happy to hear your feedback.
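
Such a detection function might look something like this (the function name, threshold, and filter name are all made up for illustration, not settled API):

```php
// Hypothetical BP corollary to wp_is_large_network(): decide whether this
// install is "large" from a bare row count, which is very fast on MyISAM.
function bp_is_large_install() {
	global $wpdb, $bp;

	$rows = (int) $wpdb->get_var(
		"SELECT COUNT(*) FROM {$bp->activity->table_name}"
	);

	// 500000 is an arbitrary cutoff; filterable so sites can override it.
	return (bool) apply_filters( 'bp_is_large_install', $rows > 500000 );
}

// The count toggle's default could then key off of it.
$do_total_count = ! bp_is_large_install();
```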

There are some very tricky issues related to stashing stuff in transients. See https://buddypress.trac.wordpress.org/ticket/4076. And in the end, regenerating these counts is going to be brutal on large databases even if it's only done once in a while - you can easily get into multiple *minutes* with large enough datasets. It would be pretty slick if there were a way to bust the transients asynchronously, so that it didn't interfere with pageload. Maybe hook something to WP's shutdown hook? Or do it with AJAX?
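
The shutdown-hook idea could be sketched like so (the transient name and expiry are invented, and the count query shown is deliberately simplified):

```php
// Hypothetical async regeneration: serve the cached count during the
// request, and rebuild it on 'shutdown', after output has been sent,
// so the brutal COUNT doesn't block pageload.
function bp_maybe_regenerate_activity_count() {
	if ( false !== get_transient( 'bp_total_activity_count' ) ) {
		return; // Cached count still fresh; nothing to do.
	}

	global $wpdb, $bp;

	$count = (int) $wpdb->get_var(
		"SELECT COUNT(*) FROM {$bp->activity->table_name} WHERE hide_sitewide = 0"
	);

	set_transient( 'bp_total_activity_count', $count, 60 * 60 ); // one hour
}
add_action( 'shutdown', 'bp_maybe_regenerate_activity_count' );
```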

#25 @mym6
12 years ago

I realized that I've left out our server config information, if it's worth anything.

We have a master/slave setup, both with 12 cores with HT and 32GB of RAM, with about 9GB dedicated to the innodb_buffer_pool. Our activity table has actually been converted to InnoDB, though it would be possible to convert it back. We saw this same issue (if not worse, due to locking) when using MyISAM.

I've been running the queries I provided for nearly a week and haven't had a single issue.

#26 @cnorris23
12 years ago

@mym6 You may need to increase your sort_buffer_size. This is the setting I had to tweak to force the filesort.

#27 @boonebgorges
12 years ago

(In [5953]) Revert part of r5940, which suggests a date_recorded index for the main activity query. This causes problems when viewing single item activity streams. See #4045

#28 @cnorris23
12 years ago

FWIW, when I was doing my earlier testing, I started with just the PRIMARY key, which, in this case, would be the activity id. This produced similar results to date_recorded. I haven't yet tested this, but it seems like it wouldn't cause issues with single activity items.

#29 @boonebgorges
12 years ago

OK, I have worked up a patch that allows for disabling activity counts. 4045.03.diff does the following:

  • Introduces a do_total_count parameter to the bp_has_activities() stack. It defaults to true. When set to false, the COUNT query will be skipped, and the total key (where the count usually goes) will be set to -1.
  • Introduces a 'bp_activity_allow_total_count' filter, which allows admins/plugins to override the do_total_count parameter in a global way. (It's not strictly necessary to have the parameter *and* this filter, but for testing it makes it slicker.)
  • Reworks the pagination links code in BP_Activity_Template, so that noscript pagination works (minimally) when the total activity count is -1. Note that my solution here is not really beautiful (swapping out the numbers for Previous and Next links), but it's functional, and I'm guessing that any site using BP at the kind of scale we're talking about here will want to implement a custom noscript solution anyway.

4045.03.diff is against trunk r5994. To test, apply the patch, and drop the following into your bp-custom.php:

    add_filter( 'bp_activity_allow_total_count', '__return_false' );

As expected, on large installations, it cuts down page load time by dozens of seconds.

Feedback welcome.

#30 @shanebp
12 years ago

Boone -

Finally got BP 1.5.5 on our production site.

Manually added the 5940 / 5953 patches. And they helped A LOT.
I skipped the 4045 patches.

Is there any reason not to add your 4045.03 patch?

#31 @boonebgorges
12 years ago

shanebp - That is *great* news.

No reason not to add 4045.03. Make sure you add the 'bp_activity_allow_total_count' filter as well, otherwise you won't see any actual performance change.

Thanks!

#32 @shanebp
12 years ago

boone -

To clarify, we're using the public version of BP 1.5.5 - not the trunk.
So I haven't added the 4045.03 patch.

Manually adding the 5940 / 5953 patches helped - without them, the site becomes 'unavailable' in spurts and is very slow otherwise.

Looking at the overnight mysql log:
I see much the same thing as zacechola re "SELECT a.*, u.user_email,..." -
"...as long as the activity is evenly spread out, we do OK with the patch, but as soon as we see a spike in updates, we're in trouble..."

Specifically, we see bursts of "SELECT a.*," entries in the log, with Query_time: ~10 secs. And lots of other queries stack up behind it.

Example of the problem query:

SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name, pd.value 
as user_fullname FROM wp_bp_activity a, wp_users u, wp_bp_xprofile_data pd  
WHERE u.ID = a.user_id AND pd.user_id = a.user_id AND pd.field_id = 1 
AND a.type = 'activity_comment' AND a.item_id = 577900 AND a.mptt_left 
BETWEEN 1 AND 4 ORDER BY a.date_recorded ASC;

Is it true that your 4045.03 patch addresses
SELECT count(a.id) FROM wp_bp_activity...
queries (of which we get a handful of slow ones) as opposed to "SELECT a.*," ?

Last edited 12 years ago by shanebp

#33 follow-up: @boonebgorges
12 years ago

shanebp - My patches so far do not address the kinds of queries that you reference here. Your example query is not limited, which means that somewhere in your install, BP_Activity_Activity::get() is being called with $per_page and/or $page set to 0. The only place I know of off the top of my head where this happens in BP is in bp_activity_get_specific(), but all of the calls to bp_activity_get_specific() throughout BP also pass activity_ids, which does not appear to be reflected in your example query. Do you have a way of tracing where that query is originating? What plugins are you running?

#34 in reply to: ↑ 33 @shanebp
12 years ago

boonebgorges:

I believe my example query is generated by
bp-activity-classes -> function get_activity_comments
which is called by
bp-activity-classes -> function append_comments

The example shows that a single activity_id is being passed - it shows up as a.item_id in the query.
I don't see $per_page or $page being passed, and wouldn't expect them for comments... ?

In bp-activity-classes -> function get
I see

if ( !empty( $per_page ) && !empty( $page ) ) { ... }
else {
    $activities = $wpdb->get_results( apply_filters( 'bp_activity_get_user_join_filter',
        $wpdb->prepare( "{$select_sql} {$from_sql} {$where_sql} ORDER BY a.date_recorded {$sort}" ),
        $select_sql, $from_sql, $where_sql, $sort ) );
}

So there is no $page_sql in the else.
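For comparison — my illustration, not code from BP or a patch — a bounded version of that comments query would simply carry a LIMIT clause, which is exactly what the else branch never adds:

```sql
-- Hypothetical paginated form of the comments query; the unpaginated
-- else branch omits the LIMIT, so MySQL sorts the full matching set.
SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name
FROM wp_bp_activity a, wp_users u
WHERE u.ID = a.user_id
  AND a.type = 'activity_comment'
  AND a.item_id = 577900
ORDER BY a.date_recorded ASC
LIMIT 0, 20;
```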

We're running ~50 plugins. I'm prepared to go thru them but thought I'd share this thought first.

Last edited 12 years ago by shanebp (previous) (diff)

#35 @boonebgorges
12 years ago

Good call. That is definitely the problem. (Note that item_id means something different from activity_id.)

The kind of query performed in BP_Activity_Activity::get_activity_comments() is problematic in the same way that COUNT queries are problematic: they very often require table scans. I will do some testing to see whether the index hinting strategy implemented in the main get() method will help here.
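To make that concrete, here is a sketch (mine, not a patch) of the kind of hint the main get() method uses, applied to the comments query; it assumes the default item_id index on wp_bp_activity:

```sql
-- Sketch: steer the optimizer toward the selective item_id index
-- rather than the low-selectivity type index.
SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name
FROM wp_bp_activity a USE INDEX (item_id), wp_users u
WHERE u.ID = a.user_id
  AND a.type = 'activity_comment'
  AND a.item_id = 577900
ORDER BY a.date_recorded ASC;
```

Whether the hint actually helps depends on the data distribution, which is why an EXPLAIN from an affected install is needed first.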

#36 @boonebgorges
12 years ago

shanebp - Would you be willing to share an EXPLAIN of the problematic get_activity_comments() query? When I test on my environment (~1.5M rows in wp_bp_activity), MySQL is optimizing pretty well (indexing on item_id), but maybe it's not happening for you (in which case I will play with some index hints).

#37 follow-up: @shanebp
12 years ago

boone-

EXPLAIN SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name, pd.value AS user_fullname
FROM wp_bp_activity a, wp_users u, wp_bp_xprofile_data pd
WHERE u.ID = a.user_id
AND pd.user_id = a.user_id
AND pd.field_id = 1
AND a.type = 'activity_comment'
AND a.item_id = 577900
AND a.mptt_left BETWEEN 1 AND 4
ORDER BY a.date_recorded ASC;

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	a	ref	user_id,item_id,type,mptt_left	type	227	const	32148	Using where; Using filesort
1	SIMPLE	u	eq_ref	PRIMARY	PRIMARY	8	PRIME.a.user_id	1	Using where
1	SIMPLE	pd	ref	field_id,user_id	user_id	8	PRIME.a.user_id	12	Using where

#38 in reply to: ↑ 37 @shanebp
12 years ago

boone - I don't have command-line access. And it seems phpmyadmin makes the Explain output hard to copy/paste.

So here's a screenshot, lol: explain_shanebp.gif

Last edited 12 years ago by shanebp (previous) (diff)

#39 @boonebgorges
12 years ago

Pasting the following comment from #4060, which belongs here:

==

I see two queries that could cause concurrency bottlenecks; both of these are showing up in the processlist many times per minute:

Query 1:

SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name

FROM wp_bp_activity a USE INDEX (date_recorded) LEFT JOIN wp_users u

ON a.user_id = u.ID WHERE a.user_id IN ( 28221 ) AND a.component
IN ( 'album' ) AND a.item_id IN ( 47715 ) AND a.type != 'activity_comment'

ORDER BY a.date_recorded DESC LIMIT 0, 20

This query performs a 0.7-second filesort, and it runs many times. Removing the USE INDEX (date_recorded) portion brought the total run time down to around 1 millisecond. I recommend removing this portion of the query from your codebase.

==
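For illustration, the recommended change amounts to running the same query without the hint, so the optimizer is free to pick a better index (user_id, per the report's ~1 ms timing):

```sql
-- The same query with USE INDEX (date_recorded) removed; the WHERE
-- clause is selective enough that the optimizer can resolve it via
-- the user_id index instead of scanning in date order.
SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name
FROM wp_bp_activity a LEFT JOIN wp_users u ON a.user_id = u.ID
WHERE a.user_id IN ( 28221 )
  AND a.component IN ( 'album' )
  AND a.item_id IN ( 47715 )
  AND a.type != 'activity_comment'
ORDER BY a.date_recorded DESC
LIMIT 0, 20;
```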

I will need to do some testing here. I assume that what's happening is that bp-album is filtering the query, since if the WHERE clause had looked like that to begin with, date_recorded would not have been selected for the index hint.

shanebp, can you tell me what version of bp-album you're running, so I can check their codebase?

#40 @shanebp
12 years ago

boone -

We're using BP Album - Version 0.1.8.10 and BP 1.5.5

I see now that there is an Album version 0.1.8.11 that says it adds compatibility with BP 1.5.

I don't see anything new in the new version re that query - but I could easily miss it.

#41 @boonebgorges
12 years ago

Thanks. I think you are using an out-of-date version of the fix - index hints are no longer being used in the LIMIT query. See https://buddypress.trac.wordpress.org/browser/trunk/bp-activity/bp-activity-classes.php?rev=5993#L218, especially line 225, as well as https://buddypress.trac.wordpress.org/browser/trunk/bp-activity/bp-activity-classes.php?rev=5993#L153

#43 @TheBeardedOne
12 years ago

  • Cc matt_buddypress-trac@… added

#44 @boonebgorges
12 years ago

  • Milestone changed from 1.6 to Future Release

I think we've done all that we can do for this dev cycle.

During the next one, I want to have a look at the possibility of splitting the query. I recently learned that MySQL can't create temp tables in RAM if the temp table includes TEXT columns (as the activity content column does). So we might think about doing a SELECT id lookup first, and only then fetching the content of the activity items themselves. But this will break (some) legacy filters on the SQL statement, so we'll wait for 1.7.
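To sketch the split-query idea (illustrative only; table and column names as in the queries above):

```sql
-- Step 1: a narrow id-only lookup. No TEXT columns are involved, so
-- any temp table MySQL needs can stay in memory.
SELECT a.id
FROM wp_bp_activity a
WHERE a.component = 'activity'
ORDER BY a.date_recorded DESC
LIMIT 0, 20;

-- Step 2: fetch full rows (including the TEXT content column) only
-- for the ids found above (example id values are hypothetical).
SELECT a.*, u.user_email, u.user_nicename, u.user_login, u.display_name
FROM wp_bp_activity a LEFT JOIN wp_users u ON a.user_id = u.ID
WHERE a.id IN ( 161, 158, 155 );
```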

#45 @shanebp
12 years ago

Will MyISAM vs. InnoDB, and their differences re fulltext search support, be a factor?

#46 follow-up: @boonebgorges
12 years ago

Yes, it's possible. It'll need more testing.

@TheBeardedOne
12 years ago

Query Optimizations that was done by Percona via consulting contract

#47 @DJPaul
12 years ago

Just adding a comment so people subscribed via email are aware that TheBeardedOne has uploaded a PDF report about our SQL queries. Thanks TheBeardedOne!

#48 in reply to: ↑ 46 @johnjamesjacoby
12 years ago

Replying to boonebgorges:

Yes, it's possible. It'll need more testing.

Awesome stuff. Whiskey is on me.

Version 0, edited 12 years ago by johnjamesjacoby (next)

#49 @DJPaul
11 years ago

  • Keywords needs-patch added; 2nd-opinion has-patch removed

#50 @boonebgorges
10 years ago

  • Milestone Future Release deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Hi everyone - I'm closing this ticket in favor of #5349, which has a patch that should preserve backward compatibility. Testing and feedback welcome.

Note: See TracTickets for help on using tickets.