Skip to:
Content

BuddyPress.org

Opened 8 years ago

Last modified 32 hours ago

#7500 assigned enhancement

Harmful bp_activity indexes

Reported by: brandonliles's profile brandonliles Owned by: espellcaste's profile espellcaste
Milestone: Under Consideration Priority: normal
Severity: normal Version:
Component: Performance Keywords: has-patch
Cc:

Description

The indexes on the bp_activity table hide_sitewide and is_spam are harmful for query performance since the columns being indexed are tinyint(1). If MySQL uses these indexes it will likely result in table scans.

A far more selective index would be:
KEY component_type_user_id_date_recorded (component, type, user_id, date_recorded)

Attachments (3)

24hour-window-very-clear-effect.png (166.3 KB) - added by brandonliles 8 years ago.
Index Adjustment Query Impact Graph 1
7500.01.patch (561 bytes) - added by r-a-y 7 years ago.
7500.02.patch (2.2 KB) - added by r-a-y 7 years ago.

Download all attachments as: .zip

Change History (23)

#1 @hnla
8 years ago

  • Keywords 2nd-opinion added

This ticket was mentioned in Slack in #buddypress by hnla. View the logs.


8 years ago

#3 @johnjamesjacoby
8 years ago

Howdy @brandonliles!

Thanks for opening this ticket and detailing your findings.

It's generally true that indexes on columns with low cardinality is ineffective, however, benchmarks to confirm that full table scans are currently a problem would be useful. That way we can also confirm removing those indexes will improve performance as anticipated.

Once we've done that, an upgrade routine will need to be introduced that comfortable locks those tables while the indexes are adjusted. Since the Activity table is highly likely to have many, many rows, even index removal is not a simple or painless process.

TL;DR - We need:

  • Benchmarks
  • Confirmation
  • Safe & friendly upgrade routine
Version 0, edited 8 years ago by johnjamesjacoby (next)

@brandonliles
8 years ago

Index Adjustment Query Impact Graph 1

#4 @brandonliles
8 years ago

Hi @johnjamesjacoby ,

On our site we have a little over 1.6 million rows in the bp_activity table. Most of the time MySQL was using a decent query plan, but several times an hour we were seeing MySQL use the hide_sitewide and is_spam indexes for the activity loop which was about the same performance as a table scan. On our system that was resulting in a query that averaged 6 seconds. In the graph I've posted here you see the result of our index adjustment. With those indexes removed, we're no longer seeing that bad query plan.

#5 @johnjamesjacoby
8 years ago

Thanks for the quick reply!

Is that just from removing the hide_sitewide and is_spam alone? Or is that also with your custom compound key?

component_type_user_id_date_recorded (component, type, user_id, date_recorded)

#6 @brandonliles
8 years ago

The graphs/data include that compound key that we added after analyzing the output from EXPLAIN.

@r-a-y
7 years ago

#7 @r-a-y
7 years ago

  • Component changed from Core to Performance
  • Milestone changed from Awaiting Review to Under Consideration

@brandonliles - So is 01.patch what you are recommending? Or did you keep the existing indexes on hide_sitewide and is_spam intact?

My SQL-fu is not that great, so I'll leave it to you guys to discuss.

#8 @brandonliles
7 years ago

@r-a-y 01.patch is exactly what I'm recommending. I didn't keep the existing indexes, due to their low selectivity they are more likely to cause the optimizer to use a poor query plan.

@r-a-y
7 years ago

#9 @r-a-y
7 years ago

  • Keywords has-patch added; 2nd-opinion removed
  • Type changed from defect (bug) to enhancement

02.patch adds an upgrade routine to drop the hide_sitewide and is_spam indexes and upgrades the activity DB table to add the new, proposed index.

This is what shows up for SHOW INDEX FROM wp_bp_activity afterwards:

$ wp db query "SHOW INDEX FROM wp_bp_activity"                                                                                                                                                        
+----------------+------------+--------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 
| Table          | Non_unique | Key_name                             | Seq_in_index | Column_name       | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | 
+----------------+------------+--------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 
| wp_bp_activity |          0 | PRIMARY                              |            1 | id                | A         |         713 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | date_recorded                        |            1 | date_recorded     | A         |         713 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | user_id                              |            1 | user_id           | A         |          35 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | item_id                              |            1 | item_id           | A         |          89 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | secondary_item_id                    |            1 | secondary_item_id | A         |         356 |     NULL | NULL   | YES  | BTREE      |         |               | 
| wp_bp_activity |          1 | component                            |            1 | component         | A         |          13 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | type                                 |            1 | type              | A         |          32 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | mptt_left                            |            1 | mptt_left         | A         |          33 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | mptt_right                           |            1 | mptt_right        | A         |          39 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | component_type_user_id_date_recorded |            1 | component         | A         |          13 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | component_type_user_id_date_recorded |            2 | type              | A         |          39 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | component_type_user_id_date_recorded |            3 | user_id           | A         |         118 |     NULL | NULL   |      | BTREE      |         |               | 
| wp_bp_activity |          1 | component_type_user_id_date_recorded |            4 | date_recorded     | A         |         713 |     NULL | NULL   |      | BTREE      |         |               | 
+----------------+------------+--------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+


Last edited 7 years ago by r-a-y (previous) (diff)

#10 @r-a-y
7 years ago

  • Milestone changed from Under Consideration to 3.0

Would like to tackle this in v3.0.

This ticket was mentioned in Slack in #buddypress by djpaul. View the logs.


7 years ago

#12 @DJPaul
7 years ago

  • Milestone changed from 3.0 to Awaiting Contributions

#13 @DJPaul
7 years ago

@r-a-y If you get time to work on this for 3.0, feel free to move it back to 3.0.

#14 @espellcaste
8 months ago

  • Milestone changed from Awaiting Contributions to Up Next
  • Owner set to espellcaste
  • Status changed from new to assigned

I'll revisit this to confirm the improvements are still necessary.

#15 @imath
4 months ago

  • Milestone changed from Up Next to 15.0.0

#16 @espellcaste
3 months ago

  • Milestone changed from 15.0.0 to Awaiting Review

#17 @espellcaste
32 hours ago

  • Milestone changed from Awaiting Review to Under Consideration

I'd like to offer a different take on the suggested approach/patch.

The rules for composite indexes states that there are two main rules for using composite indexes:

  • Left-to-right, no skipping: MySQL can only access the index in order, starting from the leftmost column and moving to the right. It can't skip columns in the index.
  • Stops at the first range: MySQL stops using the index after the first range condition encountered.

I think that if we create this composite index as is, we might create other issues for different types of queries where not all keys will be available and a table scan will happen too.

component_type_user_id_date_recorded (component, type, user_id, date_recorded)

Let's look at a practical example. A default query from https://site.test/activity/

SELECT DISTINCT
	a.id
FROM
	wp_bp_activity a
WHERE
	a.is_spam = 0
	AND a.hide_sitewide = 0
	AND a.type NOT IN('activity_comment', 'last_activity')
ORDER BY
	a.date_recorded DESC,
	a.id DESC
LIMIT 0,
21;

I see no index being used here.

{
    "id": 1,
    "select_type": "SIMPLE",
    "table": "a",
    "partitions": null,
    "type": "ALL",
    "possible_keys": "type,is_spam,hide_sitewide",
    "key": null,
    "key_len": null,
    "ref": null,
    "rows": 115,
    "filtered": 97.39,
    "Extra": "Using where; Using filesort"
}

Now, after applying the patch, the date_recorded index was used:

{
    "id": 1,
    "select_type": "SIMPLE",
    "table": "a",
    "partitions": null,
    "type": "index",
    "possible_keys": "type,hide_sitewide",
    "key": "date_recorded",
    "key_len": "5",
    "ref": null,
    "rows": 21,
    "filtered": 9.74,
    "Extra": "Using where; Backward index scan"
}

I also removed only the is_spam index without applying the patch, and got the same results. The composite index was not chosen by mysql.

@brandonliles I know it's been ages since you shared your feedback. But if we could see which index is being chosen by mysql in your queries, that would help confirm my findings.

---

Outcome: it seems that only removing the is_spam index is enough to mysql to use another index with good cardinality, rather than a full table scan.

It's not clear, based on my testing, the patch here will actually avoid a table scan. On the contrary, it is technically possible it'll force a full table scan in certain situations.

Last edited 32 hours ago by espellcaste (previous) (diff)

This ticket was mentioned in Slack in #buddypress by espellcaste. View the logs.


32 hours ago

#19 @brandonliles
32 hours ago

I'm afraid it's been ages since I worked on the project that used BuddyPress so I can't really provide any useful feedback at this point.

#20 @espellcaste
32 hours ago

@brandonliles Understood. And no worries. Appreciate it.

Note: See TracTickets for help on using tickets.