Page MenuHomePhabricator

EntityUsageTable::getUsedEntityIdStrings query on wbc_entity_usage table is sometimes fast, sometimes slow
Closed, DeclinedPublic

Description

1db1060.eqiad.wmnet
2
3mysql> EXPLAIN SELECT DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q1005059','Q10865115','Q10885463','Q10924077','Q10930884','Q10938107','Q11181347','Q14582744','Q14583061','Q14583132','Q14583192','Q148','Q391942','Q42200')\G
4*************************** 1. row ***************************
5 id: 1
6 select_type: SIMPLE
7 table: wbc_entity_usage
8 type: range
9possible_keys: eu_entity_id
10 key: eu_entity_id
11 key_len: 257
12 ref: NULL
13 rows: 2567685
14 Extra: Using where; Using index for group-by
151 row in set (0.00 sec)
16
17db1054.eqiad.wmnet
18
19mysql> EXPLAIN SELECT DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN ('Q1005059','Q10865115','Q10885463','Q10924077','Q10930884','Q10938107','Q11181347','Q14582744','Q14583061','Q14583132','Q14583192','Q148','Q391942','Q42200')\G
20*************************** 1. row ***************************
21 id: 1
22 select_type: SIMPLE
23 table: wbc_entity_usage
24 type: range
25possible_keys: eu_entity_id
26 key: eu_entity_id
27 key_len: 257
28 ref: NULL
29 rows: 17973791
30 Extra: Using where; Using index
311 row in set (0.00 sec)

Event Timeline

jcrespo created this task.Oct 23 2015, 5:49 PM
jcrespo claimed this task.
jcrespo raised the priority of this task from to Normal.
jcrespo updated the task description. (Show Details)
jcrespo added subscribers: jcrespo, hoo.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 23 2015, 5:49 PM
hoo set Security to None.
hoo moved this task from incoming to monitoring on the Wikidata board.
hoo added subscribers: daniel, aude.
jcrespo moved this task from Triage to Backlog on the DBA board.Oct 23 2015, 5:54 PM
jcrespo moved this task from Backlog to In progress on the DBA board.Oct 29 2015, 12:20 PM

I do not see this happening on enwiki. Checking on other wikis/hosts.

jcrespo moved this task from In progress to Backlog on the DBA board.Nov 5 2015, 5:16 PM

A similar thing is happening on zhwiki for a different query- the optimizer seems to have some bug for that wiki in particular?

jcrespo raised the priority of this task from Normal to High.Nov 7 2015, 10:48 AM
jcrespo moved this task from Backlog to In progress on the DBA board.

The initial issue still happens, although now the query is consistently slow every time on both servers.

hoo added a comment.Nov 7 2015, 12:31 PM

I can no longer see this issue on either db1060 nor db1054, but it's still reproducible on eg. db1018.

jcrespo added a comment.EditedNov 7 2015, 6:03 PM

Can confirm last seen on db1018:

SELECT DISTINCT eu_entity_id FROM `wbc_entity_usage` WHERE eu_entity_id IN 
('Q10864210','Q10866766','Q10874855','Q10877844','Q10877846','Q10878314','Q10879635',
'Q10880445','Q10882043','Q10887655','Q10890010','Q10890075','Q10896051','Q10911990',
'Q10912402','Q10912474','Q10916748','Q10920782','Q10925493','Q10927910','Q10931742',
'Q10933511','Q10934945','Q10937305','Q10938551','Q11063426','Q11066570','Q11072572',
'Q11079946','Q11082798','Q11086067','Q11099893','Q11101554','Q11105025','Q11109539',
'Q11110088','Q11122401','Q11124417','Q11131036','Q11181300','Q1152679','Q13908633',
'Q13909197','Q13910038','Q13910640','Q13910996','Q13911366','Q13912288','Q13912391',
'Q13912743','Q13912856','Q13912956','Q13913067','Q13913354','Q13913410','Q13913531',
'Q13913585','Q148','Q19770','Q426996','Q7418222')

This could be related to the index dive limit.

This is some reference material: https://www.facebook.com/notes/mysql-at-facebook/eq_range_index_dive_limit-system-variable-in-mysql-56/10151533648715933

Sadly, this feature is not in mariadb 10.

jcrespo moved this task from In progress to Backlog on the DBA board.Dec 1 2015, 5:23 PM

Marius, or someone else, do you know if this is still ongoing after latest deployments + hardware upgrades?

hoo added a comment.Apr 22 2016, 4:13 PM

I haven't seen it in the error logs recently at least, but I'm not looking at them every day.

jcrespo removed jcrespo as the assignee of this task.Apr 22 2016, 4:20 PM
jcrespo lowered the priority of this task from High to Low.

I promise I will give it a thorough check before trying to resolve it, but let me lower the priority if it is not creating a noticeable impact. I predict that with your code changes + new servers deployed & those pending to be deployed, this may have already solved.

hoo raised the priority of this task from Low to High.EditedJun 9 2016, 8:02 PM

We just hit this hard when we changed the query traffic patterns towards the troublesome query here.

The following slaves had a huge CPU spike, thus I assume they were affected:
db1059
db1056
db1064
db1068

This list is probably incomplete because we didn't even deploy the change to Wikipedias (but all other wikis).

hoo added a comment.Jun 9 2016, 8:20 PM

db1070 vs. db1068 (different database, cold queries, the fact that the result rows match is just a coincidence):

mysql:wikiadmin@db1068 [commonswiki]> SELECT /* Wikibase\Client\Usage\Sql\EntityUsageTable::getUsedEntityIdStrings  */ DISTINCT eu_entity_id  FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602');                                   
…                                                                                                                                                                                  
17 rows in set (5.47 sec)                                                                                                                                                                                 


mysql:wikiadmin@db1070 [wikidatawiki]> SELECT /* Wikibase\Client\Usage\Sql\EntityUsageTable::getUsedEntityIdStrings  */ DISTINCT eu_entity_id  FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602');                                  
…
17 rows in set (0.06 sec)
hoo added a subscriber: aaron.Jun 9 2016, 8:34 PM

This is not a problem with the servers, the query planner, or the indexing:

MariaDB  db1068 commonswiki > EXPLAIN SELECT  DISTINCT eu_entity_id  FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wbc_entity_usage
         type: index
possible_keys: eu_entity_id
          key: eu_entity_id
      key_len: 300
          ref: NULL
         rows: 17882560
        Extra: Using where; Using index
1 row in set (0.03 sec)

MariaDB  db1068 commonswiki > SHOW CREATE TABLE wbc_entity_usage\G
*************************** 1. row ***************************
       Table: wbc_entity_usage
Create Table: CREATE TABLE `wbc_entity_usage` (
  `eu_row_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `eu_entity_id` varbinary(255) NOT NULL,
  `eu_aspect` varbinary(37) NOT NULL,
  `eu_page_id` int(11) NOT NULL,
  `eu_touched` binary(14) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`eu_row_id`),
  UNIQUE KEY `eu_entity_id` (`eu_entity_id`,`eu_aspect`,`eu_page_id`),
  KEY `eu_page_id` (`eu_page_id`,`eu_entity_id`)
) ENGINE=InnoDB AUTO_INCREMENT=18752832 DEFAULT CHARSET=binary
1 row in set (0.01 sec)

MariaDB  db1070 wikidatawiki > EXPLAIN SELECT /* Wikibase\Client\Usage\Sql\EntityUsageTable::getUsedEntityIdStrings  */ DISTINCT eu_entity_id  FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wbc_entity_usage
         type: range
possible_keys: eu_entity_id
          key: eu_entity_id
      key_len: 257
          ref: NULL
         rows: 1323
        Extra: Using where; Using index
1 row in set (0.05 sec)

MariaDB  db1070 wikidatawiki > SHOW CREATE TABLE wbc_entity_usage\G
*************************** 1. row ***************************
       Table: wbc_entity_usage
Create Table: CREATE TABLE `wbc_entity_usage` (
  `eu_row_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `eu_entity_id` varbinary(255) NOT NULL,
  `eu_aspect` varbinary(37) NOT NULL,
  `eu_page_id` int(11) NOT NULL,
  `eu_touched` binary(14) NOT NULL DEFAULT '\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (`eu_row_id`),
  UNIQUE KEY `eu_entity_id` (`eu_entity_id`,`eu_aspect`,`eu_page_id`),
  KEY `eu_page_id` (`eu_page_id`,`eu_entity_id`)
) ENGINE=InnoDB AUTO_INCREMENT=15125849 DEFAULT CHARSET=binary
1 row in set (0.00 sec)

You are selecting in the first case 14 million rows:

MariaDB  db1068 commonswiki > SELECT count(*)  FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602')\G
*************************** 1. row ***************************
count(*): 13997900
1 row in set (1 min 30.30 sec)

and 1328 in the second case:

MariaDB  db1070 wikidatawiki > SELECT count(*) FROM `wbc_entity_usage`   WHERE eu_entity_id IN ('Q148475','Q54919','Q423048','Q2494649','Q13219454','Q131454','Q36578','Q1798125','Q2597810','Q19938912','Q193563','Q2553334','Q477675','Q623578','Q19675','Q13481868','Q384602');
+----------+
| count(*) |
+----------+
|     1328 |
+----------+
1 row in set (0.00 sec)

I will start killing this query when it takes more than 5 seconds until it is fixed.

daniel added a comment.EditedJun 10 2016, 9:55 AM

@jcrespo @hoo: ick, 14 million rows? And this isn't optimized away because of the DISTINCT?

The intent of the query is: "for the given set of entity IDs, tell me which ones are in the table (at least once)". So the total result will never be larger than the number of IDs in the query.

Instead of looking up batches, we could check each ID individually:

SELECT eu_entity_id FROM wbc_entity_usage WHERE eu_entity_id = @id LIMIT 1;

The crucial bit here is the LIMIT. Doing this for each ID would perform a lot better than the "bad" case I think. But how does performance compare for the typical case? Would it help to make this a UNION of SELECTs, one for each ID, to avoid round trips?

Also: Would it help to add a limit to the batched query? Would it be planned/executed differently? I mean like this:

SELECT DISTINCT eu_entity_id  FROM `wbc_entity_usage` 
WHERE eu_entity_id IN ('Q148475', 'Q54919', 'Q423048', 'Q2494649', 'Q13219454', 'Q131454', 'Q36578', 'Q1798125', 'Q2597810', 'Q19938912', 'Q193563', 'Q2553334', 'Q477675', 'Q623578', 'Q19675', 'Q13481868', 'Q384602')
LIMIT 17;

With 17 being the number if IDs in the set. Does this help?

@daniel I think this is a case of prematurely optimizing. It is true that things like:

foreach ... { 'SELECT' }

Are usually considered bad practices, but for trying to minimize "round-trip time", we are actually doing a way worse query. Your query cannot know that with only 1 result per IN value it would be enough- it is implemented using a group by.

Look at this quick script I wrote:

time while read id; do mysql -h db1068 commonswiki -e "SELECT eu_entity_id FROM wbc_entity_usage WHERE eu_entity_id = '$id' LIMIT 1"; done < eu_entity_id

eu_entity_id
Q148475
eu_entity_id
Q54919
eu_entity_id
Q423048
eu_entity_id
Q2494649
eu_entity_id
Q13219454
eu_entity_id
Q131454
eu_entity_id
Q36578
eu_entity_id
Q1798125
eu_entity_id
Q2597810
eu_entity_id
Q19938912
eu_entity_id
Q193563
eu_entity_id
Q2553334
eu_entity_id
Q477675
eu_entity_id
Q623578
eu_entity_id
Q19675
eu_entity_id
Q13481868
eu_entity_id
Q384602

real	0m0.119s
user	0m0.028s
sys	0m0.048s

Despite creating a *new connection* each time, it only takes 0.1 seconds (and I am connecting and disconnecting 15 times! It will take even less if we are not connecting every time, as a normal script would do).

Let's go with the simpler approach (point SELECT queries), and we can later try to optimize the round-trip time if needed. InnoDB is very, very fast to do index lookups thanks to internal hash tables.

Of course, that was not a trivial example, but this happens many times- we assume only a few records per value exist, when in reality, uses explodes to several millions quickly.

@jcrespo so we should do one query per ID, with limit 1? ok!