Page MenuHomePhabricator

MySQL chooses poor query plan for link counting query
Closed, ResolvedPublic

Description

CirrusSearch is using a new query for counting incoming links (previously this was queried from ElasticSearch), it looks roughly like:

SELECT Count(1) 
FROM   `pagelinks` 
WHERE  ( pl_namespace = '0' AND pl_title = 'Meter') 
	OR ( pl_namespace = '0' AND pl_title = 'Centimeter') 
	OR ( pl_namespace = '0' AND pl_title = 'Kilometer') 
	OR ( pl_namespace = '0' AND pl_title = 'Zentimeter') 
	OR ( pl_namespace = '0' AND pl_title = 'Nanometer') 
	OR ( pl_namespace = '0' AND pl_title = 'Millimeter') 
	OR ( pl_namespace = '0' AND pl_title = 'Picometer') 
	OR ( pl_namespace = '0' AND pl_title = 'Femtometer') 
	OR ( pl_namespace = '0' AND pl_title = 'Dezimeter') 
	OR ( pl_namespace = '0' AND pl_title = 'Hektometer') 
	OR ( pl_namespace = '0' AND pl_title = 'Dekameter') 
	OR ( pl_namespace = '0' AND pl_title = 'Μm') 
	OR ( pl_namespace = '0' AND pl_title = 'Myriam eter') 
	OR ( pl_namespace = '0' AND pl_title = 'Pikometer') 
	OR ( pl_namespace = '0' AND pl_title = 'Neuzoll') 
	OR ( pl_namespace = '0' AND pl_title = 'Mikrometer_(Einheit)') 
LIMIT  1;

the available indexes look like:

UNIQUE KEY `pl_from` (`pl_from`,`pl_namespace`,`pl_title`),
KEY `pl_namespace` (`pl_namespace`,`pl_title`,`pl_from`),
KEY `pl_backlinks_namespace` (`pl_namespace`,`pl_title`,`pl_from_namespace`,`pl_from`)

This generates the query plan:

+------+-------------+-----------+------+-------------------------------------+------------------------+---------+-------+-------+--------------------------+
| id   | select_type | table     | type | possible_keys                       | key                    | key_len | ref   | rows  | Extra                    |
+------+-------------+-----------+------+-------------------------------------+------------------------+---------+-------+-------+--------------------------+
|    1 | SIMPLE      | pagelinks | ref  | pl_namespace,pl_backlinks_namespace | pl_backlinks_namespace | 4       | const | 20025 | Using where; Using index |
+------+-------------+-----------+------+-------------------------------------+------------------------+---------+-------+-------+--------------------------+

We can see from the key_len and the ref column this is only using the first part of the pl_backlinks_namespace index, the namespace, and then iterating to find the titles. Due to the cardinality here this is a very poor query plan and times out.

Limiting to a single title mysql chooses a better query plan:

mysql:wikiadmin@db1082 [dewiki]> explain SELECT  count(1)  FROM `pagelinks`   WHERE (pl_namespace = '0' AND pl_title = 'Meter')  LIMIT 1;                                            
+------+-------------+-----------+------+-------------------------------------+--------------+---------+-------------+-------+--------------------------+
| id   | select_type | table     | type | possible_keys                       | key          | key_len | ref         | rows  | Extra                    |
+------+-------------+-----------+------+-------------------------------------+--------------+---------+-------------+-------+--------------------------+
|    1 | SIMPLE      | pagelinks | ref  | pl_namespace,pl_backlinks_namespace | pl_namespace | 261     | const,const | 10746 | Using where; Using index |
+------+-------------+-----------+------+-------------------------------------+--------------+---------+-------------+-------+--------------------------+

Alternatively adding a USE INDEX (pl_namespace) also generates a better query plan

mysql:wikiadmin@db1082 [dewiki]> explain SELECT  count(1)  FROM `pagelinks` USE INDEX (`pl_namespace`)  WHERE (pl_namespace = '0' AND pl_title = 'Meter') OR (pl_namespace = '0' AND pl_title = 'Centimeter') OR (pl_namespace = '0' AND pl_title = 'Kilometer') OR (pl_namespace = '0' AND pl_title = 'Zentimeter') OR (pl_namespace = '0' AND pl_title = 'Nanometer') OR (pl_namespace = '0' AND pl_title = 'Millimeter') OR (pl_namespace = '0' AND pl_title = 'Picometer') OR (pl_namespace = '0' AND pl_title = 'Femtometer') OR (pl_namespace = '0' AND pl_title = 'Dezimeter') OR (pl_namespace = '0' AND pl_title = 'Hektometer') OR (pl_namespace = '0' AND pl_title = 'Dekameter') OR (pl_namespace = '0' AND pl_title = 'Μm') OR (pl_namespace = '0' AND pl_title = 'Myriameter') OR (pl_namespace = '0' AND pl_title = 'Pikometer') OR (pl_namespace = '0' AND pl_title = 'Neuzoll') OR (pl_namespace = '0' AND pl_title = 'Mikrometer_(Einheit)')  LIMIT 1;                                                                                                                                                      
+------+-------------+-----------+-------+---------------+--------------+---------+------+-------+--------------------------+
| id   | select_type | table     | type  | possible_keys | key          | key_len | ref  | rows  | Extra                    |
+------+-------------+-----------+-------+---------------+--------------+---------+------+-------+--------------------------+
|    1 | SIMPLE      | pagelinks | range | pl_namespace  | pl_namespace | 261     | NULL | 13902 | Using where; Using index |
+------+-------------+-----------+-------+---------------+--------------+---------+------+-------+--------------------------+

This query runs in ~10ms when using an appropriate query plan:

+----------+
| count(1) |
+----------+
|     8536 |
+----------+
1 row in set (0.01 sec)

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

@jcrespo Is there anything we can do to help mysql generate better query plans here? I can add the relevant USE INDEX (pl_namespace) but there may be a better solution. It seems mysql is making a pretty poor guess on how many rows it will have to visit when scanning the index at pl_namespace = 0

Change 306738 had a related patch set uploaded (by EBernhardson):
Suggest database to use pl_namespace index for link counting

https://gerrit.wikimedia.org/r/306738

Change 306738 merged by jenkins-bot:
Suggest database to use pl_namespace index for link counting

https://gerrit.wikimedia.org/r/306738

Change 306803 had a related patch set uploaded (by EBernhardson):
Suggest database to use pl_namespace index for link counting

https://gerrit.wikimedia.org/r/306803

Change 306803 merged by jenkins-bot:
Suggest database to use pl_namespace index for link counting

https://gerrit.wikimedia.org/r/306803

First of all, I would avoid any change until I finish the deployment of: T139090, then we can see what we can do about it at server side.

Context of deployment from the IRC log http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20160825.txt

[23:05:54] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0]
[23:09:44] <MaxSem> we kinda have a ton of DB connection errors ^ but it seems to be calming down
[23:10:23] <MaxSem> did someone accidentally a mysqld? :P
[23:12:39] <ebernhardson> no, those connection errors are a bad query plan mysql generated,
                          it thinks scanning a few tens of millions of rows is a better idea
                          than 30 key lookups in an index
[23:12:47] <ebernhardson> but the other patch in swat kinda/mostly fixes that
[23:13:54] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0]
[23:14:28] <ebernhardson> a better fix will have to wait for jcrespo ... my best guess is
                          something is wrong with index stats because those indexes think the integer
                          namespace field has a cardinality of ~3 million, which seems wrong...
[23:14:51] <bd808>	 that's a lot of namespaces
[23:14:56] <ebernhardson>	 yea :)
[23:15:12] <bd808>	 I might believe ~300
[23:15:46] <logmsgbot>	 !log maxsem@tin Synchronized php-1.28.0-wmf.16/extensions/CirrusSearch: (no message) (duration: 00m 57s)
[23:16:44] <ebernhardson>	 indeed :)
[23:18:02] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0]
[23:24:03] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]

I have kept the icinga-wm messages, pretty sure the sync fixed the MediaWiki exceptions and fatals per minute alarm.

<jynus> also dcausse I am very willing to give reviews about anything regarding the db
<jynus> I am never too busy for that
<jynus> I have nothing against the idea, I just think I can help making it better and faster :-)
[first revert the orginal patch]
<jynus> the immediate changes would be to "$dbr = wfGetDB( DB_SLAVE, 'vslow' );" and once we finish the schema change, fix the query plan by regenerating table stats
<jynus> that is way better than adding a USE INDEX
<jynus> for me, I probably should add a watchdog to the new servers and improve my monitoring
<jynus> If we still need an index hint, we must document it
<jynus> becase that is a fix now and a breakage tomorrow (in this case, literally, as the schema change was ongoing as we speak)

Change 306883 had a related patch set uploaded (by DCausse):
Revert "Simplify incoming_links counting from es query to mysql"

https://gerrit.wikimedia.org/r/306883

So yes, this generated 1-hour queries:

Hits 	Tmax 	Tavg 	Tsum 	Hosts 	Users 	Schemas 
4	3,056	3,023	12,094	db1080	wikiuser	enwiki
SELECT /* CirrusSearch\BuildDocument\RedirectsAndIncomingLinks::countIncomingLinks 127.0.0.1 */ count(1) FROM `pagelinks` WHERE ... db1080 enwiki 2946s */ /* localhost */

Full query, NDA-restricted at P3895

The issue is that using perfect indexes when there are 1000 items on the IN/OR is difficult (in the latest MySQL versions, this is is configurable).

Change 306896 had a related patch set uploaded (by Jcrespo):
Temporarilly redirect RedirectsAndIncomingLinks job to a single db

https://gerrit.wikimedia.org/r/306896

Change 306896 merged by jenkins-bot:
Temporarilly redirect RedirectsAndIncomingLinks job to a single db

https://gerrit.wikimedia.org/r/306896

Change 306899 had a related patch set uploaded (by DCausse):
Temporarilly redirect RedirectsAndIncomingLinks job to a single db

https://gerrit.wikimedia.org/r/306899

Change 306899 merged by jenkins-bot:
Temporarilly redirect RedirectsAndIncomingLinks job to a single db

https://gerrit.wikimedia.org/r/306899

Mentioned in SAL [2016-08-26T10:34:54Z] <hashar> pulled https://gerrit.wikimedia.org/r/#/c/306899/ "Temporarilly redirect RedirectsAndIncomingLinks job to a single db" on tin for T143932

Mentioned in SAL [2016-08-26T10:50:01Z] <hashar> running scap pull on mw1163 mw1164 mw1165 mw1166 mw1167 mw1168 (job runners) for T143932

Mentioned in SAL [2016-08-26T11:23:47Z] <hashar@tin> Synchronized php-1.28.0-wmf.16/extensions/CirrusSearch/includes/BuildDocument/RedirectsAndIncomingLinks.php: Temporarilly redirect RedirectsAndIncomingLinks job to a single db T143932 (duration: 00m 49s)

https://gerrit.wikimedia.org/r/306899 has been pushed to the cluster. The job RedirectsAndIncomingLinks is thus running on the vslow shard of database(s?).

The issue is that using perfect indexes when there are 1000 items on the IN/OR is difficult (in the latest MySQL versions, this is is configurable).

Is there a reasonable limit we could apply to the size of the OR conditions? We could certainly split the query and issue it a few times with different conditions, summing in the application layer, if that will work around the problem. I agree that a USE INDEX is far from ideal, and doesn't have any guarantee's of staying fixed.

Yes, that was more or less my idea to move this forward, simplify a bit the query, and document the issue on code. While it is not the most optimal solution, having several smaller queries may be better than a single small one. When we upgrade to 5.7 or equivalent, we will be able to use the original one.

For now, I have to block this until the schema change on this very same indes is applied, and next week I will work on this when we have a stable structure to work with (something we do not have now :-().

It was a good decision to apply the "vslow"- I already saw 1 minute queries; which means the USE INDEX is not 100% effective, as I thought; the difference is that it is not bringing down production.

debt triaged this task as Medium priority.Sep 12 2016, 3:40 PM

It should only take a few days more/a week at maximum to finish the schema change and start working on this. Please ping me if I forget about this, this month I have a lot of things on my mind, but this is pending.

The plan problems is a known issue of IN + a large list of items http://mysqlserverteam.com/you-asked-for-it-new-default-for-eq_range_index_dive_limit/. There are 3 possibilities:

  • Force the index- good for now, but it creates technical debt that has to be payed in the future
  • Reduce the max amount of items compared to less than 10
  • Upgrade the fleet to a version that can allow to set a larger eq_range_index_dive_limit (it will eventually done, but it will take months)

Change 315530 had a related patch set uploaded (by EBernhardson):
Issue less than 10 OR's in a single query

https://gerrit.wikimedia.org/r/315530

Change 306883 abandoned by EBernhardson:
Revert "Simplify incoming_links counting from es query to mysql"

Reason:
decided this is unnecessary, the query was temporarily sent to vslow, and is being adjusted to generate a better query plan.

https://gerrit.wikimedia.org/r/306883

Change 315530 merged by jenkins-bot:
Issue less than 10 OR's in a single query

https://gerrit.wikimedia.org/r/315530

@EBernhardson Did you check if this worked? If yes, we should move it back from vslow to general traffic. Blocking alter table on pagelinks was already deployed (T139090).

CirrusSearch\BuildDocument\RedirectsAndIncomingLinks::countIncomingLinks is so slow, that it is getting killed and potential DOSes our servers now.

{
  "_index": "logstash-2016.10.28",
  "_type": "mediawiki",
  "_id": "AVgKd-B9cJvqzGkjS3DJ",
  "_score": null,
  "_source": {
    "message": "CirrusSearch\\BuildDocument\\RedirectsAndIncomingLinks::countIncomingLinks\t10.64.16.34\t2013\tLost connection to MySQL server during query (10.64.16.34)\tSELECT  count(1)  FROM `pagelinks`    WHERE (pl_namespace = '0' AND pl_title = 'Q8567') OR (pl_namespace = '0' AND pl_title = 'Q19695641')  LIMIT 1  ",
    "@version": 1,
    "@timestamp": "2016-10-28T08:46:17.000Z",
    "type": "mediawiki",
    "host": "mw1301",
    "level": "ERROR",
    "tags": [
      "syslog",
      "es",
      "es"
    ],
    "channel": "DBQuery",
    "normalized_message": "{fname}\t{db_server}\t{errno}\t{error}\t{sql1line}",
    "url": "/rpc/RunJobs.php?wiki=wikidatawiki&type=cirrusSearchIncomingLinkCount&maxtime=60&maxmem=300M",
    "ip": "127.0.0.1",
    "http_method": "POST",
    "server": "127.0.0.1",
    "referrer": null,
    "wiki": "wikidatawiki",
    "mwversion": "1.28.0-wmf.23",
    "reqId": "94e408c0a5cd982dbb1fd98c",
    "db_server": "10.64.16.34",
    "db_name": "wikidatawiki",
    "db_user": "wikiuser",
    "method": "Database::reportQueryError",
    "errno": 2013,
    "error": "Lost connection to MySQL server during query (10.64.16.34)",
    "sql1line": "SELECT  count(1)  FROM `pagelinks`    WHERE (pl_namespace = '0' AND pl_title = 'Q8567') OR (pl_namespace = '0' AND pl_title = 'Q19695641')  LIMIT 1  ",
    "fname": "CirrusSearch\\BuildDocument\\RedirectsAndIncomingLinks::countIncomingLinks"
  },
  "fields": {
    "@timestamp": [
      1477644377000
    ]
  },
  "sort": [
    1477644377000
  ]
}
16	308	76	1,231	db1045, db1064	wikiuser	commonswiki, wikidatawiki
SELECT /* CirrusSearch\BuildDocument\RedirectsAndIncomingLinks::countIncomingLinks */ count(1) FROM `pagelinks` WHERE (pl_namespace = '0' AND pl_title = 'Q585') OR (pl_namespace = '0' AND pl_title = 'Q13156002') LIMIT 1 /* 33a70a16433c536af18417f809487509 db1045 wikidatawiki 61s */

42	7	2	97	db1028, db1030, db1064	wikiuser	commonswiki, frwiki, metawiki
SELECT /* CirrusSearch\BuildDocument\RedirectsAndIncomingLinks::countIncomingLinks */ count(1) FROM `pagelinks` WHERE (pl_namespace = '4' AND pl_title = 'Licensing') OR (pl_namespace = '0' AND pl_title = 'Licensing') OR (pl_namespace = '0' AND pl_title = 'Public_domain') OR (pl_namespace = '4' AND pl_title = 'Copyright') OR (pl_namespace = '4' AND pl_title = 'L') OR (pl_namespace = '4' AND pl_title = 'NoFU') OR (pl_namespace = '4' AND pl_title = 'Licencing') OR (pl_namespace = '4' AND pl_title = 'Image_use_policy') OR (pl_namespace = '4' AND pl_title = 'LICENSING') LIMIT 1 /* 9ee5138990a1d6c673ecd02c447e2104 db1064 commonswiki 2s */

This is the 3rd incident with slow queries & cirrus in the last few weeks- I am going to ask you to block all further deployments indefinitely that have no +1 from a DBA.

jcrespo raised the priority of this task from Medium to High.Oct 28 2016, 9:00 AM

Change 306883 restored by EBernhardson:
Revert "Simplify incoming_links counting from es query to mysql"

Reason:
looks like this will be necessary, mysql just refuses to generate reasonable query plans.

https://gerrit.wikimedia.org/r/306883

I'll re-put together the stuff that queries this out of elasticsearch instead of mysql. It's completely un-indexed in elasticsearch outside of the basic postings lists, meaning the best case scenario is much worse, but it seems the worst case scenario is also much better and avoids the pathological query planner in mariadb.

I do not think this is unfixable, I just need the time, which I probably will have soon.

To get an idea of what the latencies in elasticsearch look like i pulled a histogram from our historic query data for a day. This is a histogram of the number of ms using 10 non-uniformly spaced bins.

msnum queries
13.73321122
709.55178
1733.7266
3279.138
4501.228
5799.127
6849.234
7824.328
8719.121
9502.218

This basically works out to 99.83% of queries taking < 14ms, and the remaining taking up to 10s

Change 306883 merged by jenkins-bot:
Revert "Simplify incoming_links counting from es query to mysql"

https://gerrit.wikimedia.org/r/306883

I've analyzed the problem, and the issue is not the query plans themselves, they use the pl_namespace consistently, at least on my tests.

The issue is that counting rows can make the query run over several dozens of million of rows (as, effectively, there are images used on million of pages such as icons).

The way to solve this problem is asking what is the original intention of this query- and if exact counts are needed. If not, we can do a different query where we stop counting if we get more than X rows. If we need exact counting, maybe we can architecturally find a different solution by pre-counting or something else.

EBernhardson please communicate to me to see what is the best follow-up.

At a minimum, the following changes should be done:

  • All queries should use the vslow databases so they do not interfere with webrequests
  • Only one page or image should be queried at each time
  • Only one query should be active per shard, not several queries should be sent to the same shard at the same time (s3 contains almost 800 wikis/dbs)

As a note, the latest index changes on pagelinks should have improved the query planning in the first place.

Thanks!

We reverted the code that uses the db for this process and switched back to elastic to compute this value.
The goal was to reduce code complexity in cirrus. If this particular query causes too much problem I'm ok to keep the code slightly more complex and keep the load on elastic nodes.

At a minimum, the following changes should be done:

  • All queries should use the vslow databases so they do not interfere with webrequests

I think it's something you've done right after the first issue.

  • Only one page or image should be queried at each time

We have at most 1024 pages, the last patch from Erik broke this up to 10 pages per query

  • Only one query should be active per shard, not several queries should be sent to the same shard at the same time (s3 contains almost 800 wikis/dbs)

I'm not sure to know how to do this, this query is run from jobrunners.

Stepping back on this issue and slightly related to your suggestion to pre-compute this value:
Maybe we can't afford computing this value in near real-time. This value being used for ranking purposes we can probably investigate approaches where we could compute this value weekly from a cluster more suited for batch processing (hadoop). We could reuse our popularity score scripts to populate this value as well?
A weekly refresh sounds ok to me.

Change 320980 had a related patch set uploaded (by DCausse):
[WIP] test job jenkins with mw-core

https://gerrit.wikimedia.org/r/320980

Sorry ignore the above gerritbot comment, wrong bug id.

Avg time for this cirrus job decreased from an unstable ~250ms to ~100ms by using elastic: https://graphite.wikimedia.org/S/By

@EBernhardson what do you think: should we keep investigating on this issue or simply give up and continue to use elastic?