Page MenuHomePhabricator

SpecialMostRevisions::reallyDoQuery takes lots of hours to run on wikidata
Closed, ResolvedPublic

Description

The following query runs on the vslow host of wikidatawiki (which is good):

root@db1087.eqiad.wmnet[wikidatawiki]> explain SELECT /* SpecialMostRevisions::reallyDoQuery www-data@mwmain... */ page_namespace AS `namespace`,page_title AS `title`,COUNT(*) AS `value` FROM `revision`,`page` WHERE page_namespace IN ('640','146','0') AND (page_id = rev_page) AND (page_is_redirect = 0) GROUP BY page_namespace,page_title ORDER BY value DESC LIMIT 5000;
+------+-------------+----------+------+------------------------------------------------+-----------------------------+---------+---------------------------+----------+---------------------------------------------------------------------+
| id   | select_type | table    | type | possible_keys                                  | key                         | key_len | ref                       | rows     | Extra                                                               |
+------+-------------+----------+------+------------------------------------------------+-----------------------------+---------+---------------------------+----------+---------------------------------------------------------------------+
|    1 | SIMPLE      | page     | ref  | PRIMARY,name_title,page_redirect_namespace_len | page_redirect_namespace_len | 1       | const                     | 35629757 | Using index condition; Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | revision | ref  | page_timestamp,page_user_timestamp,rev_page_id | rev_page_id                 | 4       | wikidatawiki.page.page_id |        8 | Using index                                                         |
+------+-------------+----------+------+------------------------------------------------+-----------------------------+---------+---------------------------+----------+---------------------------------------------------------------------+
2 rows in set (0.00 sec)

It has been running for 8h. It causes some spikes of lag: https://grafana.wikimedia.org/render/d-solo/000000273/mysql?panelId=6&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1087&var-port=9104&from=1574654386713&to=1574675986713&width=1000&height=500&tz=Europe%2FMadrid

mysql (4).png (500×1 px, 28 KB)

Which causes MW to complain (https://logstash.wikimedia.org/goto/64894c5953d862563733a4708e8f3da1):

 @timestamp	   	2019-11-25T10:00:36
t @version	   	1
t _id	   	AW6iAX06KWrIH1QR588P
t _index	   	logstash-mediawiki-2019.11.25
# _score	   	 - 
t _type	   	mediawiki
t channel	   	DBQuery
t db_name	   	 - 
t db_server	   	10.64.32.113
t db_user	   	wikiuser
t facility	   	user
t host	   	mw1338
t http_method	   	POST
t ip	   	10.64.32.50
t level	   	ERROR
t logsource	   	mw1338
t message	   	Timed out waiting for replication to reach 171970645-171970645-288070551,171978778-171978778-3298185533,171978924-171978924-759099546,180359242-180359242-170963125
t method	   	Wikimedia\Rdbms\DatabaseMysqlBase::masterPosWait
t mwversion	   	1.35.0-wmf.5
t normalized_message	   	Timed out waiting for replication to reach {raw_pos}
t phpversion	   	7.2.24-1+0~20191026.31+debian9~1.gbpbbacde+wmf1
t program	   	mediawiki
t raw_pos	   	[object] (Wikimedia\Rdbms\MySQLMasterPos: 171970645-171970645-288070551,171978778-171978778-3298185533,171978924-171978924-759099546,180359242-180359242-170963125)
t referrer	   	 - 
t reqId	   	6648b5800cd87944acdd8f7b
t server	   	jobrunner.discovery.wmnet
t severity	   	err
t shard	   	s8
t sql	   	SELECT MASTER_GTID_WAIT('171978924-171978924-759099546', 1)
t tags	   	input-kafka-rsyslog-udp-localhost, rsyslog-udp-localhost, kafka, es
t timeout	   	1
t timestamp	   	2019-11-25T10:00:36+00:00
t type	   	mediawiki
t url	   	/rpc/RunSingleJob.php
t wait_pos	   	171978924-171978924-759099546
t wiki	   	wikidatawiki

Event Timeline

This seems to be much the same thing as T235265: Slow query on SpecialMostLinked creating lag on wikidata slaves, BTW, just with a different special page.

Change 553339 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/puppet@production] mediawiki: Disable mostrevisions for wikidata

https://gerrit.wikimedia.org/r/553339

Change 553339 merged by Marostegui:
[operations/puppet@production] mediawiki: Disable mostrevisions for wikidata

https://gerrit.wikimedia.org/r/553339

Addshore added a subscriber: Addshore.

We should probably turn this back at some point.
Probably once T208425 is done.

We should probably turn this back at some point.

Alternatively, does any more favorable day exist when to run this query?

I had to kill this query, as it was hitting db1087 hard again and make it lag

mysql (6).png (500×1 px, 57 KB)

Looks like the cron is still there:

# Puppet Name: cron-updatequerypages-mostrevisions-s8@18
0 1 11,25 * * /usr/local/bin/mwscriptwikiset updateSpecialPages.php s8.dblist --override --only=Mostrevisions > /var/log/mediawiki/updateSpecialPages/s8@18-MostRevisions.log 2>&1

Mentioned in SAL (#wikimedia-operations) [2020-03-11T10:59:46Z] <marostegui> Remove Mostrevisions from mwmaint1002 T239072

As spoken on IRC: I have removed it manually from mwmaint1002 and run puppet to check it doesn't get added back.