
Slow query on SpecialMostLinked creating lag on wikidata slaves
Open, Needs Triage, Public

Description

NOTE: Please feel free to change tags; I am not sure which person or team tag applies here.

We have seen that the php /srv/mediawiki-staging/multiversion/MWScript.php updateSpecialPages.php wikidatawiki --override --only=Mostlinked script that runs on mwmaint1002 is running very slow queries on wikidata (T234948#5562987).

The query itself is:

SELECT /* SpecialMostLinked::reallyDoQuery www-data@mwmain... */  pl_namespace AS `namespace`,pl_title AS `title`,COUNT(*) AS `value`,page_namespace  FROM `pagelinks` LEFT JOIN `page` ON ((page_namespace = pl_namespace) AND (page_title = pl_title))    GROUP BY pl_namespace,pl_title,page_namespace HAVING COUNT(*) > 1 ORDER BY value DESC LIMIT 5000;

The query plan shows that the query needs optimization, or needs to be split into smaller queries:

root@db1087.eqiad.wmnet[wikidatawiki]> explain SELECT /* SpecialMostLinked::reallyDoQuery www-data@mwmain... */  pl_namespace AS `namespace`,pl_title AS `title`,COUNT(*) AS `value`,page_namespace  FROM `pagelinks` LEFT JOIN `page` ON ((page_namespace = pl_namespace) AND (page_title = pl_title))    GROUP BY pl_namespace,pl_title,page_namespace HAVING COUNT(*) > 1 ORDER BY value DESC LIMIT 5000;
+------+-------------+-----------+--------+---------------+------------+---------+---------------------------------------------------------------------+------------+----------------------------------------------+
| id   | select_type | table     | type   | possible_keys | key        | key_len | ref                                                                 | rows       | Extra                                        |
+------+-------------+-----------+--------+---------------+------------+---------+---------------------------------------------------------------------+------------+----------------------------------------------+
|    1 | SIMPLE      | pagelinks | index  | NULL          | PRIMARY    | 265     | NULL                                                                | 1311268897 | Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | page      | eq_ref | name_title    | name_title | 261     | wikidatawiki.pagelinks.pl_namespace,wikidatawiki.pagelinks.pl_title |          1 | Using index                                  |
+------+-------------+-----------+--------+---------------+------------+---------+---------------------------------------------------------------------+------------+----------------------------------------------+
2 rows in set (0.00 sec)

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 11 2019, 12:57 PM

We could probably eliminate the join with page in this query, but other than that I don't see much opportunity for improvement.

wikiadmin@10.64.32.113(wikidatawiki)> explain SELECT pl_namespace AS `namespace`,pl_title AS `title`,COUNT(*) AS `value`  FROM `pagelinks` GROUP BY pl_namespace,pl_title HAVING COUNT(*) > 1 ORDER BY value DESC LIMIT 5000;
+------+-------------+-----------+-------+---------------+--------------+---------+------+------------+----------------------------------------------+
| id   | select_type | table     | type  | possible_keys | key          | key_len | ref  | rows       | Extra                                        |
+------+-------------+-----------+-------+---------------+--------------+---------+------+------------+----------------------------------------------+
|    1 | SIMPLE      | pagelinks | index | NULL          | pl_namespace | 265     | NULL | 1311306562 | Using index; Using temporary; Using filesort |
+------+-------------+-----------+-------+---------------+--------------+---------+------+------------+----------------------------------------------+

The query is trying to find the 5000 (pl_namespace,pl_title) pairs with the most rows in pagelinks, so unfortunately it's going to have to scan the whole huge table and collect those counts. That expensive query is cached; updateSpecialPages.php is the maintenance script that runs periodically to update that cache.
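The aggregation the query performs can be illustrated with a small sketch (hypothetical data; Python's Counter stands in for the GROUP BY / HAVING / ORDER BY pipeline):

```python
from collections import Counter

# Hypothetical pagelinks rows as (pl_namespace, pl_title) pairs.
pagelinks = [
    (0, "Q42"), (0, "Q42"), (0, "Q42"),
    (0, "Q1"), (0, "Q1"),
    (4, "Help"),  # linked only once, filtered out by HAVING COUNT(*) > 1
]

# GROUP BY pl_namespace, pl_title -> count per (namespace, title) pair.
counts = Counter(pagelinks)

# HAVING COUNT(*) > 1, then ORDER BY value DESC LIMIT 5000.
most_linked = [(key, n) for key, n in counts.most_common(5000) if n > 1]

print(most_linked)  # [((0, 'Q42'), 3), ((0, 'Q1'), 2)]
```

Like the SQL, this has to touch every row once to build the counts, which is why the real query cannot avoid scanning the full 1.3-billion-row table.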

It looks like it should be running on the vslow replica; if that's not the case we could at least fix that.

Yeah, it does run on the vslow replica; the problem is that it causes small delays, which trigger T224422#5563013, so it is all connected :)
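One way the "split" suggested above could look: since pl_namespace is the leading column of the group key, the table can be aggregated one namespace at a time and the per-namespace results merged, keeping each individual query (and its window of replica impact) smaller. A rough sketch, where fetch_counts_for_namespace is a hypothetical callback standing in for the per-namespace SQL:

```python
import heapq

def top_most_linked(namespaces, fetch_counts_for_namespace, limit=5000):
    """Merge per-namespace (title, count) results into a global top-N.

    This is correct because pl_namespace is part of the GROUP BY key,
    so the per-namespace aggregations partition the full result set.
    """
    merged = []
    for ns in namespaces:
        for title, count in fetch_counts_for_namespace(ns):
            if count > 1:  # HAVING COUNT(*) > 1
                merged.append((count, ns, title))
    # ORDER BY value DESC LIMIT <limit>
    return heapq.nlargest(limit, merged)
```

Between namespaces (or smaller pl_title ranges) the script could wait for replication to catch up, trading total runtime for less lag.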