Page MenuHomePhabricator

WMFTimeoutException on Commons for WhatLinksHere
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error message
[Xg9SJgpAME8AAIMFtwgAAAAA] 2020-01-03 14:41:03: Fatal exception of type "WMFTimeoutException"
#0 /srv/mediawiki/php-1.35.0-wmf.11/includes/exception/MWExceptionHandler.php(200): {closure}(integer)
#1 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#2 /srv/mediawiki/php-1.35.0-wmf.11/includes/libs/rdbms/database/DatabaseMysqli.php(46): mysqli->query(string)
#3 /srv/mediawiki/php-1.35.0-wmf.11/includes/libs/rdbms/database/Database.php(1308): Wikimedia\Rdbms\DatabaseMysqli->doQuery(string)
#4 /srv/mediawiki/php-1.35.0-wmf.11/includes/libs/rdbms/database/Database.php(1226): Wikimedia\Rdbms\Database->executeQueryAttempt(string, string, boolean, string, integer)
#5 /srv/mediawiki/php-1.35.0-wmf.11/includes/libs/rdbms/database/Database.php(1162): Wikimedia\Rdbms\Database->executeQuery(string, string, integer)
#6 /srv/mediawiki/php-1.35.0-wmf.11/includes/libs/rdbms/database/Database.php(1828): Wikimedia\Rdbms\Database->query(string, string)
#7 /srv/mediawiki/php-1.35.0-wmf.11/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#8 /srv/mediawiki/php-1.35.0-wmf.11/includes/libs/rdbms/database/DBConnRef.php(318): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#9 /srv/mediawiki/php-1.35.0-wmf.11/includes/specials/SpecialWhatLinksHere.php(185): Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array, array)
#10 /srv/mediawiki/php-1.35.0-wmf.11/includes/specials/SpecialWhatLinksHere.php(194): SpecialWhatLinksHere->{closure}(Wikimedia\Rdbms\MaintainableDBConnRef, string, string)
#11 /srv/mediawiki/php-1.35.0-wmf.11/includes/specials/SpecialWhatLinksHere.php(98): SpecialWhatLinksHere->showIndirectLinks(integer, Title, integer, integer, integer)
#12 /srv/mediawiki/php-1.35.0-wmf.11/includes/specialpage/SpecialPage.php(575): SpecialWhatLinksHere->execute(NULL)
#13 /srv/mediawiki/php-1.35.0-wmf.11/includes/specialpage/SpecialPageFactory.php(611): SpecialPage->run(NULL)
#14 /srv/mediawiki/php-1.35.0-wmf.11/includes/MediaWiki.php(298): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#15 /srv/mediawiki/php-1.35.0-wmf.11/includes/MediaWiki.php(967): MediaWiki->performRequest()
#16 /srv/mediawiki/php-1.35.0-wmf.11/includes/MediaWiki.php(530): MediaWiki->main()
#17 /srv/mediawiki/php-1.35.0-wmf.11/index.php(46): MediaWiki->run()
#18 /srv/mediawiki/w/index.php(3): require(string)
#19 {main}
Impact

Backlink list cannot be get for this template (at least this configuration: all pages except File namespace). I don’t think may people would want to get it, though.

Notes

The backlink table contains ~30M rows for this template, although I don’t think more than some dozens, at most a few hundreds of pages actually use it since its decomission in November. (The count’s slowly but constantly shrinking since I started watching it ten days ago.) I also got some Varnish errors with Error: 503, Backend fetch failed before this error (I reloaded the page a couple of time in the hope it will once succeed).

Details

Request ID
Xg9SJgpAME8AAIMFtwgAAAAA
Request URL
https://commons.wikimedia.org/w/index.php?title=Special%3AWhatLinksHere&target=Template%3AInformation%2Fauthor+processing&namespace=6&invert=1

Event Timeline

Aklapper renamed this task from WhatLinksHere timing out on Commons to WMFTimeoutException on Commons for WhatLinksHere.Jan 3 2020, 3:46 PM
Aklapper added a project: Performance Issue.
Aklapper edited Stack Trace. (Show Details)
Umherirrender added a subscriber: Umherirrender.

It works when hide the transclusion: https://commons.wikimedia.org/w/index.php?title=Special%3AWhatLinksHere&target=Template%3AInformation%2Fauthor+processing&namespace=6&invert=1&hidetrans=1

That means the query on the templatelinks table timeouts out (There are currently 16M entries there)

I would expect this working. For backlinks the column tl_from_namespace and an index tl_backlinks_namespace was created.
Maybe the database does not pick up the (right) index to exclude the ns 6 here

SELECT page_id, 
       page_namespace, 
       page_title, 
       rd_from, 
       rd_fragment, 
       page_is_redirect 
FROM   (SELECT tl_from, 
               rd_from, 
               rd_fragment 
        FROM   `templatelinks` 
               LEFT JOIN `redirect` 
                      ON ( ( rd_from = tl_from ) 
                           AND rd_title = 'Information/author_processing' 
                           AND ( rd_interwiki = '' 
                                  OR rd_interwiki IS NULL ) 
                           AND rd_namespace = 10 ) 
               JOIN `page` 
                 ON (( tl_from = page_id )) 
        WHERE  tl_namespace = 10 
               AND tl_title = 'Information/author_processing' 
               AND ( tl_from_namespace != 6 ) 
        ORDER  BY tl_from 
        LIMIT  102) `temp_backlink_range` 
       JOIN `page` 
         ON (( tl_from = page_id )) 
ORDER  BY page_id 
LIMIT  51

API also picks them up:
https://commons.wikimedia.org/w/api.php?action=query&list=embeddedin&eilimit=50&eititle=Template%3AInformation%2Fauthor+processing&einamespace=0|1|2|3|4|5|7|8|9|10|11|12|13|14|15|100|101|102|103|104|105|106|107|460|461|486|487|490|491|828|829|1198|1199|2300|2301|2302|2303|2600

But it does not have an invert option (you have to create a list with the namespace missing) and it is missing the LEFT JOIN mit redirect table:

SELECT /*! STRAIGHT_JOIN */ page_id, 
                            page_title, 
                            page_namespace, 
                            page_is_redirect, 
                            page_namespace AS `from_ns` 
FROM   `templatelinks`, 
       `page` 
WHERE  ( tl_from = page_id ) 
       AND tl_title = 'Information/author_processing' 
       AND tl_namespace = 10 
       AND tl_from_namespace IN ( 0, 1, 2, 3, 
                                  4, 5, 7, 8, 
                                  9, 10, 11, 12, 
                                  13, 14, 15, 100, 
                                  101, 102, 103, 104, 
                                  105, 106, 107, 460, 
                                  461, 486, 487, 490, 
                                  491, 828, 829, 1198, 
                                  1199, 2300, 2301, 2302, 
                                  2303, 2600 ) 
ORDER  BY tl_from_namespace, 
          tl_from 
LIMIT  51
Marostegui added a subscriber: Marostegui.

It works when hide the transclusion: https://commons.wikimedia.org/w/index.php?title=Special%3AWhatLinksHere&target=Template%3AInformation%2Fauthor+processing&namespace=6&invert=1&hidetrans=1

That means the query on the templatelinks table timeouts out (There are currently 16M entries there)

I would expect this working. For backlinks the column tl_from_namespace and an index tl_backlinks_namespace was created.
Maybe the database does not pick up the (right) index to exclude the ns 6 here

SELECT page_id, 
       page_namespace, 
       page_title, 
       rd_from, 
       rd_fragment, 
       page_is_redirect 
FROM   (SELECT tl_from, 
               rd_from, 
               rd_fragment 
        FROM   `templatelinks` 
               LEFT JOIN `redirect` 
                      ON ( ( rd_from = tl_from ) 
                           AND rd_title = 'Information/author_processing' 
                           AND ( rd_interwiki = '' 
                                  OR rd_interwiki IS NULL ) 
                           AND rd_namespace = 10 ) 
               JOIN `page` 
                 ON (( tl_from = page_id )) 
        WHERE  tl_namespace = 10 
               AND tl_title = 'Information/author_processing' 
               AND ( tl_from_namespace != 6 ) 
        ORDER  BY tl_from 
        LIMIT  102) `temp_backlink_range` 
       JOIN `page` 
         ON (( tl_from = page_id )) 
ORDER  BY page_id 
LIMIT  51

This query is quite crazy indeed:

+------+-------------+---------------+--------+---------------------------------------------+--------------+---------+-----------------------------------+----------+------------------------------------+
| id   | select_type | table         | type   | possible_keys                               | key          | key_len | ref                               | rows     | Extra                              |
+------+-------------+---------------+--------+---------------------------------------------+--------------+---------+-----------------------------------+----------+------------------------------------+
|    1 | PRIMARY     | <derived2>    | ALL    | NULL                                        | NULL         | NULL    | NULL                              |      102 | Using temporary; Using filesort    |
|    1 | PRIMARY     | page          | eq_ref | PRIMARY                                     | PRIMARY      | 4       | temp_backlink_range.tl_from       |        1 |                                    |
|    2 | DERIVED     | templatelinks | ref    | PRIMARY,tl_namespace,tl_backlinks_namespace | tl_namespace | 261     | const,const                       | 40839520 | Using index condition; Using where |
|    2 | DERIVED     | redirect      | eq_ref | PRIMARY,rd_ns_title                         | PRIMARY      | 4       | commonswiki.templatelinks.tl_from |        1 | Using where                        |
|    2 | DERIVED     | page          | eq_ref | PRIMARY                                     | PRIMARY      | 4       | commonswiki.templatelinks.tl_from |        1 | Using index                        |
+------+-------------+---------------+--------+---------------------------------------------+--------------+---------+-----------------------------------+----------+------------------------------------+
5 rows in set (0.013 sec)

However using tl_backlinks_namespace doesn't really make any difference, it keeps scanning millions of rows.

+------+-------------+---------------+--------+------------------------+------------------------+---------+-----------------------------------+-----------+------------------------------------------+
| id   | select_type | table         | type   | possible_keys          | key                    | key_len | ref                               | rows      | Extra                                    |
+------+-------------+---------------+--------+------------------------+------------------------+---------+-----------------------------------+-----------+------------------------------------------+
|    1 | PRIMARY     | <derived2>    | ALL    | NULL                   | NULL                   | NULL    | NULL                              |       102 | Using temporary; Using filesort          |
|    1 | PRIMARY     | page          | eq_ref | PRIMARY                | PRIMARY                | 4       | temp_backlink_range.tl_from       |         1 |                                          |
|    2 | DERIVED     | templatelinks | range  | tl_backlinks_namespace | tl_backlinks_namespace | 4       | NULL                              | 431574438 | Using where; Using index; Using filesort |
|    2 | DERIVED     | redirect      | eq_ref | PRIMARY,rd_ns_title    | PRIMARY                | 4       | commonswiki.templatelinks.tl_from |         1 | Using where                              |
|    2 | DERIVED     | page          | eq_ref | PRIMARY                | PRIMARY                | 4       | commonswiki.templatelinks.tl_from |         1 | Using index                              |
+------+-------------+---------------+--------+------------------------+------------------------+---------+-----------------------------------+-----------+------------------------------------------+
5 rows in set (0.002 sec)

Adding Platform Engineering to see if they can come up with some other strategy here. Maybe splitting the query?
Removing DBA tag as there is no actionable for us at the moment, but I will stay on the task to follow up

eprodromou added a subscriber: eprodromou.

All right, seems worth a look.

I think it will be fast if the != 6 is replaced by IN() as in the API query. Then it can use the tl_backlinks_namespace index efficiently. It appears to treat the IN() like a union.

I ran the query above with "Information" instead of "Information/author_processing", since the latter no longer has many rows. It was slow the first time (15s) and moderately slow on subsequent runs (~0.4s). I reduced it down to:

SELECT tl_from FROM   `templatelinks`  WHERE  tl_namespace = 10  AND tl_title = 'Information'  AND ( tl_from_namespace != 6 )  ORDER  BY tl_from  LIMIT  102;

There's no joins or anything complicated, but it was still slow (0.51s). But the equivalent query with IN():

SELECT tl_from FROM   `templatelinks`  WHERE  tl_namespace = 10  AND tl_title = 'Information'  AND ( tl_from_namespace IN ( 0, 1, 2, 3,  4, 5, 7, 8,  9, 10, 11, 12,  13, 14, 15, 100,  101, 102, 103, 104,  105, 106, 107, 460,  461, 486, 487, 490,  491, 828, 829, 1198,  1199, 2300, 2301, 2302,  2303, 2600 ) ) ORDER  BY tl_from  LIMIT  102;

reported that it ran in 0.00s. Applying this method to the full query, with subselect and joins, it ran in 0.01s.

Change 655806 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Do inverse namespace filtering of links tables with a big IN()

https://gerrit.wikimedia.org/r/655806

Patch looks good, +1'd. Giving a bit more time for @Marostegui or others to common on the solution then will +2 if there have been no objections.

I think it will be fast if the != 6 is replaced by IN() as in the API query. Then it can use the tl_backlinks_namespace index efficiently. It appears to treat the IN() like a union.

I ran the query above with "Information" instead of "Information/author_processing", since the latter no longer has many rows. It was slow the first time (15s) and moderately slow on subsequent runs (~0.4s). I reduced it down to:

SELECT tl_from FROM   `templatelinks`  WHERE  tl_namespace = 10  AND tl_title = 'Information'  AND ( tl_from_namespace != 6 )  ORDER  BY tl_from  LIMIT  102;

There's no joins or anything complicated, but it was still slow (0.51s). But the equivalent query with IN():

SELECT tl_from FROM   `templatelinks`  WHERE  tl_namespace = 10  AND tl_title = 'Information'  AND ( tl_from_namespace IN ( 0, 1, 2, 3,  4, 5, 7, 8,  9, 10, 11, 12,  13, 14, 15, 100,  101, 102, 103, 104,  105, 106, 107, 460,  461, 486, 487, 490,  491, 828, 829, 1198,  1199, 2300, 2301, 2302,  2303, 2600 ) ) ORDER  BY tl_from  LIMIT  102;

reported that it ran in 0.00s. Applying this method to the full query, with subselect and joins, it ran in 0.01s.

The second query is definitely better indeed, using tl_backlinks_namespace looks much better than tl_namespace. (300 rows scanned vs 250k) I have run the query on a cold host and the difference is quite big, 16 seconds vs 0.04
Which is also shown on the handler's read:

| Handler_read_next          | 47733 |

vs

| Handler_read_next          | 358   |

Thanks for looking, sounds like a win. Patch merged.

BPirkle claimed this task.

Change 655806 merged by jenkins-bot:
[mediawiki/core@master] Do inverse namespace filtering of links tables with a big IN()

https://gerrit.wikimedia.org/r/655806