Page MenuHomePhabricator

Unable to access SpecialNewpages on enwiki: WMFTimeoutException
Closed, DeclinedPublicPRODUCTION ERROR

Description

Error
message
[XWo3eApAICwAADjVSCgAAABU] 2019-08-31 09:03:01: Fatal exception of type "WMFTimeoutException"
Impact

Notes

Querying new pages not in user or user talk namespace, with maximum size of 10

Details

Request ID
XWo3eApAICwAADjVSCgAAABU
Request URL
https://en.wikipedia.org/wiki/Special:NewPages?namespace=2&invert=1&associated=1&tagfilter=&username=&size-mode=max&size=10&wpFormIdentifier=newpagesform
Stack Trace
#0 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/DatabaseMysqli.php(43): {closure}(integer)
#1 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1295): Wikimedia\Rdbms\DatabaseMysqli->doQuery(string)
#2 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1212): Wikimedia\Rdbms\Database->executeQueryAttempt(string, string, boolean, string, integer)
#3 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1149): Wikimedia\Rdbms\Database->executeQuery(string, string, integer)
#4 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1796): Wikimedia\Rdbms\Database->query(string, string)
#5 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#6 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/DBConnRef.php(315): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#7 /srv/mediawiki/php-1.34.0-wmf.19/includes/pager/IndexPager.php(413): Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array, array)
#8 /srv/mediawiki/php-1.34.0-wmf.19/includes/pager/IndexPager.php(258): IndexPager->reallyDoQuery(string, integer, boolean)
#9 /srv/mediawiki/php-1.34.0-wmf.19/includes/pager/IndexPager.php(604): IndexPager->doQuery()
#10 /srv/mediawiki/php-1.34.0-wmf.19/includes/specials/SpecialNewpages.php(158): IndexPager->getNumRows()
#11 /srv/mediawiki/php-1.34.0-wmf.19/includes/specialpage/SpecialPage.php(571): SpecialNewpages->execute(NULL)
#12 /srv/mediawiki/php-1.34.0-wmf.19/includes/specialpage/SpecialPageFactory.php(582): SpecialPage->run(NULL)
#13 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(296): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#14 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(892): MediaWiki->performRequest()
#15 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(523): MediaWiki->main()
#16 /srv/mediawiki/php-1.34.0-wmf.19/index.php(42): MediaWiki->run()
#17 /srv/mediawiki/w/index.php(3): require(string)
#18 {main}

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
exception.message	       	the execution time limit of 60 seconds was exceeded
exception.trace
#0 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/DatabaseMysqli.php(43): {closure}(integer)
#1 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1295): Wikimedia\Rdbms\DatabaseMysqli->doQuery(string)
#2 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1212): Wikimedia\Rdbms\Database->executeQueryAttempt(string, string, boolean, string, integer)
#3 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1149): Wikimedia\Rdbms\Database->executeQuery(string, string, integer)
#4 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1796): Wikimedia\Rdbms\Database->query(string, string)
#5 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#6 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/DBConnRef.php(315): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#7 /srv/mediawiki/php-1.34.0-wmf.19/includes/pager/IndexPager.php(413): Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array, array)
#8 /srv/mediawiki/php-1.34.0-wmf.19/includes/pager/IndexPager.php(258): IndexPager->reallyDoQuery(string, integer, boolean)
#9 /srv/mediawiki/php-1.34.0-wmf.19/includes/pager/IndexPager.php(604): IndexPager->doQuery()
#10 /srv/mediawiki/php-1.34.0-wmf.19/includes/specials/SpecialNewpages.php(158): IndexPager->getNumRows()
#11 /srv/mediawiki/php-1.34.0-wmf.19/includes/specialpage/SpecialPage.php(571): SpecialNewpages->execute(NULL)
#12 /srv/mediawiki/php-1.34.0-wmf.19/includes/specialpage/SpecialPageFactory.php(582): SpecialPage->run(NULL)
#13 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(296): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#14 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(892): MediaWiki->performRequest()
#15 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(523): MediaWiki->main()
#16 /srv/mediawiki/php-1.34.0-wmf.19/index.php(42): MediaWiki->run()
#17 /srv/mediawiki/w/index.php(3): require(string)
#18 {main}
exception.message	       	the execution time limit of 60 seconds was exceeded
exception.trace
[snip]

I assumed this was the case, but the issue is that the exception should be handled gracefully, rather than how it was presented - maybe return a blank list with the error noted there?

Got it again in [XXaZsApAMFsAAC9w-JkAAAAA] 2019-09-09 18:29:00: Fatal exception of type "WMFTimeoutException"

Krinkle renamed this task from Time out when querying new pages: WMFTimeoutException to SpecialNewpages on enwiki times out: WMFTimeoutException.Sep 17 2019, 4:06 PM
Krinkle renamed this task from SpecialNewpages on enwiki times out: WMFTimeoutException to Unable to access SpecialNewpages on enwiki: WMFTimeoutException.

And again: [XbVCNApAICsAAAzI15AAAADM] 2019-10-27 07:08:33: Fatal exception of type "WMFTimeoutException"

Maybe the query is to complex. The query must look similar to this one (depending on the actor and comment migration stage)

SELECT rc_id ,
       rc_timestamp,
       rc_namespace,
       rc_title,
       rc_minor,
       rc_bot,
       rc_new,
       rc_cur_id,
       rc_this_oldid,
       rc_last_oldid,
       rc_type,
       rc_source,
       rc_patrolled,
       rc_ip,
       rc_old_len,
       rc_new_len,
       rc_deleted,
       rc_logid,
       rc_log_type,
       rc_log_action,
       rc_params,
       comment_rc_comment.comment_text AS `rc_comment_text`,
       comment_rc_comment.comment_data AS `rc_comment_data`,
       comment_rc_comment.comment_id AS `rc_comment_cid`,
       actor_rc_user.actor_user AS `rc_user`,
       actor_rc_user.actor_name AS `rc_user_text`,
       rc_actor,
       page_len AS `length`,
       page_latest AS `rev_id`,
       page_namespace,
       page_title,
  (SELECT GROUP_CONCAT(ctd_name SEPARATOR ',')
   FROM `change_tag`
   JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))
   WHERE ct_rc_id=rc_id ) AS `ts_tags`
FROM `recentchanges`
JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id))
JOIN `actor` `actor_rc_user` ON ((actor_rc_user.actor_id = rc_actor))
JOIN `page` ON ((page_id=rc_cur_id))
WHERE rc_new = '1'
  AND (page_len <= 10)
  AND (rc_namespace != '2'
       AND rc_namespace != '3')
  AND page_is_redirect = '0'
ORDER BY rc_timestamp DESC
LIMIT 51

There is an index for the new flag and the namespace, it also covers the order by field.
Another index covers the redirect check and page length

CREATE INDEX /*i*/new_name_timestamp ON /*_*/recentchanges (rc_new,rc_namespace,rc_timestamp);
CREATE INDEX /*i*/page_redirect_namespace_len ON /*_*/page (page_is_redirect, page_namespace, page_len);

Did something change that made these start to be logged as WMFTimeoutException rather than DBError? It makes it much harder to debug when the exception message doesn't report the query and DB server involved.

Anyway, chances are it's a poor choice by MariaDB's planner, but without the query it's hard to tell. For queries involving recentchanges, it's also fairly likely that two weeks has caused enough rows to be deleted that even having the query might not let us reproduce it.

The query must look similar to this one (depending on the actor and comment migration stage)

SELECT rc_id ,
[...]
FROM `recentchanges`
JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id))
JOIN `actor` `actor_rc_user` ON ((actor_rc_user.actor_id = rc_actor))
JOIN `page` ON ((page_id=rc_cur_id))
WHERE rc_new = '1'
  AND (page_len <= 10)
  AND (rc_namespace != '2'
       AND rc_namespace != '3')
  AND page_is_redirect = '0'
ORDER BY rc_timestamp DESC
LIMIT 51

There is an index for the new flag and the namespace, it also covers the order by field.
Another index covers the redirect check and page length

CREATE INDEX /*i*/new_name_timestamp ON /*_*/recentchanges (rc_new,rc_namespace,rc_timestamp);
CREATE INDEX /*i*/page_redirect_namespace_len ON /*_*/page (page_is_redirect, page_namespace, page_len);

It would have to filesort that query with either of those indexes. The only index that wouldn't filesort would be rc_timestamp (on (rc_timestamp)), but depending on how often pages have page_len <= 10 it might have to scan a large number of rows to find 51. So it seems likely it's a choice between bad options, and sometimes it makes a worse choice than otherwise.

Did something change that made these start to be logged as WMFTimeoutException rather than DBError? It makes it much harder to debug when the exception message doesn't report the query and DB server involved.

Anyway, chances are it's a poor choice by MariaDB's planner, but without the query it's hard to tell. For queries involving recentchanges, it's also fairly likely that two weeks has caused enough rows to be deleted that even having the query might not let us reproduce it.

Still reproducible - same URL: [XcyA4QpAICAAABiNBH4AAACT] 2019-11-13 22:18:23: Fatal exception of type "WMFTimeoutException"

Still reproducible - same URL: [XcyA4QpAICAAABiNBH4AAACT] 2019-11-13 22:18:23: Fatal exception of type "WMFTimeoutException"

That last error happens on enwiki.
I am mentioning it because given that T233135 touches recentchanges I thought it could be somehow related, but we haven't touched s1 (enwiki) yet, so it can be discarded.

That query takes 8 seconds on db1089 with the following plan:

+------+--------------------+--------------------+--------+--------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------------------+------+-------------+
| id   | select_type        | table              | type   | possible_keys                                                                                                                        | key                  | key_len | ref                                | rows | Extra       |
+------+--------------------+--------------------+--------+--------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------------------+------+-------------+
|    1 | PRIMARY            | recentchanges      | index  | rc_cur_id,new_name_timestamp,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_actor,rc_namespace_title_timestamp | rc_timestamp         | 16      | NULL                               | 1552 | Using where |
|    1 | PRIMARY            | page               | eq_ref | PRIMARY,page_len,page_redirect_namespace_len                                                                                         | PRIMARY              | 4       | enwiki.recentchanges.rc_cur_id     |    1 | Using where |
|    1 | PRIMARY            | actor_rc_user      | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 8       | enwiki.recentchanges.rc_actor      |    1 |             |
|    1 | PRIMARY            | comment_rc_comment | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 8       | enwiki.recentchanges.rc_comment_id |    1 |             |
|    2 | DEPENDENT SUBQUERY | change_tag         | ref    | change_tag_rc_tag_id,change_tag_tag_id_id                                                                                            | change_tag_rc_tag_id | 5       | enwiki.recentchanges.rc_id         |    1 | Using index |
|    2 | DEPENDENT SUBQUERY | change_tag_def     | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 4       | enwiki.change_tag.ct_tag_id        |    1 |             |
+------+--------------------+--------------------+--------+--------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------------------+------+-------------+
6 rows in set (0.01 sec)

A quick check on all the enwiki hosts, they show the same plan for this same query and they always use rc_timestamp (which makes sense).

However, if we actually run a show explain for we can see that the number of scanned rows is a lot higher than initially predicted:

|    1 | PRIMARY            | recentchanges      | index  | rc_cur_id,new_name_timestamp,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_actor,rc_namespace_title_timestamp | rc_timestamp         | 16      | NULL                               | 7452549 | Using where; Using filesort

Still reproducible - same URL: [XcyA4QpAICAAABiNBH4AAACT] 2019-11-13 22:18:23: Fatal exception of type "WMFTimeoutException"

I'm not able to reproduce it with the URL from that exception. I tried both while logged in and logged out.

When I access the URL with some additional logging hacked in, I get the following query:

SELECT  rc_id,rc_timestamp,rc_namespace,rc_title,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,comment_rc_comment.comment_text AS `rc_comment_text`,comment_rc_comment.comment_data AS `rc_comment_data`,comment_rc_comment.comment_id AS `rc_comment_cid`,actor_rc_user.actor_user AS `rc_user`,actor_rc_user.actor_name AS `rc_user_text`,rc_actor,page_len AS `length`,page_latest AS `rev_id`,page_namespace,page_title,fp_stable,fp_pending_since,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rc_id=rc_id  ) AS `ts_tags`  FROM `recentchanges` JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id)) JOIN `actor` `actor_rc_user` ON ((actor_rc_user.actor_id = rc_actor)) JOIN `page` ON ((page_id=rc_cur_id)) LEFT JOIN `flaggedpages` ON ((fp_page_id = rc_cur_id))   WHERE rc_new = '1' AND (page_len <= 10) AND (rc_namespace != '2'AND rc_namespace != '3') AND page_is_redirect = '0'  ORDER BY rc_timestamp DESC LIMIT 51

That gives a similar plan on all enwiki replicas:

+------+--------------------+--------------------+--------+--------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------------------+------+-------------+
| id   | select_type        | table              | type   | possible_keys                                                                                                                        | key                  | key_len | ref                                | rows | Extra       |
+------+--------------------+--------------------+--------+--------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------------------+------+-------------+
|    1 | PRIMARY            | recentchanges      | index  | rc_cur_id,new_name_timestamp,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_actor,rc_namespace_title_timestamp | rc_timestamp         | 16      | NULL                               | 1643 | Using where |
|    1 | PRIMARY            | flaggedpages       | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 4       | enwiki.recentchanges.rc_cur_id     |    1 |             |
|    1 | PRIMARY            | page               | eq_ref | PRIMARY,page_len,page_redirect_namespace_len                                                                                         | PRIMARY              | 4       | enwiki.recentchanges.rc_cur_id     |    1 | Using where |
|    1 | PRIMARY            | actor_rc_user      | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 8       | enwiki.recentchanges.rc_actor      |    1 |             |
|    1 | PRIMARY            | comment_rc_comment | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 8       | enwiki.recentchanges.rc_comment_id |    1 |             |
|    2 | DEPENDENT SUBQUERY | change_tag         | ref    | change_tag_rc_tag_id,change_tag_tag_id_id                                                                                            | change_tag_rc_tag_id | 5       | enwiki.recentchanges.rc_id         |    1 | Using index |
|    2 | DEPENDENT SUBQUERY | change_tag_def     | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 4       | enwiki.change_tag.ct_tag_id        |    1 |             |
+------+--------------------+--------------------+--------+--------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------------------+------+-------------+

8 seconds runtime wouldn't be enough to get the WMFTimeoutException exception, though. There must be some other factor, if only the sort of thing that made T234450 only occasionally be a problem.

This is still an issue (completely forgot about this and just had the same problem with the same the exact same url)

This is still an issue (completely forgot about this and just had the same problem with the same the exact same url)

https://en.wikipedia.org/wiki/Special:NewPages?namespace=2&invert=1&associated=1&tagfilter=&username=&size-mode=max&size=10&wpFormIdentifier=newpagesform just gave me an ERR_CONNECTION_TIMED_OUT on Chrome for iOS on my 1st attempt then loaded fine but slow (27468ms (mw1273.eqiad)) after refreshing

For reference, 187ms (mw1365.eqiad) is load time for the enwiki main page.

Krinkle subscribed.

I'm unable to reproduce this with any of the URLs mentioned in this task. As I understand it, these are not the default queries for Special:NewPages, but a more specialised customisation. For those, performance is currently expected to be worse. The alternative would be to not allow them. The status quo is to allow them, knowing they are close and sometimes time out.

If a particular checkbox or option within Special:NewPages consistently makes it time out, that might be something we can do something about with limited resources. Or conversely, if the page can't be used at with the default query on some wikis, that might also point to a larger issue.

As-is, if it generally works but the same query can be re-tried and not time out, I think that's acceptable for now.