Error
[XWo3eApAICwAADjVSCgAAABU] 2019-08-31 09:03:01: Fatal exception of type "WMFTimeoutException"
Impact
…
Notes
Querying new pages not in user or user talk namespace, with maximum size of 10
…
#0 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/DatabaseMysqli.php(43): {closure}(integer)
#1 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1295): Wikimedia\Rdbms\DatabaseMysqli->doQuery(string)
#2 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1212): Wikimedia\Rdbms\Database->executeQueryAttempt(string, string, boolean, string, integer)
#3 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1149): Wikimedia\Rdbms\Database->executeQuery(string, string, integer)
#4 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/Database.php(1796): Wikimedia\Rdbms\Database->query(string, string)
#5 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#6 /srv/mediawiki/php-1.34.0-wmf.19/includes/libs/rdbms/database/DBConnRef.php(315): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#7 /srv/mediawiki/php-1.34.0-wmf.19/includes/pager/IndexPager.php(413): Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array, array)
#8 /srv/mediawiki/php-1.34.0-wmf.19/includes/pager/IndexPager.php(258): IndexPager->reallyDoQuery(string, integer, boolean)
#9 /srv/mediawiki/php-1.34.0-wmf.19/includes/pager/IndexPager.php(604): IndexPager->doQuery()
#10 /srv/mediawiki/php-1.34.0-wmf.19/includes/specials/SpecialNewpages.php(158): IndexPager->getNumRows()
#11 /srv/mediawiki/php-1.34.0-wmf.19/includes/specialpage/SpecialPage.php(571): SpecialNewpages->execute(NULL)
#12 /srv/mediawiki/php-1.34.0-wmf.19/includes/specialpage/SpecialPageFactory.php(582): SpecialPage->run(NULL)
#13 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(296): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#14 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(892): MediaWiki->performRequest()
#15 /srv/mediawiki/php-1.34.0-wmf.19/includes/MediaWiki.php(523): MediaWiki->main()
#16 /srv/mediawiki/php-1.34.0-wmf.19/index.php(42): MediaWiki->run()
#17 /srv/mediawiki/w/index.php(3): require(string)
#18 {main}
exception.message: the execution time limit of 60 seconds was exceeded
I assumed this was the case, but the issue is that the exception should be handled gracefully, rather than how it was presented - maybe return a blank list with the error noted there?
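A minimal sketch of that "degrade instead of fatal" idea. All names here are hypothetical stand-ins; the real code paths are MediaWiki PHP (IndexPager and friends), not this:

```python
class QueryTimeout(Exception):
    """Hypothetical stand-in for WMFTimeoutException."""

def fetch_rows(run_query):
    """Return (rows, notice); on timeout, degrade to an empty list with a
    user-visible notice instead of letting the fatal bubble up."""
    try:
        return run_query(), None
    except QueryTimeout:
        return [], "This query timed out; try a narrower filter."

def slow_query():
    raise QueryTimeout("execution time limit exceeded")

rows, notice = fetch_rows(slow_query)
print(rows, notice)  # [] This query timed out; try a narrower filter.
```

The point is only that the timeout is caught at the pager level and turned into page content, rather than propagating to the generic fatal-error page.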
Got it again in [XXaZsApAMFsAAC9w-JkAAAAA] 2019-09-09 18:29:00: Fatal exception of type "WMFTimeoutException"
And again: [XbVCNApAICsAAAzI15AAAADM] 2019-10-27 07:08:33: Fatal exception of type "WMFTimeoutException"
Maybe the query is too complex. The query must look similar to this one (depending on the actor and comment migration stage):
SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_minor, rc_bot, rc_new, rc_cur_id,
    rc_this_oldid, rc_last_oldid, rc_type, rc_source, rc_patrolled, rc_ip, rc_old_len,
    rc_new_len, rc_deleted, rc_logid, rc_log_type, rc_log_action, rc_params,
    comment_rc_comment.comment_text AS `rc_comment_text`,
    comment_rc_comment.comment_data AS `rc_comment_data`,
    comment_rc_comment.comment_id AS `rc_comment_cid`,
    actor_rc_user.actor_user AS `rc_user`,
    actor_rc_user.actor_name AS `rc_user_text`,
    rc_actor, page_len AS `length`, page_latest AS `rev_id`, page_namespace, page_title,
    (SELECT GROUP_CONCAT(ctd_name SEPARATOR ',')
     FROM `change_tag`
     JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))
     WHERE ct_rc_id=rc_id) AS `ts_tags`
FROM `recentchanges`
JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id))
JOIN `actor` `actor_rc_user` ON ((actor_rc_user.actor_id = rc_actor))
JOIN `page` ON ((page_id=rc_cur_id))
WHERE rc_new = '1'
  AND (page_len <= 10)
  AND (rc_namespace != '2' AND rc_namespace != '3')
  AND page_is_redirect = '0'
ORDER BY rc_timestamp DESC
LIMIT 51
There is an index for the new flag and the namespace; it also includes the ORDER BY field. Another index covers the redirect check and the page length:
CREATE INDEX /*i*/new_name_timestamp ON /*_*/recentchanges (rc_new, rc_namespace, rc_timestamp);
CREATE INDEX /*i*/page_redirect_namespace_len ON /*_*/page (page_is_redirect, page_namespace, page_len);
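As a toy check of what the recentchanges index buys for this query shape, here is a SQLite sketch (a hypothetical mini-schema; SQLite's planner is not MariaDB's, so this only illustrates the general shape: the composite index satisfies the filters but needs a sort step for this ORDER BY, while a plain timestamp index returns rows already in order):

```python
import sqlite3

# Tiny stand-in for the real table; only the columns the plan needs.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE recentchanges (
        rc_new INTEGER, rc_namespace INTEGER,
        rc_timestamp TEXT, rc_cur_id INTEGER
    );
    CREATE INDEX new_name_timestamp
        ON recentchanges (rc_new, rc_namespace, rc_timestamp);
    CREATE INDEX rc_timestamp_idx ON recentchanges (rc_timestamp);
""")

def plan(index_hint):
    """Return the EXPLAIN QUERY PLAN details when forcing one index."""
    q = (f"SELECT rc_cur_id FROM recentchanges INDEXED BY {index_hint} "
         "WHERE rc_new = 1 AND rc_namespace NOT IN (2, 3) "
         "ORDER BY rc_timestamp DESC LIMIT 51")
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + q))

# The composite index can't deliver rc_timestamp order once rc_namespace
# is a range condition, so a sort step (MySQL's "filesort") is required:
print("TEMP B-TREE" in plan("new_name_timestamp"))  # True
# A bare timestamp index needs no sort, at the cost of scanning unfiltered rows:
print("TEMP B-TREE" in plan("rc_timestamp_idx"))    # False
```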
Did something change that made these start to be logged as WMFTimeoutException rather than DBError? It makes it much harder to debug when the exception message doesn't report the query and DB server involved.
Anyway, chances are it's a poor choice by MariaDB's planner, but without the query it's hard to tell. For queries involving recentchanges, it's also fairly likely that two weeks of churn has caused enough rows to be deleted that even having the query might not let us reproduce it.
It would have to filesort that query with either of those indexes. The only index that avoids the filesort is rc_timestamp (on (rc_timestamp) alone), but depending on how often pages have page_len <= 10 it might have to scan a large number of rows to find 51 matches. So it seems likely the planner is choosing between bad options, and sometimes it picks a worse one than at other times.
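To put a rough number on "scan a large number of rows to find 51": if only a small fraction of timestamp-ordered rows pass the filters, the scan has to touch roughly 51 / (match rate) rows. A toy simulation (the 1% match rate and the row walk are invented for illustration, not measured):

```python
import random

random.seed(0)
MATCH_RATE = 0.01   # hypothetical: 1% of candidate rows pass page_len <= 10 etc.
LIMIT = 51

# Walk rows in rc_timestamp order, as the rc_timestamp index scan would,
# counting how many rows we touch before collecting LIMIT matches.
scanned = matched = 0
while matched < LIMIT:
    scanned += 1
    if random.random() < MATCH_RATE:
        matched += 1

print(scanned)  # roughly LIMIT / MATCH_RATE, i.e. ~5100 rows touched for 51 results
```

The lower the real match rate, the further the scan runs before the LIMIT is satisfied, which is exactly the failure mode when it never finds 51 in time.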
Still reproducible - same URL: [XcyA4QpAICAAABiNBH4AAACT] 2019-11-13 22:18:23: Fatal exception of type "WMFTimeoutException"
That last error happens on enwiki.
I am mentioning it because T233135 touches recentchanges, so I thought it could somehow be related, but we haven't touched s1 (enwiki) yet, so that can be ruled out.
That query takes 8 seconds on db1089 with the following plan:
| id | select_type        | table              | type   | possible_keys                                                                                                                        | key                  | key_len | ref                                | rows | Extra       |
| 1  | PRIMARY            | recentchanges      | index  | rc_cur_id,new_name_timestamp,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_actor,rc_namespace_title_timestamp | rc_timestamp         | 16      | NULL                               | 1552 | Using where |
| 1  | PRIMARY            | page               | eq_ref | PRIMARY,page_len,page_redirect_namespace_len                                                                                         | PRIMARY              | 4       | enwiki.recentchanges.rc_cur_id     | 1    | Using where |
| 1  | PRIMARY            | actor_rc_user      | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 8       | enwiki.recentchanges.rc_actor      | 1    |             |
| 1  | PRIMARY            | comment_rc_comment | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 8       | enwiki.recentchanges.rc_comment_id | 1    |             |
| 2  | DEPENDENT SUBQUERY | change_tag         | ref    | change_tag_rc_tag_id,change_tag_tag_id_id                                                                                            | change_tag_rc_tag_id | 5       | enwiki.recentchanges.rc_id         | 1    | Using index |
| 2  | DEPENDENT SUBQUERY | change_tag_def     | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 4       | enwiki.change_tag.ct_tag_id        | 1    |             |
6 rows in set (0.01 sec)
A quick check on all the enwiki hosts shows the same plan for this same query; they always use rc_timestamp (which makes sense).
However, if we actually run SHOW EXPLAIN FOR <connection_id> against the running query, we can see that the number of scanned rows is a lot higher than initially predicted:
| 1 | PRIMARY | recentchanges | index | rc_cur_id,new_name_timestamp,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_actor,rc_namespace_title_timestamp | rc_timestamp | 16 | NULL | 7452549 | Using where; Using filesort
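The gap between estimate and reality also bounds how selective the filters actually are near the head of the timestamp index: still not having 51 matches after touching 7,452,549 rows implies a match rate below about 51 / 7,452,549. A quick back-of-the-envelope check (numbers taken from the EXPLAIN and SHOW EXPLAIN output above; the arithmetic is mine):

```python
LIMIT = 51
ESTIMATED_ROWS = 1_552      # optimizer's estimate from the EXPLAIN above
SCANNED_ROWS = 7_452_549    # rows actually touched per SHOW EXPLAIN

# Upper bound on the fraction of timestamp-ordered rows satisfying the
# filters, given the scan had not yet collected 51 matches:
implied_max_match_rate = LIMIT / SCANNED_ROWS
print(f"{implied_max_match_rate:.2e}")   # 6.84e-06
print(SCANNED_ROWS // ESTIMATED_ROWS)    # 4801: actual scan ~4800x the estimate
```

A match rate that low means the "cheap" timestamp scan is effectively a near-full index scan, which is consistent with it running into the 60-second limit.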
I'm not able to reproduce it with the URL from that exception. I tried both while logged in and logged out.
When I access the URL with some additional logging hacked in, I get the following query:
SELECT rc_id, rc_timestamp, rc_namespace, rc_title, rc_minor, rc_bot, rc_new, rc_cur_id,
    rc_this_oldid, rc_last_oldid, rc_type, rc_source, rc_patrolled, rc_ip, rc_old_len,
    rc_new_len, rc_deleted, rc_logid, rc_log_type, rc_log_action, rc_params,
    comment_rc_comment.comment_text AS `rc_comment_text`,
    comment_rc_comment.comment_data AS `rc_comment_data`,
    comment_rc_comment.comment_id AS `rc_comment_cid`,
    actor_rc_user.actor_user AS `rc_user`,
    actor_rc_user.actor_name AS `rc_user_text`,
    rc_actor, page_len AS `length`, page_latest AS `rev_id`, page_namespace, page_title,
    fp_stable, fp_pending_since,
    (SELECT GROUP_CONCAT(ctd_name SEPARATOR ',')
     FROM `change_tag`
     JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))
     WHERE ct_rc_id=rc_id) AS `ts_tags`
FROM `recentchanges`
JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id))
JOIN `actor` `actor_rc_user` ON ((actor_rc_user.actor_id = rc_actor))
JOIN `page` ON ((page_id=rc_cur_id))
LEFT JOIN `flaggedpages` ON ((fp_page_id = rc_cur_id))
WHERE rc_new = '1'
  AND (page_len <= 10)
  AND (rc_namespace != '2' AND rc_namespace != '3')
  AND page_is_redirect = '0'
ORDER BY rc_timestamp DESC
LIMIT 51
That gives a similar plan on all enwiki replicas:
| id | select_type        | table              | type   | possible_keys                                                                                                                        | key                  | key_len | ref                                | rows | Extra       |
| 1  | PRIMARY            | recentchanges      | index  | rc_cur_id,new_name_timestamp,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_actor,rc_namespace_title_timestamp | rc_timestamp         | 16      | NULL                               | 1643 | Using where |
| 1  | PRIMARY            | flaggedpages       | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 4       | enwiki.recentchanges.rc_cur_id     | 1    |             |
| 1  | PRIMARY            | page               | eq_ref | PRIMARY,page_len,page_redirect_namespace_len                                                                                         | PRIMARY              | 4       | enwiki.recentchanges.rc_cur_id     | 1    | Using where |
| 1  | PRIMARY            | actor_rc_user      | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 8       | enwiki.recentchanges.rc_actor      | 1    |             |
| 1  | PRIMARY            | comment_rc_comment | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 8       | enwiki.recentchanges.rc_comment_id | 1    |             |
| 2  | DEPENDENT SUBQUERY | change_tag         | ref    | change_tag_rc_tag_id,change_tag_tag_id_id                                                                                            | change_tag_rc_tag_id | 5       | enwiki.recentchanges.rc_id         | 1    | Using index |
| 2  | DEPENDENT SUBQUERY | change_tag_def     | eq_ref | PRIMARY                                                                                                                              | PRIMARY              | 4       | enwiki.change_tag.ct_tag_id        | 1    |             |
8 seconds runtime wouldn't be enough to get the WMFTimeoutException exception, though. There must be some other factor, if only the sort of thing that made T234450 only occasionally be a problem.
This is still an issue (I completely forgot about this and just had the same problem with the exact same URL).
https://en.wikipedia.org/wiki/Special:NewPages?namespace=2&invert=1&associated=1&tagfilter=&username=&size-mode=max&size=10&wpFormIdentifier=newpagesform just gave me an ERR_CONNECTION_TIMED_OUT in Chrome for iOS on my first attempt, then loaded fine but slowly (27468 ms, mw1273.eqiad) after refreshing.
For reference, the enwiki main page loads in 187 ms (mw1365.eqiad).
I'm unable to reproduce this with any of the URLs mentioned in this task. As I understand it, these are not the default queries for Special:NewPages but a more specialised customisation. For those, performance is currently expected to be worse; the alternative would be to not allow them at all. The status quo is to allow them, knowing they run close to the limit and sometimes time out.
If a particular checkbox or option within Special:NewPages consistently makes it time out, that might be something we can address with limited resources. Conversely, if the page can't be used even with the default query on some wikis, that would also point to a larger issue.
As it is, if the page generally works and a timed-out query succeeds when retried, I think that's acceptable for now.