Page MenuHomePhabricator

Some administrative and log actions on Wikidata take longer than 60 seconds and time out
Closed, ResolvedPublic

Description

In roughly the same vein as T204606, T204871, and T206878, some actions on Wikidata, including granting user rights (such as would be possible using this link), viewing logs, and protecting pages (such as would be possible using this link), cannot be done at present because, in each case, an error page is returned which states that the "entire web request took longer than 60 seconds and timed out".

Event Timeline

Mahir256 created this task.Oct 17 2018, 6:05 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 17 2018, 6:05 PM
Mahir256 updated the task description. (Show Details)Oct 17 2018, 6:09 PM

Special:Undelete also taking similar time.

Krinkle added a subscriber: Krinkle.

(Triaging as "Not new" from a production branch perspective (the timeout configuration is separate from that). I'll let the Wikidata team triage and decide how to handle the issue itself.)

Mahir256 updated the task description. (Show Details)Oct 17 2018, 7:11 PM
Nikki added a subscriber: Nikki.Oct 18 2018, 2:35 PM
Nikki added a comment.Oct 18 2018, 2:40 PM

I also noticed this when trying to load deleted items (possibly because it shows log information on the page?). It seems to have suddenly started happening in the last day or two. I wasn't having any problems before that.

Special:Nuke doesn't load for me either.

Lydia_Pintscher triaged this task as Unbreak Now! priority.Oct 18 2018, 7:44 PM

This seems pretty bad...

Restricted Application added subscribers: Liuxinyu970226, TerraCodes. · View Herald TranscriptOct 18 2018, 7:44 PM

Not sure if this is related, but this page reports a nearly 7-hour replag (as of this comment) for Wikidata's shard (which strangely is not what Special:DispatchStats is reporting).

Restricted Application added a project: User-Addshore. · View Herald TranscriptOct 19 2018, 10:07 AM

Not sure if this is related, but this page reports a nearly 7-hour replag (as of this comment) for Wikidata's shard (which strangely is not what Special:DispatchStats is reporting).

This is fine and likely relates to T206743#4677827

All of the links provided seem to result in:

PHP Fatal Error from line 46 of /srv/mediawiki/php-1.32.0-wmf.26/includes/libs/rdbms/database/DatabaseMysqli.php: entire web request took longer than 60 seconds and timed out

They also all seem to have some relation to the log table and the IndexPager and LogPager classes in core.

For example this request https://www.wikidata.org/wiki/Special:Log/addshore result in this sctack:

	#0 /srv/mediawiki/php-1.32.0-wmf.26/includes/libs/rdbms/database/DatabaseMysqli.php(46): mysqli::query()
#1 /srv/mediawiki/php-1.32.0-wmf.26/includes/libs/rdbms/database/Database.php(1260): Wikimedia\Rdbms\DatabaseMysqli->doQuery(string)
#2 /srv/mediawiki/php-1.32.0-wmf.26/includes/libs/rdbms/database/Database.php(1183): Wikimedia\Rdbms\Database->doProfiledQuery(string, string, boolean, string)
#3 /srv/mediawiki/php-1.32.0-wmf.26/includes/libs/rdbms/database/Database.php(1693): Wikimedia\Rdbms\Database->query(string, string)
#4 /srv/mediawiki/php-1.32.0-wmf.26/includes/pager/IndexPager.php(369): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#5 /srv/mediawiki/php-1.32.0-wmf.26/includes/pager/IndexPager.php(226): IndexPager->reallyDoQuery(string, integer, boolean)
#6 /srv/mediawiki/php-1.32.0-wmf.26/includes/logging/LogPager.php(442): IndexPager->doQuery()
#7 /srv/mediawiki/php-1.32.0-wmf.26/includes/pager/IndexPager.php(423): LogPager->doQuery()
#8 /srv/mediawiki/php-1.32.0-wmf.26/includes/specials/SpecialLog.php(248): IndexPager->getBody()
#9 /srv/mediawiki/php-1.32.0-wmf.26/includes/specials/SpecialLog.php(137): SpecialLog->show(FormOptions, array)
#10 /srv/mediawiki/php-1.32.0-wmf.26/includes/specialpage/SpecialPage.php(569): SpecialLog->execute(string)
#11 /srv/mediawiki/php-1.32.0-wmf.26/includes/specialpage/SpecialPageFactory.php(568): SpecialPage->run(string)
#12 /srv/mediawiki/php-1.32.0-wmf.26/includes/MediaWiki.php(288): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#13 /srv/mediawiki/php-1.32.0-wmf.26/includes/MediaWiki.php(860): MediaWiki->performRequest()
#14 /srv/mediawiki/php-1.32.0-wmf.26/includes/MediaWiki.php(517): MediaWiki->main()
#15 /srv/mediawiki/php-1.32.0-wmf.26/index.php(42): MediaWiki->run()
#16 /srv/mediawiki/w/index.php(3): include(string)
#17 {main}

It seems running the request with profiling turned on never reports anything to https://performance.wikimedia.org/xhgui/, perhaps because the request stops? :(

Could be related to change tags work T185355

Addshore moved this task from incoming to in progress on the Wikidata board.Oct 19 2018, 10:37 AM

I don't think this actually relates to T204871, other than the fact that it is about requests timing out.

Change 468547 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Revert back wikidata for change_tag backend

https://gerrit.wikimedia.org/r/468547

The very likely cause is T194164: Start reading from change_tag_def in production. We are going to roll this back for wikidata which means tags made between Wed, Oct 17, 1:40 PM until the time of deploy won't be usable after the rollback, but it's temporarily measure until we fix the underlying issue and turn reading from the new column back on, no data is lost, it's just stored in a new format. If such cases is happening for other wikis, feel free to revert them too.

Change 468547 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert back wikidata for change_tag backend

https://gerrit.wikimedia.org/r/468547

Mentioned in SAL (#wikimedia-operations) [2018-10-19T11:33:09Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T207313 UBN - Revert back wikidata for change_tag backend (duration: 00m 59s)

Addshore lowered the priority of this task from Unbreak Now! to High.Oct 19 2018, 11:34 AM
Addshore added a subscriber: Lea_Lacroix_WMDE.

All of the example pages linked in the description now load again.

Changing prio to High to avoid this being listed in the lists of UBNs

Moving to announce now as T207313#4680020 may need announcing?

Change 468980 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@master] Fix bad join on ChangeTag subquery

https://gerrit.wikimedia.org/r/468980

Looking at tendril and running the changed sql queries on prod, my patch should fix it.
The old query was like this:

SELECT /* IndexPager::buildQueryInfo (LogPager) */ log_id, log_type, log_action, log_timestamp, log_namespace, log_title, log_params, log_deleted, user_id, user_name, user_editcount, COALESCE( comment_log_comment.comment_text, log_comment ) AS `log_comment_text`, comment_log_comment.comment_data AS `log_comment_data`, comment_log_comment.comment_id AS `log_comment_cid`, log_user, log_user_text, NULL AS `log_actor`, (SELECT GROUP_CONCAT(ct_tag SEPARATOR ', ') FROM `change_tag` WHERE (ct_log_id=log_id) ) AS `ts_tags` FROM `logging` LEFT JOIN `comment` `comment_log_comment` ON ((comment_log_comment.comment_id = log_comment_id)) LEFT JOIN `user` ON ((user_id=log_user)) WHERE (log_action != 'revision') AND (log_action != 'flow-delete-post') AND (log_action != 'flow-delete-topic') AND (log_action != 'flow-restore-post') AND (log_action != 'flow-restore-topic') AND (log_type NOT IN ('spamblacklist', 'titleblacklist', 'abusefilterprivatedetails', 'suppress')) AND log_type IN ('delete', 'move', 'protect') AND log_namespace = '0' AND log_title = 'Q56399360' AND ((log_deleted & 1) = 0) ORDER BY log_timestamp DESC LIMIT 11;

Which is now changed to:

SELECT /* IndexPager::buildQueryInfo (LogPager) */ log_id, log_type, log_action, log_timestamp, log_namespace, log_title, log_params, log_deleted, user_id, user_name, user_editcount, COALESCE( comment_log_comment.comment_text, log_comment ) AS `log_comment_text`, comment_log_comment.comment_data AS `log_comment_data`, comment_log_comment.comment_id AS `log_comment_cid`, log_user, log_user_text, NULL AS `log_actor`, (SELECT GROUP_CONCAT(ctd_name SEPARATOR ', ') FROM `change_tag`, `change_tag_def` WHERE (ct_log_id=log_id) AND (ct_tag_id=ctd_id) ) AS `ts_tags` FROM `logging` LEFT JOIN `comment` `comment_log_comment` ON ((comment_log_comment.comment_id = log_comment_id)) LEFT JOIN `user` ON ((user_id=log_user)) WHERE (log_action != 'revision') AND (log_action != 'flow-delete-post') AND (log_action != 'flow-delete-topic') AND (log_action != 'flow-restore-post') AND (log_action != 'flow-restore-topic') AND (log_type NOT IN ('spamblacklist', 'titleblacklist', 'abusefilterprivatedetails', 'suppress')) AND log_type IN ('delete', 'move', 'protect') AND log_namespace = '0' AND log_title = 'Q56399360' AND ((log_deleted & 1) = 0) ORDER BY log_timestamp DESC LIMIT 11;

but MySQL is pretty dumb in optimizing subqeuries (SELECT GROUP_CONCAT(ctd_name SEPARATOR ', ') FROM change_tag, change_tag_def WHERE (ct_log_id=log_id) AND (ct_tag_id=ctd_id) ) So I changed it to SELECT GROUP_CONCAT(ctd_name SEPARATOR ',') FROM change_tag INNER JOIN change_tag_def ON ((ct_tag_id=ctd_id)) WHERE ct_rc_id=log_id which is pretty performant (Ran it on prod, works fine)

Change 468980 merged by jenkins-bot:
[mediawiki/core@master] Fix bad join on ChangeTag subquery

https://gerrit.wikimedia.org/r/468980

Change 469114 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@wmf/1.32.0-wmf.26] Fix bad join on ChangeTag subquery

https://gerrit.wikimedia.org/r/469114

Change 469114 merged by jenkins-bot:
[mediawiki/core@wmf/1.32.0-wmf.26] Fix bad join on ChangeTag subquery

https://gerrit.wikimedia.org/r/469114

Mentioned in SAL (#wikimedia-operations) [2018-10-22T23:59:03Z] <ladsgroup@deploy1001> Synchronized php-1.32.0-wmf.26/includes/changetags/ChangeTags.php: SWAT: [[gerrit:469114|Fix bad join on ChangeTag subquery (T207313)]] (duration: 00m 47s)

It looks like this was already announced.

Addshore closed this task as Resolved.Oct 29 2018, 5:16 PM