Unable to perform revision deletion on Commons
Closed, Resolved (Public)

Description

I have been unable to perform revision deletion on Commons for a few hours now. Every time I try, I get an "entire web request took longer than 60 seconds and timed out" PHP fatal error. I asked some admins on other projects and they don't seem to have a problem, but revdel hasn't been performed on Commons for about 12 hours now, so I don't know if it is just me or a bigger issue. After asking on IRC, it was recommended that I open a ticket. Would someone be able to look into this for me? It would be appreciated.

Event Timeline

Note that Security made some minor adjustments to the revdel process on Tuesday, but nothing that should cause this.

Just tested on officewiki and it worked fine, so it's not affecting all wmf.3 wikis at least.

Yup, triggered on GET to https://commons.wikimedia.org/w/index.php?title=User%3AJdforrester_%28WMF%29%2Fsandbox&action=revisiondelete&type=revision&ids%5B348296966%5D=1:

[ XMpRwApAMFQAAE8PXE0AAAAQ ] 2019-05-02 02:12:33: Fatal exception of type "WMFTimeoutException"

This is on a trivial page (two revisions, one editor); possibly this is caused by actor changes?
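The request URL above is percent-encoded, which makes it hard to see exactly what was asked for. A minimal sketch using Python's standard library to decode its query string (the URL is copied from the comment; nothing is sent over the network):

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the report; the trailing colon in the comment is punctuation, not part of the URL.
url = (
    "https://commons.wikimedia.org/w/index.php"
    "?title=User%3AJdforrester_%28WMF%29%2Fsandbox"
    "&action=revisiondelete&type=revision&ids%5B348296966%5D=1"
)

# parse_qs percent-decodes names and values and returns a dict of lists.
params = parse_qs(urlsplit(url).query)

for name, values in params.items():
    print(f"{name} = {values[0]}")
```

Decoded, the request targets a single revision ID on a user sandbox page, which matches the "trivial page" description.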

Well at least I know it isn't just me. Thank you for that. Just for minor troubleshooting notes I tried it on one revision, multiple revisions, text revisions, image revisions, small pages, large pages. Nothing seems to work. Always times out.

The associated stack trace supports that it's DB-related:

#0 /srv/mediawiki/php-1.34.0-wmf.3/includes/exception/MWExceptionHandler.php(196): {closure}(integer)
#1 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#2 /srv/mediawiki/php-1.34.0-wmf.3/includes/libs/rdbms/database/DatabaseMysqli.php(46): mysqli->query(string)
#3 /srv/mediawiki/php-1.34.0-wmf.3/includes/libs/rdbms/database/Database.php(1322): Wikimedia\Rdbms\DatabaseMysqli->doQuery(string)
#4 /srv/mediawiki/php-1.34.0-wmf.3/includes/libs/rdbms/database/Database.php(1224): Wikimedia\Rdbms\Database->attemptQuery(string, string, boolean, string)
#5 /srv/mediawiki/php-1.34.0-wmf.3/includes/libs/rdbms/database/Database.php(1784): Wikimedia\Rdbms\Database->query(string, string)
#6 /srv/mediawiki/php-1.34.0-wmf.3/includes/pager/IndexPager.php(411): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#7 /srv/mediawiki/php-1.34.0-wmf.3/includes/pager/IndexPager.php(256): IndexPager->reallyDoQuery(string, integer, boolean)
#8 /srv/mediawiki/php-1.34.0-wmf.3/includes/logging/LogPager.php(450): IndexPager->doQuery()
#9 /srv/mediawiki/php-1.34.0-wmf.3/includes/pager/IndexPager.php(467): LogPager->doQuery()
#10 /srv/mediawiki/php-1.34.0-wmf.3/includes/logging/LogEventsList.php(689): IndexPager->getBody()
#11 /srv/mediawiki/php-1.34.0-wmf.3/includes/specials/SpecialRevisionDelete.php(225): LogEventsList::showLogExtract(OutputPage, string, Title, string, array)
#12 /srv/mediawiki/php-1.34.0-wmf.3/includes/specialpage/SpecialPage.php(569): SpecialRevisionDelete->execute(string)
#13 /srv/mediawiki/php-1.34.0-wmf.3/includes/actions/SpecialPageAction.php(78): SpecialPage->run(string)
#14 /srv/mediawiki/php-1.34.0-wmf.3/includes/MediaWiki.php(499): SpecialPageAction->show()
#15 /srv/mediawiki/php-1.34.0-wmf.3/includes/MediaWiki.php(294): MediaWiki->performAction(Article, Title)
#16 /srv/mediawiki/php-1.34.0-wmf.3/includes/MediaWiki.php(865): MediaWiki->performRequest()
#17 /srv/mediawiki/php-1.34.0-wmf.3/includes/MediaWiki.php(515): MediaWiki->main()
#18 /srv/mediawiki/php-1.34.0-wmf.3/index.php(42): MediaWiki->run()
#19 /srv/mediawiki/w/index.php(3): require(string)
#20 {main}
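For anyone reading a flattened trace like this, a small sketch of how to split it on its `#N` frame markers and pick out the database-layer frames. The sample string is abbreviated to four frames taken from the trace above; the helper name `split_frames` is made up for this illustration.

```python
import re

# Four frames copied from the trace above, joined on one line as they appeared in the paste.
trace = (
    "#2 /srv/mediawiki/php-1.34.0-wmf.3/includes/libs/rdbms/database/DatabaseMysqli.php(46): mysqli->query(string) "
    "#6 /srv/mediawiki/php-1.34.0-wmf.3/includes/pager/IndexPager.php(411): "
    "Wikimedia\\Rdbms\\Database->select(array, array, array, string, array, array) "
    "#8 /srv/mediawiki/php-1.34.0-wmf.3/includes/logging/LogPager.php(450): IndexPager->doQuery() "
    "#11 /srv/mediawiki/php-1.34.0-wmf.3/includes/specials/SpecialRevisionDelete.php(225): "
    "LogEventsList::showLogExtract(OutputPage, string, Title, string, array)"
)


def split_frames(text):
    """Split a flattened PHP stack trace into one string per frame."""
    # Zero-width lookahead: split just before each "#<number> " marker.
    return [part.strip() for part in re.split(r"(?=#\d+ )", text) if part.strip()]


frames = split_frames(trace)

# Frames that mention the rdbms library (in the path or the class name).
db_frames = [frame for frame in frames if "rdbms" in frame.lower()]

for frame in db_frames:
    print(frame)
```

Read bottom-up, the frames show SpecialRevisionDelete rendering a log extract, LogPager running its query, and the time being spent inside the rdbms select call.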

Note that Security made some minor adjustments to the revdel process on Tuesday, but nothing that should cause this.

Any way we can confirm that so we can open up this task? If this is not a security issue, would be good to get more eyes on this since it's a train blocker.

@thcipriani - these are the two security patches that were deployed on Tuesday: T222036#5142596, T222038#5142604 (though not the -formatter patch.) These should only affect granular view permissions for certain revdel logs.

Would an easy way to verify be:

  1. remove the security patch
  2. pull to an mwdebug server
  3. verify the problem still exists using X-Wikimedia-Debug

Does that seem sane? Maybe this is entirely unneeded if there's no way these patches could be the cause.

Plan would need someone with deleterevision (which I don't have, afaik) + someone to fiddle with scap/deploys (which I can help with as needed).
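The verification step above could be sketched roughly as follows. This only builds the request and does not send it. The header value format and the backend host name are placeholders assumed for illustration, not values taken from this task, and a real test would also need the session cookies of an account with the deleterevision right.

```python
from urllib.request import Request

# ASSUMPTION: placeholder header value; the real backend name and value format
# must come from whoever sets up the mwdebug server.
DEBUG_HEADER_VALUE = "backend=mwdebug.example"

# URL from the original report.
url = (
    "https://commons.wikimedia.org/w/index.php"
    "?title=User%3AJdforrester_%28WMF%29%2Fsandbox"
    "&action=revisiondelete&type=revision&ids%5B348296966%5D=1"
)


def build_debug_request(target_url, debug_value=DEBUG_HEADER_VALUE):
    """Build (but do not send) a GET request carrying X-Wikimedia-Debug."""
    return Request(
        target_url,
        headers={"X-Wikimedia-Debug": debug_value},
        method="GET",
    )


req = build_debug_request(url)
print(req.get_method(), req.full_url)
print(req.header_items())
```

If the timeout still reproduces through the debug server with the security patches removed, the patches can be ruled out as the cause.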

I imagine it's more likely related to the ongoing log issues on Commons that are being exposed by the actor migration; T221458 was the latest. On loading action=revisiondelete, the log is queried for matching hits.

I'd be a little shocked if these two patches were causing the problem, especially since this seems to be intermittent and only affecting Commons. Though I can't say it's impossible. We could revert the patches on wmf.3 and test commonswiki on mwdebug as you suggest, though I definitely don't have sufficient rights there to test.

It's caused by the fix for T221458 (Special:Log on commons -- entire web request took longer than 60 seconds and timed out), which forces MariaDB to use a bad query plan for the query here. Nothing to do with T222036 or T222038. I have a patch ready to upload for review as soon as we make this non-private.
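As a rough illustration of the idea in the patch title ("Don't STRAIGHT_JOIN when using log_search"), here is a minimal sketch. It is not the MediaWiki code, which is PHP in LogPager; the function name and argument shapes are hypothetical, and only the table names and the STRAIGHT_JOIN option come from this discussion.

```python
def log_query_options(tables, base_options=()):
    """Return SELECT options for a log query (hypothetical helper).

    STRAIGHT_JOIN pins the join order to the order the tables are listed in.
    That helped the plain Special:Log query, but once log_search is joined
    the optimizer needs to be free to pick its own order.
    """
    options = list(base_options)
    if "log_search" not in tables:
        # No log_search join: keep forcing the join order.
        options.append("STRAIGHT_JOIN")
    return options


# Plain log listing: join order is forced.
print(log_query_options(["logging", "actor"]))

# Revision-deletion log extract, which filters through log_search: no forcing.
print(log_query_options(["logging", "log_search", "actor"]))
```

The design point is that a forced join order is only safe for the query shape it was tuned on; a different set of joined tables can turn the same hint into a slow plan.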

Given that this task is "principal remedy against common attack vector doesn't work right now", I worry about making this task public before it's fixed in production.

Ok then.

sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".

Change 507850 had a related patch set uploaded (by Jforrester; owner: Anomie):
[mediawiki/core@master] SECURITY: LogPager: Don't STRAIGHT_JOIN when using log_search

https://gerrit.wikimedia.org/r/507850

Change 507851 had a related patch set uploaded (by SBassett; owner: Anomie):
[mediawiki/core@wmf/1.34.0-wmf.3] SECURITY: LogPager: Don't STRAIGHT_JOIN when using log_search

https://gerrit.wikimedia.org/r/507851

Change 507850 merged by jenkins-bot:
[mediawiki/core@master] SECURITY: LogPager: Don't STRAIGHT_JOIN when using log_search

https://gerrit.wikimedia.org/r/507850

Change 507851 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.3] SECURITY: LogPager: Don't STRAIGHT_JOIN when using log_search

https://gerrit.wikimedia.org/r/507851