Page MenuHomePhabricator

Fatal exception undeleting a file on Commons: rev_page field must not be 0!
Closed, ResolvedPublicPRODUCTION ERROR


[X-sXJQpAMNUAAAUESTcAAAAQ] 2021-01-10 15:03:03: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" while undeleting


Request ID
Request URL
Stack Trace
#0 /srv/mediawiki/php-1.36.0-wmf.25/includes/Revision/RevisionStore.php(409): MediaWiki\Revision\RevisionStore->failOnEmpty(integer, string)
#1 /srv/mediawiki/php-1.36.0-wmf.25/includes/page/PageArchive.php(764): MediaWiki\Revision\RevisionStore->insertRevisionOn(MediaWiki\Revision\RevisionArchiveRecord, Wikimedia\Rdbms\MaintainableDBConnRef)
#2 /srv/mediawiki/php-1.36.0-wmf.25/includes/page/PageArchive.php(499): PageArchive->undeleteRevisions(array, boolean, string)
#3 /srv/mediawiki/php-1.36.0-wmf.25/includes/specials/SpecialUndelete.php(1349): PageArchive->undeleteAsUser(array, User, string, array, boolean)
#4 /srv/mediawiki/php-1.36.0-wmf.25/includes/specials/SpecialUndelete.php(324): SpecialUndelete->undelete()
#5 /srv/mediawiki/php-1.36.0-wmf.25/includes/specialpage/SpecialPage.php(645): SpecialUndelete->execute(NULL)
#6 /srv/mediawiki/php-1.36.0-wmf.25/includes/specialpage/SpecialPageFactory.php(1404): SpecialPage->run(NULL)
#7 /srv/mediawiki/php-1.36.0-wmf.25/includes/MediaWiki.php(310): MediaWiki\SpecialPage\SpecialPageFactory->executePath(Title, RequestContext)
#8 /srv/mediawiki/php-1.36.0-wmf.25/includes/MediaWiki.php(945): MediaWiki->performRequest()
#9 /srv/mediawiki/php-1.36.0-wmf.25/includes/MediaWiki.php(548): MediaWiki->main()
#10 /srv/mediawiki/php-1.36.0-wmf.25/index.php(53): MediaWiki->run()
#11 /srv/mediawiki/php-1.36.0-wmf.25/index.php(46): wfIndexMain()
#12 /srv/mediawiki/w/index.php(3): require(string)
#13 {main}

Event Timeline

Again [X-soGJppZrncdRiH5vU4GwAAAE0] 2021-01-10 16:15:25: Fatal exception of type "MediaWiki\Revision\IncompleteRevisionException" while undeleting

Aklapper changed the subtype of this task from "Task" to "Production Error".Jan 10 2021, 4:21 PM
Aklapper renamed this task from Fatal exception to Fatal exception undeleting a file on Commons: rev_page field must not be 0!.Jan 10 2021, 4:24 PM
Aklapper set Request URL to
Aklapper set Request ID to X-soGJppZrncdRiH5vU4GwAAAE0.
Aklapper edited Stack Trace. (Show Details)
Aklapper set Phatality ID to 58f3b8ae221375d4c1fafc0e6de81cc5ca354446222450ed123b54c3c381504f.
AMooney lowered the priority of this task from Unbreak Now! to High.Jan 25 2021, 5:57 PM
AMooney added a subscriber: AMooney.

Changing to High Priority thinking that if it doesn't happen all the time, and the error is fixed by trying again, it shouldn't be a ubn, but a High.

Change 663218 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Add assertions about page IDs during undeletion.

daniel lowered the priority of this task from High to Medium.Feb 10 2021, 3:19 PM
daniel added a subscriber: daniel.

Code review did not reveal an obvious cause. Made a patch that adds assertions, to help with pinpointing the cause.

This still happens during undeletion once every couple of days or so. It's not limited to commons, and it's not limited to file pages.

commonswiki, Feb 8: YCFMYQpAMNEAASNit70AAACK
commonswiki, Feb 7: gpAMM4ABAZkUGkAAABD
enwiki, Feb 4: YBx59gpAICEAAGxibMEAAAAO


Hopefully the assertion will tell us more about what's going on.

Change 663220 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Explicitly convert page ID to int during undeletion

Note that the patches attached so far are not expected to fix the issue.

Change 663218 merged by jenkins-bot:

[mediawiki/core@master] Add assertions about page IDs during undeletion.

				$newid = $article->insertOn( $dbw, $latestRestorableRow->ar_page_id );
				if ( $newid === false ) {
					// The old ID is reserved; let's pick another
					$newid = $article->insertOn( $dbw );

No, the comment is incorrect, that's not why WikiPage::insertOn() returns false. It returns false if the page exists with the specified title. There's no old ID involved at this point. If you call insertOn() again, it will just return false again.

[EDIT] I overstated this a bit -- the comment is potentially correct and the two calls to insertOn() are necessary. I was confused by the term "old ID" to mean the old page_id. It should be something like:

				$isNew = true;
				$newid = $article->insertOn( $dbw, $latestRestorableRow->ar_page_id );
				if ( $newid === false ) {
					// The page ID is reserved; let's pick another
					$newid = $article->insertOn( $dbw );
					if ( $newid === false ) {
						// The page title must be already taken (race condition)
						$isNew = false;

I was able to reproduce the exception with the following patch, which creates an article in a separate request at the appropriate time.

diff --git a/includes/page/PageArchive.php b/includes/page/PageArchive.php
index 23f5c0d2b7..a3116c55c4 100644
--- a/includes/page/PageArchive.php
+++ b/includes/page/PageArchive.php
@@ -690,6 +690,15 @@ class PageArchive {
+		global $IP;
+		MediaWikiServices::getInstance()->getShellCommandFactory()->create()
+			->input( 'test' )
+			->params(
+				'php',
+				"$IP/maintenance/edit.php",
+				$article->getTitle()->getPrefixedDBkey() )
+			->execute();
 		$newid = false; // newly created page ID
 		$restored = 0; // number of revisions restored
 		/** @var RevisionRecord|null $revision */

My idea is to rearrange PageArchive::undeleteRevisions() so that the $article->insertOn() can be done speculatively, i.e. without first checking whether the page exists. This locks the row, and the return value tells you how to set $makepage.

It's a big function. Currently we have:

  • Page data loading, lines 544-581
  • Archive table loading and validation, lines 583-691
  • Page creation and revdel validation, lines 693-733
  • Revision insertion, lines 735-760
  • Archive deletion, lines 762-772
  • Updates related to changing page_latest, lines 784-835

Maybe these major phases could be split up into separate private methods. But in terms of the current layout, here are the changes I'm suggesting:

  • Move the page data loading down to after the page row insertion, somewhere around line 734.
  • Move revdel validation (i.e. the undeleterevdel error) down to after page data loading. The current logic boils down to: if we are updating the current revision, it must not have DELETED_TEXT. We need to load the current revision before we can do this. Rollback (i.e. cancelAtomic()) if revdel validation fails.
  • We could add a low-rigor revdel validation check using DB_REPLICA before the startAtomic() call, to reduce the chance of a rollback.
  • We don't need two methods of getting the pre-undelete current revision timestamp (on lines 564 and 790). Pick a winner.
  • I'm not convinced that an invalid page_latest should be fatal (undeleterevision-missing). It's derived data and we could just fix it. Delete and undelete seem like good opportunities to normalize stored data. This is an argument for the first method of extracting page_latest since fixing it is the kind of thing a storage layer does, and direct queries make sense in a storage class. In any case it's not appropriate to ignore the fact that RevisionStore::getTimestampFromId() may return false.

Change 692706 had a related patch set uploaded (by Cicalese; author: Cicalese):

[mediawiki/core@master] WIP: Refactor PageArchive::undeleteRevisions

Change 692706 merged by jenkins-bot:

[mediawiki/core@master] Rearrange PageArchive::undeleteRevisions to avoid race condition

Change 693196 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/core@master] Cleanup/refactor of the PageArchive::undeleteRevisions() function

Change 663220 abandoned by Daniel Kinzler:

[mediawiki/core@master] Explicitly convert page ID to int during undeletion


BPirkle added a subscriber: BPirkle.

Removing myself as assignee, as I am not actively working on this.

Krinkle added a subscriber: Krinkle.

No matches on Logstash/mediawiki dashboard in the last 30 days for channel:"exception" AND exception.trace:*PageArchive* AND NOT message:"max_statement_time"

Krinkle reassigned this task from Krinkle to tstarling.