dumpBackup.php failing with InvalidArgumentException thrown from RevisionStoreRecord for certain wikis
Closed, Resolved | Public | Production Error

Description

I've narrowed one case down to the problematic sequence of titles:

dumpsgen@snapshot1009:/mnt/dumpsdata/temp/dumpsgen$ /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=enwiki --full --stub --report=1 --output=file:/mnt/dumpsdata/temp/dumpsgen/badstubs-history.xml  --skip-header --start=4169926 --skip-footer --end 4169928
2019-04-06 08:09:13: enwiki (ID 256699) 0 pages (0.0|0.0/sec all|curr), 1 revs (25.5|25.5/sec all|curr), ETA 2020-05-15 09:18:22 [max 891190890]
2019-04-06 08:09:13: enwiki (ID 256699) 0 pages (0.0|0.0/sec all|curr), 2 revs (49.6|915.8/sec all|curr), ETA 2019-10-31 11:53:18 [max 891190890]
[13802068f4547aefc7dabdb6] [no req]   InvalidArgumentException from line 100 of /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision/RevisionStoreRecord.php: The given Title does not belong to page ID 4169927 but actually belongs to 4169926
Backtrace:
#0 /srv/mediawiki/php-1.33.0-wmf.23/includes/Revision/RevisionStore.php(1820): MediaWiki\Revision\RevisionStoreRecord->__construct(Title, User, CommentStoreComment, stdClass, MediaWiki\Revision\RevisionSlots, boolean)
#1 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/XmlDumpWriter.php(327): MediaWiki\Revision\RevisionStore->newRevisionFromRow(stdClass, integer, Title)
#2 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(485): XmlDumpWriter->writeRevision(stdClass)
#3 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(445): WikiExporter->outputPageStreamBatch(Wikimedia\Rdbms\ResultWrapper, stdClass)
#4 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(269): WikiExporter->dumpPages(string, boolean)
#5 /srv/mediawiki/php-1.33.0-wmf.23/includes/export/WikiExporter.php(154): WikiExporter->dumpFrom(string, boolean)
#6 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/includes/BackupDumper.php(288): WikiExporter->pagesByRange(integer, integer, boolean)
#7 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/dumpBackup.php(81): BackupDumper->dump(integer, integer)
#8 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/doMaintenance.php(94): DumpBackup->execute()
#9 /srv/mediawiki/php-1.33.0-wmf.23/maintenance/dumpBackup.php(138): require_once(string)
#10 /srv/mediawiki/multiversion/MWScript.php(100): require_once(string)
#11 {main}

The underlying issue has always been there, but the exception that now causes stubs dumps on some projects to fail is new.

Event Timeline

ArielGlenn created this task.

The command in the task description should dump two pages, in order: 2.4 and 2.40 on enwiki.

If either page is dumped by itself, the script completes just fine.

After some scrying around, I tracked it down to this:

WikiExporter.php
        protected function outputPageStreamBatch( $results, $lastRow ) {
                foreach ( $results as $row ) {
                        if ( $lastRow === null ||
                                $lastRow->page_namespace != $row->page_namespace ||
                                $lastRow->page_title != $row->page_title ) {

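See that last comparison? Those page titles are both strings, sure, but both are numeric strings, so PHP's loose != compares them as numbers and 2.4 equals 2.40. The exporter then thinks the rows for both of these pages belong to the same page (with title 2.4), and later MW bombs out when RevisionStoreRecord is handed a Title whose page ID does not match the revision row.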
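To make the comparison behavior concrete, here is a minimal standalone sketch (not MediaWiki code) that runs a few title pairs through both the loose and the strict operator. The pairs other than '2.4' and '2.40' are illustrative assumptions, not titles taken from this task.

```php
<?php
// Title pairs to compare. Only the first pair comes from this task;
// the others are made-up examples of numeric-looking strings.
$pairs = [
	[ '2.4', '2.40' ],
	[ '007', '7' ],
	[ '1e3', '1000' ],
	[ 'Foo', 'Foo_bar' ],
];

$results = [];
foreach ( $pairs as [ $a, $b ] ) {
	$results[] = [
		// Loose: two numeric strings are compared as numbers.
		'loose' => ( $a != $b ),
		// Strict: same type required, strings compared character by character.
		'strict' => ( $a !== $b ),
	];
}

foreach ( $results as $i => $r ) {
	printf(
		"%-8s vs %-8s  != says %s, !== says %s\n",
		$pairs[$i][0],
		$pairs[$i][1],
		$r['loose'] ? 'different' : 'same',
		$r['strict'] ? 'different' : 'same'
	);
}
```

With the loose operator the first three pairs are reported as the same, which is exactly how revisions of '2.40' ended up attached to '2.4'. The strict operator reports all four pairs as different.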

Verified that this behavior (treating revisions of '2.40' as belonging to '2.4') existed earlier: all revisions of title '2.40' are included in the page '2.4' in the March 1 2019 stubs dump for enwiki, and no entry is present for page '2.40'. These are old pages and revisions from 2007. In fact the same issue is present in the Nov 2010 copy of the enwiki stubs dump; it simply went undiscovered until now.

I have live-patched WikiExporter.php on snapshot1009 and am running the dumpBackup command there as it would be run for stubs production (though to a different output file):

/usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=enwiki --full --stub --report=1000 --output=file:/mnt/dumpsdata/temp/dumpsgen/badstubs-history.xml --output=file:/mnt/dumpsdata/temp/dumpsgen/badstubs-current.xml --filter=latest --output=file:/mnt/dumpsdata/temp/dumpsgen/badstubs-articles.xml --filter=latest --filter=notalk '--filter=namespace:!NS_USER' --skip-header --start=4166862 --skip-footer --end 4186862

Change 501869 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[mediawiki/core@master] for exports, make sure we compare page titles as strings only

https://gerrit.wikimedia.org/r/501869
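For readers who do not want to open Gerrit, the idea of comparing titles "as strings only" can be sketched roughly as below. This is an illustration under assumptions, not the merged patch: the helper name startsNewPage is hypothetical and does not exist in MediaWiki, and the row objects are stand-ins that only carry the two fields used in the WikiExporter.php excerpt.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): decide whether $row begins
 * a new page relative to the previously seen row.
 */
function startsNewPage( ?stdClass $lastRow, stdClass $row ): bool {
	if ( $lastRow === null ) {
		// First row of the stream always opens a page.
		return true;
	}
	// Namespaces are numeric IDs, so compare them as integers.
	if ( (int)$lastRow->page_namespace !== (int)$row->page_namespace ) {
		return true;
	}
	// Titles are cast to string and compared strictly, so numeric-looking
	// titles such as '2.4' and '2.40' are no longer treated as equal.
	return (string)$lastRow->page_title !== (string)$row->page_title;
}

// Stand-in rows; a database layer may hand back the namespace as a string.
$a = (object)[ 'page_namespace' => '0', 'page_title' => '2.4' ];
$b = (object)[ 'page_namespace' => '0', 'page_title' => '2.40' ];

var_dump( startsNewPage( null, $a ) ); // bool(true)
var_dump( startsNewPage( $a, $b ) );   // bool(true)
var_dump( startsNewPage( $a, $a ) );   // bool(false)
```

Whether the real change uses !==, strcmp() or explicit casts is best checked in the Gerrit change itself; any of them avoids the numeric comparison.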

The live-patched script ran to completion. I will put this fix out on snapshot1005, 1006 and 1007 for 1.33.0-wmf.23 so that we can get stubs completed. The fix will be overwritten by the first deploy though.

Change 501869 merged by jenkins-bot:
[mediawiki/core@master] for exports, make sure we compare page titles as strings only

https://gerrit.wikimedia.org/r/501869

Change 502537 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[mediawiki/core@wmf/1.33.0-wmf.24] for exports, make sure we compare page titles as strings only

https://gerrit.wikimedia.org/r/502537

Change 502537 merged by jenkins-bot:
[mediawiki/core@wmf/1.33.0-wmf.24] for exports, make sure we compare page titles as strings only

https://gerrit.wikimedia.org/r/502537

ArielGlenn claimed this task.

While there is probably more that can be done with the dump and export scripts to harden them against exceptions from MW, this specific case has been addressed, fixes are deployed, and the error is gone. Closing.

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM