
XmlDumpWriter::writeRevision sometimes broken by duplicate keys in Link Cache
Open, High, Public, 0 Estimated Story Points

Description

This is worked around in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/502150/ (see related T220316) but the underlying issue needs to be fixed.

In the present case, one page is Commons:Deletion_requests/Files_uploaded_by_Nabila.selim in the main namespace (0); the other page is in the Commons namespace with the title Deletion_requests/Files_uploaded_by_Nabila.selim. They have close page ids and so are read in the same batch query for xml stubs. Both have the same key in the link cache for lookup.

Unlike the related bug, neither page is a redirect. But when the first revision of the second page is to be written, the RevisionStoreRecord constructor eventually invokes Title::getArticleID, which retrieves the info for the title from the link cache. That entry is for the other title with its page id, which again causes an exception in RevisionStoreRecord.php:

InvalidArgumentException from line 100 of /srv/mediawiki_atg/php-1.33.0-wmf.23/includes/Revision/RevisionStoreRecord.php: The given Title does not belong to page ID 38058998 but actually belongs to 38058985
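The collision described above can be illustrated with a toy model. This is only a sketch in Python, not MediaWiki code: `ToyLinkCache`, `prefixed_key`, `check_revision`, the two-entry namespace table and the page ids 1001 and 1002 are all hypothetical, and it assumes the cache key is the namespace prefix joined to the title with a colon.

```python
from dataclasses import dataclass

# Hypothetical namespace table: 0 is the main namespace (no prefix),
# 4 is the project namespace, which is named "Commons" on Commons.
NAMESPACE_NAMES = {0: "", 4: "Commons"}


@dataclass(frozen=True)
class Page:
    page_id: int
    namespace: int
    db_key: str


def prefixed_key(namespace: int, db_key: str) -> str:
    """Build the lookup key: '<namespace name>:<title>', or just the title in namespace 0."""
    prefix = NAMESPACE_NAMES[namespace]
    return f"{prefix}:{db_key}" if prefix else db_key


class ToyLinkCache:
    """Maps a prefixed title key to a page id (assumed model of the cache)."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}

    def add_good_link(self, page: Page) -> None:
        self._ids[prefixed_key(page.namespace, page.db_key)] = page.page_id

    def get_good_link_id(self, namespace: int, db_key: str) -> int:
        # 0 means "not cached" in this toy model.
        return self._ids.get(prefixed_key(namespace, db_key), 0)


def check_revision(cache: ToyLinkCache, page: Page) -> None:
    """Mimic the consistency check that fails when the cache returns another page's id."""
    cached_id = cache.get_good_link_id(page.namespace, page.db_key)
    if cached_id != page.page_id:
        raise ValueError(
            f"The given Title does not belong to page ID {page.page_id} "
            f"but actually belongs to {cached_id}"
        )


# Two distinct pages whose prefixed keys are identical.
main_page = Page(1001, 0, "Commons:Deletion_requests/Example")
project_page = Page(1002, 4, "Deletion_requests/Example")

cache = ToyLinkCache()
cache.add_good_link(main_page)  # the first page in the batch populates the cache

try:
    check_revision(cache, project_page)
except ValueError as err:
    print(err)
```

In this model the second page never gets its own cache entry, because its key is already taken by the first page. That matches the symptom reported here, though the real code path may differ in detail.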

We need either to clean up these bad pages/revisions manually in some fashion or to do something with them so that they don't live at these unreachable titles.

Event Timeline

ArielGlenn created this task.

Abstract dumps are broken by this as well, fix incoming.

Change 502953 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[mediawiki/extensions/ActiveAbstract@master] avoid link cache issues with duplicate title keys for xml dumps

https://gerrit.wikimedia.org/r/502953

one page is Commons:Deletion_requests/Files_uploaded_by_Nabila.selim in the main namespace (0),
...
We need either to clean up these bad pages/revisions manually in some fashion or to do something with them so that they don't live at these unreachable titles.

I would have expected the namespaceDupes.php script to have taken care of this...maybe no one has run it on Commons recently?
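For illustration, the kind of check needed to find such pages can be sketched as follows. This is not the logic of namespaceDupes.php; it is a minimal Python sketch in which `find_prefixed_main_titles` and the sample namespace list are hypothetical. It assumes that a main-namespace title is suspect when the text before its first colon matches a namespace name, ignoring case.

```python
def find_prefixed_main_titles(main_titles, namespace_names):
    """Return (title, namespace_name, remainder) for each main-namespace
    title that starts with '<namespace name>:'."""
    # Map the lowercased name (underscores for spaces) to its canonical form.
    known = {name.replace(" ", "_").lower(): name for name in namespace_names}
    hits = []
    for title in main_titles:
        prefix, sep, rest = title.partition(":")
        # Require a colon and a non-empty remainder after it.
        if sep and rest and prefix.lower() in known:
            hits.append((title, known[prefix.lower()], rest))
    return hits


suspects = find_prefixed_main_titles(
    ["Commons:Deletion_requests/Example", "Plain_title", "12:30_train"],
    ["Commons", "Template"],
)
for title, namespace_name, remainder in suspects:
    print(f"{title!r} shadows {remainder!r} in namespace {namespace_name!r}")
```

Each hit would then need to be renamed, merged or deleted so that the title is no longer unreachable.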

Here's another one.

2019-04-13 21:00:06: commonswiki (ID 6588) 7999 pages (38.4|204.2/sec all|curr), 8000 revs (38.4|25.5/sec all|curr), ETA 2019-05-07 08:58:25 [max 78009778]
[a4981f3630af2b579a4a53dc] [no req]   InvalidArgumentException from line 100 of /srv/mediawiki/php-1.33.0-wmf.25/includes/Revision/RevisionStoreRecord.php: The given Title does not belong to page ID 56765073 but actually belongs to 78009849
Backtrace:
#0 /srv/mediawiki/php-1.33.0-wmf.25/includes/Revision/RevisionStore.php(1823): MediaWiki\Revision\RevisionStoreRecord->__construct(Title, User, CommentStoreComment, stdClass, MediaWiki\Revision\RevisionSlots, boolean)
#1 /srv/mediawiki/php-1.33.0-wmf.25/includes/export/XmlDumpWriter.php(311): MediaWiki\Revision\RevisionStore->newRevisionFromRow(stdClass, integer, Title)
#2 /srv/mediawiki/php-1.33.0-wmf.25/includes/export/WikiExporter.php(485): XmlDumpWriter->writeRevision(stdClass)
#3 /srv/mediawiki/php-1.33.0-wmf.25/includes/export/WikiExporter.php(445): WikiExporter->outputPageStreamBatch(Wikimedia\Rdbms\ResultWrapper, stdClass)
#4 /srv/mediawiki/php-1.33.0-wmf.25/includes/export/WikiExporter.php(269): WikiExporter->dumpPages(string, boolean)
#5 /srv/mediawiki/php-1.33.0-wmf.25/includes/export/WikiExporter.php(154): WikiExporter->dumpFrom(string, boolean)
#6 /srv/mediawiki/php-1.33.0-wmf.25/maintenance/includes/BackupDumper.php(288): WikiExporter->pagesByRange(integer, integer, boolean)
#7 /srv/mediawiki/php-1.33.0-wmf.25/maintenance/dumpBackup.php(83): BackupDumper->dump(integer, integer)
#8 /srv/mediawiki/php-1.33.0-wmf.25/maintenance/doMaintenance.php(96): DumpBackup->execute()
#9 /srv/mediawiki/php-1.33.0-wmf.25/maintenance/dumpBackup.php(138): require_once(string)
#10 /srv/mediawiki/multiversion/MWScript.php(100): require_once(string)
#11 {main}

Hey @daniel the patchset for this goes together with the patchset for T220793 which you just +2'ed, care to have a look?

Change 502953 merged by jenkins-bot:
[mediawiki/extensions/ActiveAbstract@master] avoid link cache issues with duplicate title keys for xml dumps

https://gerrit.wikimedia.org/r/502953