
stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs!
Closed, Resolved | Public | 0 Story Points

Description

This looks to be the same spot in Wikibase code as T217329 except that the stubs are broken instead of the abstracts, and it's the production wiki rather than the test wiki. Link to revision that breaks things: https://www.wikidata.org/w/index.php?title=Q4672180&oldid=410362640

Event Timeline

ArielGlenn triaged this task as High priority. Mon, Jul 22, 5:53 AM
ArielGlenn created this task.

Excerpts from a test run, and more info about the revision:

dumpsgen@snapshot1006:/srv/deployment/dumps/dumps/xmldumps-backup$ /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=wikidatawiki --full --stub --report=1 --output=file:/mnt/dumpsdata/temp/dumpsgen/stubs-history.xml  --start=4463823 --end 4463824
2019-07-22 05:06:50: wikidatawiki (ID 59526) 0 pages (0.0|0.0/sec all|curr), 1 revs (6.7|6.7/sec all|curr), ETA 2024-03-25 13:51:00 [max 983290293]
2019-07-22 05:06:50: wikidatawiki (ID 59526) 0 pages (0.0|0.0/sec all|curr), 2 revs (13.3|1207.7/sec all|curr), ETA 2021-11-27 02:33:51 [max 983290293]
...
2019-07-22 05:06:50: wikidatawiki (ID 59526) 0 pages (0.0|0.0/sec all|curr), 40 revs (198.6|844.6/sec all|curr), ETA 2019-09-17 12:31:56 [max 983290293]
2019-07-22 05:06:50: wikidatawiki (ID 59526) 0 pages (0.0|0.0/sec all|curr), 41 revs (200.6|338.7/sec all|curr), ETA 2019-09-16 22:39:24 [max 983290293]
MWContentSerializationException from line 299 of /srv/mediawiki/php-1.34.0-wmf.14/extensions/Wikibase/lib/includes/Store/EntityContentDataCodec.php: $entityId and $targetId can not be the same.
#0 /srv/mediawiki/php-1.34.0-wmf.14/extensions/Wikibase/repo/includes/Content/EntityHandler.php(383): Wikibase\Lib\Store\EntityContentDataCodec->decodeRedirect('{"entity":"Q467...', NULL)
#1 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionStore.php(1475): Wikibase\Repo\Content\EntityHandler->unserializeContent('{"entity":"Q467...', NULL)
#2 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionStore.php(1673): MediaWiki\Revision\RevisionStore->loadSlotContent(Object(MediaWiki\Revision\SlotRecord), NULL, NULL, NULL, 0)
#3 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}(Object(MediaWiki\Revision\SlotRecord))
#4 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/SlotRecord.php(307): call_user_func(Object(Closure), Object(MediaWiki\Revision\SlotRecord))
#5 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\SlotRecord->getContent()
#6 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php(362): MediaWiki\Revision\RevisionRecord->getContent('main', 3)
#7 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(530): XmlDumpWriter->writeRevision(Object(stdClass), Array)
#8 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(473): WikiExporter->outputPageStreamBatch(Object(Wikimedia\Rdbms\ResultWrapper), Object(stdClass))
#9 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(287): WikiExporter->dumpPages('page_id >= 4463...', false)
#10 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(172): WikiExporter->dumpFrom('page_id >= 4463...', false)
#11 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/includes/BackupDumper.php(289): WikiExporter->pagesByRange(4463823, 4463824, false)
#12 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/dumpBackup.php(82): BackupDumper->dump(1, 1)
#13 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/doMaintenance.php(99): DumpBackup->execute()
#14 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/dumpBackup.php(144): require_once('/srv/mediawiki/...')
#15 /srv/mediawiki/multiversion/MWScript.php(101): require_once('/srv/mediawiki/...')
#16 {main}
InvalidArgumentException from line 41 of /srv/mediawiki/php-1.34.0-wmf.14/vendor/wikibase/data-model/src/Entity/EntityRedirect.php: $entityId and $targetId can not be the same.
InvalidArgumentException from line 41 of /srv/mediawiki/php-1.34.0-wmf.14/vendor/wikibase/data-model/src/Entity/EntityRedirect.php: $entityId and $targetId can not be the same.
#0 /srv/mediawiki/php-1.34.0-wmf.14/extensions/Wikibase/lib/includes/Store/EntityContentDataCodec.php(296): Wikibase\DataModel\Entity\EntityRedirect->__construct(Object(Wikibase\DataModel\Entity\ItemId), Object(Wikibase\DataModel\Entity\ItemId))
#1 /srv/mediawiki/php-1.34.0-wmf.14/extensions/Wikibase/repo/includes/Content/EntityHandler.php(383): Wikibase\Lib\Store\EntityContentDataCodec->decodeRedirect('{"entity":"Q467...', NULL)
#2 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionStore.php(1475): Wikibase\Repo\Content\EntityHandler->unserializeContent('{"entity":"Q467...', NULL)
#3 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionStore.php(1673): MediaWiki\Revision\RevisionStore->loadSlotContent(Object(MediaWiki\Revision\SlotRecord), NULL, NULL, NULL, 0)
#4 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}(Object(MediaWiki\Revision\SlotRecord))
#5 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/SlotRecord.php(307): call_user_func(Object(Closure), Object(MediaWiki\Revision\SlotRecord))
#6 /srv/mediawiki/php-1.34.0-wmf.14/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\SlotRecord->getContent()
#7 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php(362): MediaWiki\Revision\RevisionRecord->getContent('main', 3)
#8 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(530): XmlDumpWriter->writeRevision(Object(stdClass), Array)
#9 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(473): WikiExporter->outputPageStreamBatch(Object(Wikimedia\Rdbms\ResultWrapper), Object(stdClass))
#10 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(287): WikiExporter->dumpPages('page_id >= 4463...', false)
#11 /srv/mediawiki/php-1.34.0-wmf.14/includes/export/WikiExporter.php(172): WikiExporter->dumpFrom('page_id >= 4463...', false)
#12 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/includes/BackupDumper.php(289): WikiExporter->pagesByRange(4463823, 4463824, false)
#13 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/dumpBackup.php(82): BackupDumper->dump(1, 1)
#14 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/doMaintenance.php(99): DumpBackup->execute()
#15 /srv/mediawiki/php-1.34.0-wmf.14/maintenance/dumpBackup.php(144): require_once('/srv/mediawiki/...')
#16 /srv/mediawiki/multiversion/MWScript.php(101): require_once('/srv/mediawiki/...')
#17 {main}

Excerpt from the stubs file with the last good revision:

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/
 http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="en">
  <siteinfo>
    <sitename>Wikidata</sitename>
    <dbname>wikidatawiki</dbname>
    <base>https://www.wikidata.org/wiki/Wikidata:Main_Page</base>
    <generator>MediaWiki 1.34.0-wmf.14</generator>
...
  <page>
    <title>Q4672180</title>
    <ns>0</ns>
    <id>4463823</id>
    <revision>
      <id>6599279</id>
      <timestamp>2013-02-15T14:32:54Z</timestamp>
      <contributor>
        <username>Sk!dbot</username>
        <id>4341</id>
      </contributor>
      <comment>/* wbeditentity-create */ Bot: adding interwikilink: eo, en</comment>
      <model>wikibase-item</model>
      <format>application/json</format>
      <text xml:space="preserve" bytes="278" id="6599106" />
      <sha1>b91vm35gr4wd7ymbxpof98bhvij22kx</sha1>
    </revision>
...
    <revision>
      <id>410362636</id>
      <parentid>404101446</parentid>
      <timestamp>2016-11-22T16:57:13Z</timestamp>
      <contributor>
        <username>XXN-bot</username>
        <id>403421</id>
      </contributor>
      <comment>/* wbeditentity-override:0| */ clearing item to prepare for redirect</comment>
      <model>wikibase-item</model>
      <format>application/json</format>
      <text xml:space="preserve" bytes="158" id="412681981" />
      <sha1>tjo3gtxd3kcdvlk40uf0gkktovqh9bb</sha1>
    </revision>

The questionable revision id is 410362640, the content id is 409061491, and the text id is 412681986, with address DB://cluster24/207519399

MariaDB [wikidatawiki]> select TO_BASE64(blob_text) from blobs_cluster24 where blob_id  = 207519399;
+--------------------------------------------------+
| TO_BASE64(blob_text)                             |
+--------------------------------------------------+
| q1ZKzSvJLKlUslIKNDEzNzK0MFDSUSpKTcksSk0uQRatBQA= |
+--------------------------------------------------+
1 row in set (0.00 sec)

Manually base64-decoding and inflating this shows:

('decoded: ', u'{"entity":"Q4672180","redirect":"Q4672180"}')
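
For anyone who wants to repeat the decode step, here is a minimal Python 3 sketch. It assumes the blob is a raw deflate stream (no zlib or gzip header), which is why `wbits=-15` is passed; the helper names `inflate_blob` and `is_self_redirect` are made up for this example and are not part of MediaWiki or Wikibase.

```python
import base64
import json
import zlib

# Base64 text copied from the TO_BASE64(blob_text) query result above.
BLOB_B64 = "q1ZKzSvJLKlUslIKNDEzNzK0MFDSUSpKTcksSk0uQRatBQA="


def inflate_blob(b64_text: str) -> str:
    """Base64-decode the text, then inflate it as a raw deflate stream."""
    raw = base64.b64decode(b64_text)
    # Assumption: the blob has no zlib/gzip header, so use a negative wbits.
    return zlib.decompress(raw, -15).decode("utf-8")


def is_self_redirect(json_text: str) -> bool:
    """True if the JSON describes a redirect whose target is the entity itself."""
    data = json.loads(json_text)
    return "redirect" in data and data["redirect"] == data.get("entity")


if __name__ == "__main__":
    text = inflate_blob(BLOB_B64)
    print(text)
    print("self-redirect:", is_self_redirect(text))
```

Run against the blob from the query, this should reproduce the JSON shown above, with the entity and the redirect target both set to Q4672180.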
ArielGlenn added a subscriber: daniel. Edited Mon, Jul 22, 7:10 AM

Stubs should not be loading content; this appears to be a problem introduced by https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/464768/ at line 360 of the new XmlDumpWriter.php:

$text = $rev->getContent( SlotRecord::MAIN, RevisionRecord::RAW );

Adding @daniel to ask about this. I guess it should be in some sort of conditional, like

$text = '';
if ( $contentMode === self::WRITE_CONTENT ) {
  $text = $rev->getContent( SlotRecord::MAIN, RevisionRecord::RAW );
}
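
To illustrate why the conditional matters, here is a small Python model of the idea (not the actual MediaWiki PHP). All names in it (`ContentMode`, `load_content`, `write_revision`) are hypothetical stand-ins: the loader plays the role of content deserialization and rejects a self-redirect, while the writer only calls it when content was actually requested.

```python
import json
from enum import Enum
from typing import Callable


class ContentMode(Enum):
    WRITE_CONTENT = "content"
    WRITE_STUB = "stub"


class SelfRedirectError(ValueError):
    """Raised when a redirect points at its own entity."""


def load_content(blob: dict) -> str:
    """Stand-in for deserialization: fails on a redirect that targets itself."""
    if "redirect" in blob and blob["redirect"] == blob.get("entity"):
        raise SelfRedirectError("entity and redirect target can not be the same")
    return json.dumps(blob)


def write_revision(
    rev_id: int,
    blob: dict,
    mode: ContentMode,
    loader: Callable[[dict], str] = load_content,
) -> dict:
    """Build a revision record; the loader runs only in WRITE_CONTENT mode."""
    text = ""
    if mode is ContentMode.WRITE_CONTENT:
        text = loader(blob)
    return {"id": rev_id, "text": text}


if __name__ == "__main__":
    bad = {"entity": "Q4672180", "redirect": "Q4672180"}
    # Stub mode never touches the loader, so the bad revision is harmless.
    print(write_revision(410362640, bad, ContentMode.WRITE_STUB))
```

In this model a stub run skips both the failure and the per-revision fetch, since the loader is never invoked; a full-content run would still hit the self-redirect error.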
ArielGlenn renamed this task from stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self to stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs!. Mon, Jul 22, 7:17 AM

One side effect of this, besides wikidata stubs being broken, is that the stubs are running extremely slowly, given that the text content is being requested from external store for each slot (one at a time). For example, enwiki stub generation is not even through the first third of the files. We won't complete the run on time unless this is fixed quickly.

Change 524723 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[mediawiki/core@master] don't load revision text content unless requested to

https://gerrit.wikimedia.org/r/524723

The above patch has been tested in beta on a tiny wiki and all jobs ran properly. It has been tested with the dump command in T228614#5352340 on snapshot1008 and the revisions for the page were properly rendered.

ArielGlenn moved this task from Backlog to Active on the Dumps-Generation board. Mon, Jul 22, 11:13 AM

Change 524723 merged by jenkins-bot:
[mediawiki/core@master] don't load revision text content unless requested to

https://gerrit.wikimedia.org/r/524723

Change 524760 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[mediawiki/core@wmf/1.34.0-wmf.14] don't load revision text content unless requested to

https://gerrit.wikimedia.org/r/524760

Change 524760 merged by jenkins-bot:
[mediawiki/core@wmf/1.34.0-wmf.14] don't load revision text content unless requested to

https://gerrit.wikimedia.org/r/524760

Mentioned in SAL (#wikimedia-operations) [2019-07-22T15:24:08Z] <jforrester@deploy1001> Synchronized php-1.34.0-wmf.14/includes/export/XmlDumpWriter.php: T228614 XmlDumpWriter: don't load revision text content unless requested to (duration: 00m 48s)

Thanks to Daniel and James for review and merge. This has been deployed. I verified that the new code runs to completion on the page with the problematic revision, and that Special:Export for pages with revision history produces what we expect. I won't close the ticket just yet; I'd like to watch the run through tomorrow and make sure there are no unexpected consequences.

ArielGlenn closed this task as Resolved. Tue, Jul 30, 10:37 AM
ArielGlenn claimed this task.

The wikis ran to completion, but I forgot to close this. Doing so now!

ArielGlenn moved this task from Active to Done on the Dumps-Generation board. Tue, Jul 30, 10:38 AM