Incorrect rev_sha1 and rev_len on some revisions
Closed, Resolved · Public

Description

Rarely, some revisions seem to have rev_sha1 and/or rev_len derived from the parent revision instead of the revision itself.

When the revision has only a 'main' slot, rev_sha1 should match content_sha1 and rev_len should match content_size; this allows affected revisions to be easily found. Finding them for revisions on Commons that have a mediainfo slot may be more difficult.

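I found 49 such revisions on enwiki in 2019. The "h1" and "h2" columns test the hypothesis that the bug is in use of the parent revision's data rather than some less specific corruption: h1 is 1 when rev_sha1 matches either the revision's own content_sha1 or the parent's, and h2 is the same check for rev_len against content_size.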

wikiadmin@10.64.48.13(enwiki)> select rev_id, rev_timestamp,
    c.content_sha1, rev_sha1, p.content_sha1 as "parent_sha1",
    rev_sha1=c.content_sha1 or rev_sha1=p.content_sha1 as h1,
    c.content_size, rev_len, p.content_size as "parent_size",
    rev_len=c.content_size or rev_len=p.content_size as h2
  from revision
    join slots as cs on (slot_revision_id=rev_id)
    join content as c on (content_id=slot_content_id)
    left join slots as ps on (ps.slot_revision_id=rev_parent_id)
    left join content as p on (p.content_id=ps.slot_content_id)
  where rev_timestamp like '2019%'
    and (rev_sha1!=c.content_sha1 or rev_len!=c.content_size)
  order by rev_timestamp asc;
+-----------+----------------+---------------------------------+---------------------------------+---------------------------------+------+--------------+---------+-------------+------+
| rev_id    | rev_timestamp  | content_sha1                    | rev_sha1                        | parent_sha1                     | h1   | content_size | rev_len | parent_size | h2   |
+-----------+----------------+---------------------------------+---------------------------------+---------------------------------+------+--------------+---------+-------------+------+
| 878065208 | 20190112195110 | gs63e5lmwdr3r6bcmakqm3splx2dotk | gs63e5lmwdr3r6bcmakqm3splx2dotk | pt97gntma90ay2f918ec67ta2306a86 |    1 |         1284 |     283 |         283 |    1 |
| 880278781 | 20190126143001 | 69dax4okof0aqqbgr0xx4tgf96097sl | 69dax4okof0aqqbgr0xx4tgf96097sl | 3ld8wi5h0agnp6d2wjul33gslfgyk5n |    1 |       249548 |  248528 |      248528 |    1 |
| 880316669 | 20190126192223 | ivva4s7jb2wwwgbfjdga93sd180vswe | ivva4s7jb2wwwgbfjdga93sd180vswe | pkx9tqkv0ch3lrwykuk09zy7x3anzwg |    1 |       159837 |  158817 |      158817 |    1 |
| 881691109 | 20190204045427 | t5mqa3jq0uqljo03vuttoovfb98r0mc | t5mqa3jq0uqljo03vuttoovfb98r0mc | mngvoamqyneafy4v6fhcxxfryjzup97 |    1 |       135571 |  134381 |      134381 |    1 |
| 882203781 | 20190207141836 | nln6jxs7zj2v1xlmat1yjkc4u037eb9 | nln6jxs7zj2v1xlmat1yjkc4u037eb9 | kdnrvumo6pr7ym6s2i5aabwwp53fm88 |    1 |       136110 |  134653 |      134653 |    1 |
| 884446740 | 20190221180820 | iaildmbosvkjt28c3d0w7w0lnyz5qbr | iaildmbosvkjt28c3d0w7w0lnyz5qbr | prk0gj1y4y0be8zyofocmbw0lthp0nr |    1 |       868927 |  867905 |      867905 |    1 |
| 885238763 | 20190226205814 | eoep1y6b7wy5mtb93mul90erm1t4a6i | eoep1y6b7wy5mtb93mul90erm1t4a6i | a3mvcafynaia75e1q3gbsp2uj2fakbf |    1 |       102494 |  100715 |      100715 |    1 |
| 885707174 | 20190301194626 | 1kph6xlbrpenv5arqhny919swsyt6hd | 1kph6xlbrpenv5arqhny919swsyt6hd | 74wi4g649hwllf7a54yx4adgensd0uu |    1 |       769095 |  767941 |      767941 |    1 |
| 885854102 | 20190302194031 | fvfd50jxu4szy452s64q3ska2t935k8 | fvfd50jxu4szy452s64q3ska2t935k8 | 74uxabf6wf9nwtkgvlo687p9dt0l1uv |    1 |        88306 |   86738 |       86738 |    1 |
| 886675374 | 20190307195325 | 9ano2c6slaax2rcjyidl843rt1ld2c0 | 9ano2c6slaax2rcjyidl843rt1ld2c0 | hutjdc7pv8036fr46iqkdoc0wgl9uv7 |    1 |        98640 |   97104 |       97104 |    1 |
| 887350302 | 20190312023019 | 84kq2usqtgou6vp0h1326nlh2ceih9o | 84kq2usqtgou6vp0h1326nlh2ceih9o | 3m6049rljb0caaq8sr5469xsrwrzmtv |    1 |       133170 |  132154 |      132154 |    1 |
| 888198383 | 20190317155853 | 56ozincizoce1txc7aji3nxcjjvwxjd | 56ozincizoce1txc7aji3nxcjjvwxjd | fwe6ammfhs8ffq0eq2jtfj1liorq2bh |    1 |         1810 |     798 |         798 |    1 |
| 891994409 | 20190411140939 | iueaz79drf5jh74txhwnh4pcgbbptg3 | iueaz79drf5jh74txhwnh4pcgbbptg3 | k6ot6gv73k4r4htc7sf1dcedzw17oak |    1 |       268082 |  267071 |      267071 |    1 |
| 893149264 | 20190419101712 | hdrh2qtf0v84714esvj7n08qftuhn66 | hdrh2qtf0v84714esvj7n08qftuhn66 | h07b6tjdh66l1qqbvr762rluzmjyukp |    1 |       157779 |  156768 |      156768 |    1 |
| 894046563 | 20190425081530 | q24ep55i0s7zr2syr5pj0117yjifp00 | q24ep55i0s7zr2syr5pj0117yjifp00 | bryfuomtcwoddy8do69cz4bbabptjbp |    1 |       399760 |  398750 |      398750 |    1 |
| 896116760 | 20190508120152 | k241qz91f27uslwklfu2tdexkzmm5f9 | k241qz91f27uslwklfu2tdexkzmm5f9 | kszqovdqvpcefbsw57i7ajfa0jv895f |    1 |       153299 |  152369 |      152369 |    1 |
| 899683728 | 20190531170232 | sxghdhe1pyiqt4ou0j6pbqc8v9mwqsd | sxghdhe1pyiqt4ou0j6pbqc8v9mwqsd | scd5qc9ju91rhrf0npzhzvwiexzri2n |    1 |       130199 |  129010 |      129010 |    1 |
| 900287348 | 20190604170540 | cwun44t3ar6d0883t5nlqif56sevy4p | cwun44t3ar6d0883t5nlqif56sevy4p | jiytkvx10m3qhiy5c3mhpthghmffle5 |    1 |      1102421 | 1101321 |     1101321 |    1 |
| 907678319 | 20190724143618 | 6vfrdwrpegtgp8gf1uojjya3hasai2k | 6vfrdwrpegtgp8gf1uojjya3hasai2k | sa6yx5f1eimfpx5h1o6mzkaut4u6lhg |    1 |        42233 |   40216 |       40216 |    1 |
| 908579513 | 20190730162959 | jact05uz88b2buqj8v2g8slj6t3ev5r | oq5m5s5zdco74a4e9q2bfxu379d0hxo | oq5m5s5zdco74a4e9q2bfxu379d0hxo |    1 |       110493 |  109316 |      109316 |    1 |
| 908579605 | 20190730163040 | trmnobpn0xi54zdcblj3rabzgp75r2p | jact05uz88b2buqj8v2g8slj6t3ev5r | jact05uz88b2buqj8v2g8slj6t3ev5r |    1 |       110492 |  110493 |      110493 |    1 |
| 909769741 | 20190807140250 | 4kk862f2cpy0dwcj6d1mk904uc9hawx | 9l9emwdbg06u3uxao6855m2zxgsz242 | 9l9emwdbg06u3uxao6855m2zxgsz242 |    1 |       144826 |  143797 |      143797 |    1 |
| 910369588 | 20190811161646 | j2rynutv4c4gwgrp3vzvh60dy6ef2j0 | h15qf4r5nm63iynq42lzn2c33ml1lzz | h15qf4r5nm63iynq42lzn2c33ml1lzz |    1 |        47026 |   45809 |       45809 |    1 |
| 910667477 | 20190813164503 | n6ge458p9wfzhjsrjotp3n3pc8l11zm | r71bdzldoi3ecsxvzo0q65jav86bgbz | r71bdzldoi3ecsxvzo0q65jav86bgbz |    1 |       117016 |  115998 |      115998 |    1 |
| 910810302 | 20190814161501 | dl2ux65zrxcqpcvldgzb4fo29ojyxqf | trfu19olf2fpg69pbjleiezjdrjkw93 | trfu19olf2fpg69pbjleiezjdrjkw93 |    1 |       538044 |  537026 |      537026 |    1 |
| 910841318 | 20190814204033 | s1w6l1v4rqxbeepovo9j0b3voiaqsgq | 6rgvdai1krj9vdipr802uyojz08nfvv | 6rgvdai1krj9vdipr802uyojz08nfvv |    1 |       118627 |  117426 |      117426 |    1 |
| 914059363 | 20190904222026 | c0326zjkxpihx207uu6jxrqzkz8p3a3 | rqu5ssib0imxai8q6figlnmyhso6zmw | rqu5ssib0imxai8q6figlnmyhso6zmw |    1 |        59146 |   57940 |       57940 |    1 |
| 914143304 | 20190905130827 | o2x0deb4jb75jkta7w15qom9ru1xt3o | 24d1ohqeoffietvozlnt9uqjxijhcca | 24d1ohqeoffietvozlnt9uqjxijhcca |    1 |        99330 |   98310 |       98310 |    1 |
| 917568240 | 20190924122520 | 477xnqbzla1ntbzfkx5c8p2gtg06q2o | gmqd2ze6hg01h1bde3knvvlbzctkivv | gmqd2ze6hg01h1bde3knvvlbzctkivv |    1 |       205449 |  204388 |      204388 |    1 |
| 918024535 | 20190926165957 | 3jm6ivh8tmu1wa44hurmrla0ulo8iaf | heokqho43hw3k5xcwuoowdyuooy8yhu | heokqho43hw3k5xcwuoowdyuooy8yhu |    1 |       106948 |  105931 |      105931 |    1 |
| 921476910 | 20191015232601 | fstn29cxm5x8bsszylfukag9blxsdp3 | bw9lfj7c2c6966c95alpidrvbm6wiel | bw9lfj7c2c6966c95alpidrvbm6wiel |    1 |       836962 |  835937 |      835937 |    1 |
| 921476985 | 20191015232634 | 658v4ayup36j6d78pst2udt5e7mcs5a | fstn29cxm5x8bsszylfukag9blxsdp3 | fstn29cxm5x8bsszylfukag9blxsdp3 |    1 |       837987 |  836962 |      836962 |    1 |
| 922120198 | 20191020022108 | jnh4feyz7gt2arybq5ip6pvoeicpvna | 27gmmfihnl8jpjicwx2kdp8uhdyjifq | 27gmmfihnl8jpjicwx2kdp8uhdyjifq |    1 |         3651 |    3651 |        3358 |    1 |
| 922434213 | 20191022031337 | 9e9d7n84f6i30d93odkx3kn8bzivqm2 | k8xgdvbv9wtxcp6g9b42ec3e0wrn0ai | k8xgdvbv9wtxcp6g9b42ec3e0wrn0ai |    1 |         2132 |    2132 |        1820 |    1 |
| 923519434 | 20191029013917 | 8dp0677fiuf1ovhyl7rb3h1qg7tw63w | jcgyuczygobkqvn2hk8f76oqd93djuk | jcgyuczygobkqvn2hk8f76oqd93djuk |    1 |       500705 |  500705 |      166859 |    1 |
| 923758684 | 20191030161741 | 6sa9llkpqydznu4w8hncc3rnf55c5m8 | iw2gtid35ytmb0vdw55u7eghrteiq7f | iw2gtid35ytmb0vdw55u7eghrteiq7f |    1 |       178870 |  176496 |      176496 |    1 |
| 924070125 | 20191101162527 | sv5akjpo3qdm98k3ofxezc882jfow0e | 1hp406lv00tx8x3lj9pewm5lk6hs5r2 | 1hp406lv00tx8x3lj9pewm5lk6hs5r2 |    1 |         9769 |    9769 |        4877 |    1 |
| 924329064 | 20191103050233 | gx025z9e0cz44qa9l3xirvaugr7is0c | dl1xls9wngylhokkrchq7teabj42j5h | dl1xls9wngylhokkrchq7teabj42j5h |    1 |       664355 |  664355 |      132869 |    1 |
| 924329202 | 20191103050348 | t8mnlr6h55hdrmj7g2lx1u7d3nscur4 | gx025z9e0cz44qa9l3xirvaugr7is0c | gx025z9e0cz44qa9l3xirvaugr7is0c |    1 |      2657523 | 2657523 |      664355 |    1 |
| 924329471 | 20191103050703 | e5hjned41vu4ix6yyd1p1xygcct14aw | akz4epfeme0yng9vq8obyrz03ou8i08 | akz4epfeme0yng9vq8obyrz03ou8i08 |    1 |       624760 |  624760 |      104125 |    1 |
| 924329537 | 20191103050736 | exlxmedxh66mqouala9w5sgpsbul898 | e5hjned41vu4ix6yyd1p1xygcct14aw | e5hjned41vu4ix6yyd1p1xygcct14aw |    1 |      2499141 | 2499141 |      624760 |    1 |
| 924338026 | 20191103064749 | 0gtk1w7017fbb61xj1crd9tiw5kv4an | q2h5nush5gole2id0bpwt2vlrudohrg | q2h5nush5gole2id0bpwt2vlrudohrg |    1 |        95213 |   95213 |       19039 |    1 |
| 924338085 | 20191103064824 | n0q5ict733si79d6tklhfqoetn4s8dv | 0gtk1w7017fbb61xj1crd9tiw5kv4an | 0gtk1w7017fbb61xj1crd9tiw5kv4an |    1 |      1142566 | 1142566 |       95213 |    1 |
| 924338116 | 20191103064859 | 5zcv4m13s9fbitevwiade39qk7rd9u3 | n0q5ict733si79d6tklhfqoetn4s8dv | n0q5ict733si79d6tklhfqoetn4s8dv |    1 |      2285146 | 2285146 |     1142566 |    1 |
| 924343130 | 20191103075551 | pyt4ndpwuwtdsqavbidh4ufowo6v4gf | kldmuy57hlns0ai197wifzaa95iud5y | kldmuy57hlns0ai197wifzaa95iud5y |    1 |         3505 |    3505 |        3291 |    1 |
| 926706706 | 20191118050344 | pmy4isrwj77tngncui8ch3zlhp39uvh | gd4ogk7u669s84ixlb9l9pxmgpklirp | gd4ogk7u669s84ixlb9l9pxmgpklirp |    1 |        57450 |   57450 |       50470 |    1 |
| 926707018 | 20191118050757 | nik1bugm5avnwmtmt21gm1qva8urex3 | gldvvttxdhgxt9i4un0frvg92xdu932 | gldvvttxdhgxt9i4un0frvg92xdu932 |    1 |        37979 |   37979 |       44713 |    1 |
| 927216163 | 20191121020935 | 1row3ey1m79v6h2yeqwhgdxd5mt073l | c0q6wp1hrbeg9bw3a30v4mh98hytqxp | c0q6wp1hrbeg9bw3a30v4mh98hytqxp |    1 |       103094 |  103094 |      100133 |    1 |
| 928622565 | 20191130152646 | jm2o43w5c4ok6lm54xzxf05ymoclea7 | 5p3q5kzdgll0akxrw9i7tzm9fs5hccw | 5p3q5kzdgll0akxrw9i7tzm9fs5hccw |    1 |       184778 |  183755 |      183755 |    1 |
+-----------+----------------+---------------------------------+---------------------------------+---------------------------------+------+--------------+---------+-------------+------+
49 rows in set (11 min 1.75 sec)
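
To make the h1/h2 reading concrete, here is a small Python sketch that labels which stored value looks copied from the parent. The `Row` type and `classify` function are hypothetical helpers, not part of MediaWiki; the three sample rows are copied from the result set above.

```python
from typing import NamedTuple


class Row(NamedTuple):
    rev_id: int
    content_sha1: str
    rev_sha1: str
    parent_sha1: str
    content_size: int
    rev_len: int
    parent_size: int


def classify(row: Row) -> str:
    """Name the revision fields that disagree with the revision's own
    content but agree with the parent's content."""
    stale = []
    if row.rev_sha1 != row.content_sha1 and row.rev_sha1 == row.parent_sha1:
        stale.append("rev_sha1")
    if row.rev_len != row.content_size and row.rev_len == row.parent_size:
        stale.append("rev_len")
    return "+".join(stale) or "none"


rows = [
    Row(878065208, "gs63e5lmwdr3r6bcmakqm3splx2dotk",
        "gs63e5lmwdr3r6bcmakqm3splx2dotk", "pt97gntma90ay2f918ec67ta2306a86",
        1284, 283, 283),
    Row(908579513, "jact05uz88b2buqj8v2g8slj6t3ev5r",
        "oq5m5s5zdco74a4e9q2bfxu379d0hxo", "oq5m5s5zdco74a4e9q2bfxu379d0hxo",
        110493, 109316, 109316),
    Row(922120198, "jnh4feyz7gt2arybq5ip6pvoeicpvna",
        "27gmmfihnl8jpjicwx2kdp8uhdyjifq", "27gmmfihnl8jpjicwx2kdp8uhdyjifq",
        3651, 3651, 3358),
]

for row in rows:
    print(row.rev_id, classify(row))
```

The three samples fall into three different groups (only rev_len stale, both stale, only rev_sha1 stale), which matches the way the pattern in the table shifts over the course of the year.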

This was originally reported at https://en.wikipedia.org/wiki/Wikipedia:Help_desk#Weird_bytecount_in_page_history and https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Weird_bytecount_in_page_history

Event Timeline

Anomie created this task. Dec 3 2019, 3:03 PM
Restricted Application added a subscriber: Aklapper. Dec 3 2019, 3:03 PM
daniel awarded a token. Dec 4 2019, 7:48 PM
daniel added a project: User-Daniel.
WDoranWMF triaged this task as High priority. Dec 4 2019, 7:51 PM
daniel claimed this task. Dec 12 2019, 1:32 PM

Change 562264 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] RevisionStore: fail on mismatching hash or size.

https://gerrit.wikimedia.org/r/562264

Note: the above patch doesn't fix the root cause, but should prevent any new bad revisions from being created, and provide us with a stack trace that allows us to investigate the issue further.
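
As a rough illustration of the "fail instead of writing bad data" idea, the following Python sketch refuses to proceed when the revision's aggregate values disagree with the main slot. It is not the code from the patch: `PreconditionFailed`, `assert_consistent`, the error wording, and the restriction to main-only revisions are all assumptions made for this example. The sample values are those of revision 908579605 from the query result in the description.

```python
class PreconditionFailed(Exception):
    """Raised instead of writing an inconsistent revision row."""


def assert_consistent(rev_sha1, rev_len, slots):
    """slots maps a role name to a (content_sha1, content_size) pair."""
    if set(slots) != {"main"}:
        # Assumption: with extra slots the aggregates are combined values,
        # so this simple equality check does not apply.
        return
    main_sha1, main_size = slots["main"]
    if rev_sha1 != main_sha1:
        raise PreconditionFailed("revision hash must match the main slot hash")
    if rev_len != main_size:
        raise PreconditionFailed("revision size must match the main slot size")


# Revision 908579605: rev_sha1 and rev_len both carry the parent's values,
# so the check refuses the insert.
try:
    assert_consistent(
        "jact05uz88b2buqj8v2g8slj6t3ev5r", 110493,
        {"main": ("trmnobpn0xi54zdcblj3rabzgp75r2p", 110492)},
    )
except PreconditionFailed as err:
    print("insert refused:", err)
```

Failing at insert time trades a rare failed save for a stack trace taken at the moment the inconsistency exists, which is what makes further investigation possible.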

Change 562264 merged by jenkins-bot:
[mediawiki/core@master] RevisionStore: fail on mismatching hash or size.

https://gerrit.wikimedia.org/r/562264

Pchelolo added a subscriber: Pchelolo.

The logging from the patch above has reached production and we've collected a few logs: https://logstash.wikimedia.org/goto/73c24ffcc3d5cad824f08d0f6904d21d

Example:
Message: Precondition failed: The revisions's SHA1 hash must match the main slot's SHA1 hash (see T239717)
ReqID: XlX94QpAIDMAAF4jX9MAAAAP
Trace:

#0 /srv/mediawiki/php-1.35.0-wmf.20/includes/Revision/RevisionStore.php(487): Wikimedia\Assert\Assert::precondition(boolean, string)
#1 /srv/mediawiki/php-1.35.0-wmf.20/includes/Storage/PageUpdater.php(996): MediaWiki\Revision\RevisionStore->insertRevisionOn(MediaWiki\Revision\MutableRevisionRecord, Wikimedia\Rdbms\DBConnRef)
#2 /srv/mediawiki/php-1.35.0-wmf.20/includes/Storage/PageUpdater.php(766): MediaWiki\Storage\PageUpdater->doModify(CommentStoreComment, User, integer)
#3 /srv/mediawiki/php-1.35.0-wmf.20/includes/page/WikiPage.php(1942): MediaWiki\Storage\PageUpdater->saveRevision(CommentStoreComment, integer)
#4 /srv/mediawiki/php-1.35.0-wmf.20/includes/EditPage.php(2377): WikiPage->doEditContent(WikitextContent, CommentStoreComment, integer, boolean, User, string, array, integer)
#5 /srv/mediawiki/php-1.35.0-wmf.20/includes/EditPage.php(1649): EditPage->internalAttemptSave(array, boolean)
#6 /srv/mediawiki/php-1.35.0-wmf.20/includes/api/ApiEditPage.php(400): EditPage->attemptSave(array)
#7 /srv/mediawiki/php-1.35.0-wmf.20/includes/api/ApiMain.php(1586): ApiEditPage->execute()
#8 /srv/mediawiki/php-1.35.0-wmf.20/includes/api/ApiMain.php(522): ApiMain->executeAction()
#9 /srv/mediawiki/php-1.35.0-wmf.20/includes/api/ApiMain.php(493): ApiMain->executeActionWithErrorHandling()
#10 /srv/mediawiki/php-1.35.0-wmf.20/api.php(84): ApiMain->execute()
#11 /srv/mediawiki/w/api.php(3): require(string)
#12 {main}

The stack trace above is from the API, but this also seems to happen via the UI:

#0 /srv/mediawiki/php-1.35.0-wmf.27/includes/Revision/RevisionStore.php(384): Wikimedia\Assert\Assert::precondition(boolean, string)
#1 /srv/mediawiki/php-1.35.0-wmf.27/includes/Storage/PageUpdater.php(1006): MediaWiki\Revision\RevisionStore->insertRevisionOn(MediaWiki\Revision\MutableRevisionRecord, Wikimedia\Rdbms\DBConnRef)
#2 /srv/mediawiki/php-1.35.0-wmf.27/includes/Storage/PageUpdater.php(776): MediaWiki\Storage\PageUpdater->doModify(CommentStoreComment, User, integer)
#3 /srv/mediawiki/php-1.35.0-wmf.27/includes/page/WikiPage.php(1929): MediaWiki\Storage\PageUpdater->saveRevision(CommentStoreComment, integer)
#4 /srv/mediawiki/php-1.35.0-wmf.27/includes/EditPage.php(2412): WikiPage->doEditContent(WikitextContent, CommentStoreComment, integer, boolean, User, string, array, integer)
#5 /srv/mediawiki/php-1.35.0-wmf.27/includes/EditPage.php(1678): EditPage->internalAttemptSave(array, boolean)
#6 /srv/mediawiki/php-1.35.0-wmf.27/includes/EditPage.php(706): EditPage->attemptSave(array)
#7 /srv/mediawiki/php-1.35.0-wmf.27/includes/actions/EditAction.php(60): EditPage->edit()
#8 /srv/mediawiki/php-1.35.0-wmf.27/includes/actions/SubmitAction.php(38): EditAction->show()
#9 /srv/mediawiki/php-1.35.0-wmf.27/includes/MediaWiki.php(519): SubmitAction->show()
#10 /srv/mediawiki/php-1.35.0-wmf.27/includes/MediaWiki.php(305): MediaWiki->performAction(Article, Title)
#11 /srv/mediawiki/php-1.35.0-wmf.27/includes/MediaWiki.php(973): MediaWiki->performRequest()
#12 /srv/mediawiki/php-1.35.0-wmf.27/includes/MediaWiki.php(535): MediaWiki->main()
#13 /srv/mediawiki/php-1.35.0-wmf.27/index.php(47): MediaWiki->run()
#14 /srv/mediawiki/w/index.php(3): require(string)
#15 {main}

It seems to be rare (less than once per day), but persistent. This points to a race condition. The evidence for this isn't clear, though:

In the past seven days, the error occurred several times on https://en.wikipedia.org/w/index.php?title=User:CrazyBoy826, which was seeing many edits in rapid succession. This is consistent with the race condition theory.

However, the error also occurred on https://es.wikipedia.org/w/index.php?title=Plantilla:Aviso_promocional, which has had no successful edit since 2018. That implies something other than a race condition.

My current theory is some confusion involving WikiPage::prepareContentForEdit and/or edit stashing, triggered via the use of a parser function or pre-save transform. Something involving {{subst:REVISIONID}} or something similarly magical.

daniel added a comment. Edited Apr 16 2020, 10:26 AM

More thoughts: DerivedPageDataUpdater calls MutableRevisionRecord::newFromParentRevision( $parentRevision ), which creates a MutableRevisionRecord that has the same content as the parent, including hash and size.

If getSize() or getSha1() is called on that MutableRevisionRecord, the value will be cached. Calling setSlot() or setContent() on MutableRevisionRecord will reset the cached size and hash. But the slots can also be changed by manipulating the MutableRevisionSlots instance returned by MutableRevisionRecord::getSlots(). This way, content and size/hash could get out of whack.

For instance, DerivedPageDataUpdater calls $pstContentSlots->setSlot( $pstSlot ), where $pstContentSlots comes from $this->revision->getSlots(). That would do it, though I can't see how getSize() or getSha1() would be called at that point. Given the number of hooks and callbacks involved here, it may however be possible. In particular, PST calls the parser, which may call back to extension code that may in turn get access to the prepared revision object.
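
The suspected mechanism can be modelled outside MediaWiki. The Python sketch below is a deliberately simplified stand-in, not the real classes: `MutableRecord` and `MutableSlots` are hypothetical analogues of MutableRevisionRecord and MutableRevisionSlots, there is only a main slot, and the hash is a plain hex SHA-1.

```python
import hashlib


class MutableSlots:
    """Holds slot content; knows nothing about the record that owns it."""

    def __init__(self, contents):
        self._contents = dict(contents)

    def set_slot(self, role, text):
        self._contents[role] = text

    def get(self, role):
        return self._contents[role]


class MutableRecord:
    def __init__(self, contents):
        self._slots = MutableSlots(contents)
        self._size = None
        self._sha1 = None

    def get_slots(self):
        return self._slots  # hands out the live, mutable object

    def set_content(self, role, text):
        self._slots.set_slot(role, text)
        self._size = self._sha1 = None  # the safe path resets the caches

    def get_size(self):
        if self._size is None:
            self._size = len(self._slots.get("main").encode("utf-8"))
        return self._size

    def get_sha1(self):
        if self._sha1 is None:
            data = self._slots.get("main").encode("utf-8")
            self._sha1 = hashlib.sha1(data).hexdigest()
        return self._sha1


record = MutableRecord({"main": "parent text"})  # starts as a copy of the parent
record.get_size()                                # something reads the size early
record.get_slots().set_slot("main", "new, longer text")  # bypasses set_content()

actual = len(record.get_slots().get("main").encode("utf-8"))
print(record.get_size(), actual)  # prints: 11 16
```

The record keeps reporting the parent's size (11) although the slot now holds 16 bytes. In this model nothing reads the hash early, so get_sha1() would still come out right, which is one way to end up with only one of the two values stale.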

Even though I'm not entirely clear on how the inconsistency is triggered, I propose to introduce a callback into MutableRevisionSlots, so it can trigger MutableRevisionRecord::resetAggregateValues() whenever the slots change.
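
A minimal Python sketch of that proposal, under the same simplifications (hypothetical `MutableSlots` and `MutableRecord` classes, main slot only, size only): the slots object is given a callback at construction and invokes it on every change, so the owning record can drop its cached values. The `on_change` parameter and `_reset_aggregates` name are assumptions for illustration and are not taken from the actual patch.

```python
class MutableSlots:
    def __init__(self, contents, on_change=None):
        self._contents = dict(contents)
        self._on_change = on_change

    def set_slot(self, role, text):
        self._contents[role] = text
        if self._on_change is not None:
            self._on_change()  # tell the owner its aggregates are stale

    def get(self, role):
        return self._contents[role]


class MutableRecord:
    def __init__(self, contents):
        self._size = None
        # The record registers its own reset method with the slots object.
        self._slots = MutableSlots(contents, on_change=self._reset_aggregates)

    def _reset_aggregates(self):
        self._size = None

    def get_slots(self):
        return self._slots

    def get_size(self):
        if self._size is None:
            self._size = len(self._slots.get("main").encode("utf-8"))
        return self._size


record = MutableRecord({"main": "parent text"})
before = record.get_size()                               # caches 11
record.get_slots().set_slot("main", "new, longer text")  # triggers the reset
after = record.get_size()                                # recomputed
print(before, after)  # prints: 11 16
```

The appeal of this design is that it does not depend on knowing which caller reads the size or hash too early: any change to the slots, through any path, invalidates the cache.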

Change 589279 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] MutableRevisionRecord: ensure consistent hash and size

https://gerrit.wikimedia.org/r/589279

Change 589279 merged by jenkins-bot:
[mediawiki/core@master] MutableRevisionRecord: ensure consistent hash and size

https://gerrit.wikimedia.org/r/589279

daniel added a comment. Edited Apr 20 2020, 5:37 PM

The patch is merged, but let's keep this open for a bit and see if the error actually goes away after this has been deployed.

brennen moved this task from Backlog to Logs/Train on the User-brennen board.
Pchelolo closed this task as Resolved. May 28 2020, 5:24 PM

Verified that the error has gone away. Resolving.

Aklapper removed a subscriber: Anomie. Fri, Oct 16, 5:40 PM