Page MenuHomePhabricator

includes/Revision/RevisionStore.php: Main slot of revision (number) not found in database!
Open, HighPublicPRODUCTION ERROR

Description

Noticed in mediawiki-errors for both wmf.8 an wmf.9

[XBuw4QpAICMAALTXkHkAAABG] /w/index.php?title=Speci%C3%A1lis:Lap_%C3%A1tnevez%C3%A9se&action=submit   MediaWiki\Revision\RevisionAccessException from line 1643 of /srv/mediawiki/php-1.33.0-wmf.8/includes/Revision/RevisionStore.php: Main slot of revision 20798212 not found in database!
[XBuxngpAIDgAAJFITF8AAADO] /w/api.php?rvprop=userid%7Cuser%7Cids%7Ccontent%7Csize%7Ctimestamp%7Ccontentmodel%7Ccomment&revids=815943409&prop=revisions&format=json&rvslots=main&action=query   MediaWiki\Revision\RevisionAccessException from line 1643 of /srv/mediawiki/php-1.33.0-wmf.9/includes/Revision/RevisionStore.php: Main slot of revision 815943409 not found in database!

Impact

Through the API, querying information about some pages results in an internal_api_error response.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
AMooney changed the task status from Open to Stalled.Mar 12 2020, 1:21 PM

Is there a work around to fix this? In our instance the Main_Page was edited.

Is there a work around to fix this? In our instance the Main_Page was edited.

The error as reported here should be transient and fix itself after a few seconds. If you are seeing this permanently, that would indicate that your database somehow got corrupted. A manual fix would be to set page_latest to an earlier revision that doesn't have the error.

Is there a work around to fix this? In our instance the Main_Page was edited.

The error as reported here should be transient and fix itself after a few seconds. If you are seeing this permanently, that would indicate that your database somehow got corrupted. A manual fix would be to set page_latest to an earlier revision that doesn't have the error.

Thanks Daniel. That strategy worked. I made an update to the page table of the database and set the page_latest = 1. All is good now.

Still seen.

Request ID XpiAgQpAMNMAApcc1VkAAAAV
RevisionAccessException: Main slot of revision 951319475 not found in database!

#0 /srv/mediawiki/php-1.35.0-wmf.27/includes/Revision/RevisionStore.php(1188): MediaWiki\Revision\RevisionStore->constructSlotRecords(string, Wikimedia\Rdbms\ResultWrapper, integer, Title)
#1 /srv/mediawiki/php-1.35.0-wmf.27/includes/Revision/RevisionStore.php(1290): MediaWiki\Revision\RevisionStore->loadSlotRecords(string, integer, Title)
#2 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()
#3 /srv/mediawiki/php-1.35.0-wmf.27/includes/Revision/RevisionSlots.php(171): call_user_func(Closure)
#4 /srv/mediawiki/php-1.35.0-wmf.27/includes/Revision/RevisionRenderer.php(209): MediaWiki\Revision\RevisionSlots->getSlots()
#5 /srv/mediawiki/php-1.35.0-wmf.27/includes/Revision/RevisionRenderer.php(152): MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, array)
#6 [internal function]: MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(MediaWiki\Revision\RenderedRevision, array)
#7 /srv/mediawiki/php-1.35.0-wmf.27/includes/Revision/RenderedRevision.php(198): call_user_func(Closure, MediaWiki\Revision\RenderedRevision, array)
#8 /srv/mediawiki/php-1.35.0-wmf.27/includes/jobqueue/jobs/CategoryMembershipChangeJob.php(281): MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
#9 /srv/mediawiki/php-1.35.0-wmf.27/includes/jobqueue/jobs/CategoryMembershipChangeJob.php(255): CategoryMembershipChangeJob->getCategoriesAtRev(WikiPage, MediaWiki\Revision\RevisionStoreRecord, string)
#10 /srv/mediawiki/php-1.35.0-wmf.27/includes/jobqueue/jobs/CategoryMembershipChangeJob.php(212): CategoryMembershipChangeJob->getExplicitCategoriesChanges(WikiPage, MediaWiki\Revision\RevisionStoreRecord, MediaWiki\Revision\RevisionStoreRecord)
#11 /srv/mediawiki/php-1.35.0-wmf.27/includes/jobqueue/jobs/CategoryMembershipChangeJob.php(176): CategoryMembershipChangeJob->notifyUpdatesForRevision(Wikimedia\Rdbms\LBFactoryMulti, WikiPage, MediaWiki\Revision\RevisionStoreRecord)
#12 /srv/mediawiki/php-1.35.0-wmf.27/extensions/EventBus/includes/JobExecutor.php(71): CategoryMembershipChangeJob->run()
#13 /srv/mediawiki/rpc/RunSingleJob.php(76): JobExecutor->execute(array)
#14 {main}
hashar added a subscriber: hashar.May 7 2020, 1:32 PM

That still show from CategoryMembershipChangeJob.

brennen moved this task from Backlog to Logs/Train on the User-brennen board.

Change 595147 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] RevisionStore: fall back to master db if main slot is missing.

https://gerrit.wikimedia.org/r/595147

Change 595147 merged by jenkins-bot:
[mediawiki/core@master] RevisionStore: fall back to master db if main slot is missing.

https://gerrit.wikimedia.org/r/595147

Change 595596 had a related patch set uploaded (by Jforrester; owner: Daniel Kinzler):
[mediawiki/core@wmf/1.35.0-wmf.31] RevisionStore: fall back to master db if main slot is missing.

https://gerrit.wikimedia.org/r/595596

Change 595596 merged by jenkins-bot:
[mediawiki/core@wmf/1.35.0-wmf.31] RevisionStore: fall back to master db if main slot is missing.

https://gerrit.wikimedia.org/r/595596

Mentioned in SAL (#wikimedia-operations) [2020-05-11T18:11:51Z] <jforrester@deploy1001> Synchronized php-1.35.0-wmf.31/includes/Revision/RevisionStore.php: T252156 T212428 RevisionStore: fall back to master db if main slot is missing (duration: 00m 58s)

This problem is difficult to filter out of logstash messages (line numbers tend to change from week-to-week); however, doesn't seem to cause a great deal user-facing impact: would it be possible to catch this particular exception in the code and provide a more useful warning that could be logged that we could subsequently filter in logstash?

I seem to still see these in the 1.35.0-wmf.36 version of MediaWiki. They don't seem to be bad enough to warrant a train blocker or an Unbreak Now status, but having them filtered away in code, or transpogrified to be more useful or at least more easily filterable would be appreciated by train conductors on all contintents.

Hrm, this is still assigned to me, even after being worked on by others and after being iceboxed at one point.

Looking at a recent trace:

Request ID AXK92JLSK6TSR36G_ZXR
#0 /srv/mediawiki/php-1.35.0-wmf.36/includes/Revision/RevisionStore.php(1274): MediaWiki\Revision\RevisionStore->constructSlotRecords(string, Wikimedia\Rdbms\ResultWrapper, integer, Title, NULL)
#1 /srv/mediawiki/php-1.35.0-wmf.36/includes/Revision/RevisionStore.php(1201): MediaWiki\Revision\RevisionStore->constructSlotRecords(string, Wikimedia\Rdbms\ResultWrapper, integer, Title)
#2 /srv/mediawiki/php-1.35.0-wmf.36/includes/Revision/RevisionStore.php(1318): MediaWiki\Revision\RevisionStore->loadSlotRecords(string, integer, Title)
#3 [internal function]: MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()
#4 /srv/mediawiki/php-1.35.0-wmf.36/includes/Revision/RevisionSlots.php(171): call_user_func(Closure)
#5 /srv/mediawiki/php-1.35.0-wmf.36/includes/Revision/RevisionSlots.php(113): MediaWiki\Revision\RevisionSlots->getSlots()
#6 /srv/mediawiki/php-1.35.0-wmf.36/includes/Revision/RevisionRecord.php(192): MediaWiki\Revision\RevisionSlots->getSlot(string)
#7 /srv/mediawiki/php-1.35.0-wmf.36/includes/Revision/RevisionRecord.php(175): MediaWiki\Revision\RevisionRecord->getSlot(string, integer, NULL)
#8 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(3625): MediaWiki\Revision\RevisionRecord->getContent(string)
#9 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(3520): Parser->statelessFetchTemplate(Title, Parser)
#10 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(3393): Parser->fetchTemplateAndTitle(Title)
#11 /srv/mediawiki/php-1.35.0-wmf.36/extensions/LabeledSectionTransclusion/includes/LabeledSectionTransclusion.php(227): Parser->getTemplateDom(Title)
#12 /srv/mediawiki/php-1.35.0-wmf.36/extensions/LabeledSectionTransclusion/includes/LabeledSectionTransclusion.php(300): LabeledSectionTransclusion::setupPfunc12(Parser, PPTemplateFrame_Hash, array, string)
#13 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(3334): LabeledSectionTransclusion::pfuncIncludeObj(Parser, PPTemplateFrame_Hash, array)
#14 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(3041): Parser->callParserFunction(PPTemplateFrame_Hash, string, array)
#15 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/PPFrame_Hash.php(253): Parser->braceSubstitution(array, PPTemplateFrame_Hash)
#16 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(3219): PPFrame_Hash->expand(PPNode_Hash_Tree)
#17 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/PPFrame_Hash.php(253): Parser->braceSubstitution(array, PPFrame_Hash)
#18 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(2958): PPFrame_Hash->expand(PPNode_Hash_Tree)
#19 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/PPFrame_Hash.php(253): Parser->braceSubstitution(array, PPFrame_Hash)
#20 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(2958): PPFrame_Hash->expand(PPNode_Hash_Tree)
#21 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/PPFrame_Hash.php(253): Parser->braceSubstitution(array, PPFrame_Hash)
#22 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/PPTemplateFrame_Hash.php(152): PPFrame_Hash->expand(PPNode_Hash_Tree, integer)
#23 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/PPTemplateFrame_Hash.php(179): PPTemplateFrame_Hash->getNumberedArgument(string)
#24 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(3802): PPTemplateFrame_Hash->getArgument(string)
#25 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/PPFrame_Hash.php(270): Parser->argSubstitution(array, PPTemplateFrame_Hash)
#26 /srv/mediawiki/php-1.35.0-wmf.36/extensions/ParserFunctions/includes/ParserFunctions.php(123): PPFrame_Hash->expand(PPNode_Hash_Tree)
#27 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(3334): MediaWiki\Extensions\ParserFunctions\ParserFunctions::if(Parser, PPTemplateFrame_Hash, array)
#28 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(3041): Parser->callParserFunction(PPTemplateFrame_Hash, string, array)
#29 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/PPFrame_Hash.php(253): Parser->braceSubstitution(array, PPTemplateFrame_Hash)
#30 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(3219): PPFrame_Hash->expand(PPNode_Hash_Tree)
#31 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/PPFrame_Hash.php(253): Parser->braceSubstitution(array, PPFrame_Hash)
#32 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(2882): PPFrame_Hash->expand(PPNode_Hash_Tree, integer)
#33 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(1553): Parser->replaceVariables(string)
#34 /srv/mediawiki/php-1.35.0-wmf.36/includes/parser/Parser.php(648): Parser->internalParse(string)
#35 /srv/mediawiki/php-1.35.0-wmf.36/includes/cache/MessageCache.php(1301): Parser->parse(string, Title, ParserOptions, boolean)
#36 /srv/mediawiki/php-1.35.0-wmf.36/extensions/PageTriage/includes/ArticleCompile/ArticleCompileSnippet.php(59): MessageCache->parse(string)
#37 /srv/mediawiki/php-1.35.0-wmf.36/extensions/PageTriage/includes/ArticleCompile/ArticleCompileSnippet.php(20): MediaWiki\Extension\PageTriage\ArticleCompile\ArticleCompileSnippet::generateArticleSnippet(string)
#38 /srv/mediawiki/php-1.35.0-wmf.36/extensions/PageTriage/includes/ArticleCompile/ArticleCompileProcessor.php(266): MediaWiki\Extension\PageTriage\ArticleCompile\ArticleCompileSnippet->compile()
#39 /srv/mediawiki/php-1.35.0-wmf.36/extensions/PageTriage/includes/ArticleCompile/ArticleCompileProcessor.php(194): MediaWiki\Extension\PageTriage\ArticleCompile\ArticleCompileProcessor->process()
#40 /srv/mediawiki/php-1.35.0-wmf.36/extensions/PageTriage/includes/Hooks.php(204): MediaWiki\Extension\PageTriage\ArticleCompile\ArticleCompileProcessor->compileMetadata()
#41 /srv/mediawiki/php-1.35.0-wmf.36/includes/deferred/MWCallableUpdate.php(38): MediaWiki\Extension\PageTriage\Hooks::MediaWiki\Extension\PageTriage\{closure}()
#42 /srv/mediawiki/php-1.35.0-wmf.36/includes/deferred/DeferredUpdates.php(466): MWCallableUpdate->doUpdate()
#43 /srv/mediawiki/php-1.35.0-wmf.36/includes/deferred/DeferredUpdates.php(343): DeferredUpdates::attemptUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#44 /srv/mediawiki/php-1.35.0-wmf.36/includes/deferred/DeferredUpdates.php(287): DeferredUpdates::run(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti, Monolog\Logger, BufferingStatsdDataFactory, string)
#45 /srv/mediawiki/php-1.35.0-wmf.36/includes/deferred/DeferredUpdates.php(194): DeferredUpdates::handleUpdateQueue(array, string, integer)
#46 /srv/mediawiki/php-1.35.0-wmf.36/includes/MediaWiki.php(1072): DeferredUpdates::doUpdates(string)
#47 /srv/mediawiki/php-1.35.0-wmf.36/includes/MediaWiki.php(849): MediaWiki->restInPeace()
#48 /srv/mediawiki/php-1.35.0-wmf.36/includes/MediaWiki.php(861): MediaWiki->{closure}()
#49 /srv/mediawiki/php-1.35.0-wmf.36/includes/MediaWiki.php(582): MediaWiki->doPostOutputShutdown()
#50 /srv/mediawiki/php-1.35.0-wmf.36/index.php(47): MediaWiki->run()
#51 /srv/mediawiki/w/index.php(3): require(string)
#52 {main}

This one appears to have tried the fallback to master introduced in https://gerrit.wikimedia.org/r/595147 and still failed.

This occurs very deep in the revison-related code, which is used by - well - pretty much everything. So modifying all callers to catch this and handle it individually doesn't seem feasible. Changing our fundamental assumptions to expect that revisions might (even temporarily) not have a main slot seems like a Big Deal.

From a quick logstash review, I only saw this triggered on the code path through loadSlotRecords: https://gerrit.wikimedia.org/g/mediawiki/core/+/1e6fa3e186d7b33fa1c0587740057f0c7ecd5721/includes/Revision/RevisionStore.php#1201 . But if we caught it there, we'd just have to rethrow it, because the higher-level caller legitimately expects to receive (at least a) main slot record. So we can't let callers continue to execute. I don't see a straightforward way around throwing an exception from deep within RevisionStore.

Regarding changing how this is reported or making this easier to filter, I'm happy to help but an example would be useful. How have we handled logging of similar exceptions in the past? What could we do to make this easier for the (very much appreciated) train conductors?

Regarding changing how this is reported or making this easier to filter, I'm happy to help but an example would be useful. How have we handled logging of similar exceptions in the past? What could we do to make this easier for the (very much appreciated) train conductors?

How about a log call to a custom channel for the revision ID and a generic exception message for error logs? The latter would be much easier for deployers to filter / ignore.

this just happened on lij.wikisource, trying to import https://wikisource.org/wiki/Index:Bacigal%C3%B4.Prose.rim%C3%A6.1891.pdf (by file). If that helps with anything?

Error occurs with Special:Import, too, for import both from wiki and file.

Error occurs with Special:Import, too, for import both from wiki and file.

Can you provide back stack trace or an error ID? Does it happen every time, or sporadically? Does it happen for any page, or only specific pages?

I tried to import https://de.wikipedia.org/wiki/R%C3%B6merstra%C3%9Fe_Neckar-Alb-Aare (w:de:Römerstraße Neckar-Alb-Aare) from German Wikipedia to German Wikivoyage. I got only the error message "Import fehlgeschlagen (= import failed): Main slot of revision not found in database. See T212428." but nothing else. This failure happened all the time with this page, but we did not tested it with other pages.

daniel added a comment.EditedJul 23 2020, 1:23 PM

I found some log entries that look like they are related to this issue. Here is the relevant info:

_id: AXN7hz9h3_NNwgAUthTp
mwversion: 1.36.0-wmf.1
reqId: 30d52c41-ee69-4991-a1ba-40ceed794137
message: MediaWiki\Revision\RevisionStore::constructSlotRecords falling back to READ_LATEST.
trace:<ul>
<li>RevisionStore.php line 1274 calls wfBacktrace()</li>
<li>RevisionStore.php line 1208 calls MediaWiki\Revision\RevisionStore->constructSlotRecords()</li>
<li>RevisionStore.php line 1330 calls MediaWiki\Revision\RevisionStore->loadSlotRecords()</li>
<li>- line - calls MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()</li>
<li>RevisionSlots.php line 175 calls call_user_func()</li>
<li>RevisionSlots.php line 134 calls MediaWiki\Revision\RevisionSlots->getSlots()</li>
<li>RevisionRecord.php line 209 calls MediaWiki\Revision\RevisionSlots->hasSlot()</li>
<li>ImportableOldRevisionImporter.php line 165 calls MediaWiki\Revision\RevisionRecord->hasSlot()</li>
<li>WikiRevision.php line 669 calls ImportableOldRevisionImporter->import()</li>
<li>WikiImporter.php line 371 calls WikiRevision->importOldRevision()</li>
<li>WikiImporter.php line 507 calls WikiImporter->importRevision()</li>
<li>WikiImporter.php line 999 calls WikiImporter->revisionCallback()</li>
<li>WikiImporter.php line 864 calls WikiImporter->processRevision()</li>
<li>WikiImporter.php line 802 calls WikiImporter->handleRevision()</li>
<li>WikiImporter.php line 612 calls WikiImporter->handlePage()</li>
<li>SpecialImport.php line 235 calls WikiImporter->doImport()</li>
<li>SpecialImport.php line 118 calls SpecialImport->doImport()</li>
<li>SpecialPage.php line 600 calls SpecialImport->execute()</li>
<li>SpecialPageFactory.php line 635 calls SpecialPage->run()</li>
<li>MediaWiki.php line 307 calls MediaWiki\SpecialPage\SpecialPageFactory->executePath()</li>
<li>MediaWiki.php line 940 calls MediaWiki->performRequest()</li>
<li>MediaWiki.php line 543 calls MediaWiki->main()</li>
<li>index.php line 53 calls MediaWiki->run()</li>
<li>index.php line 46 calls wfIndexMain()</li>
<li>index.php line 3 calls require()</li>
</ul>

From another log entry:
_id: AXN7hzzDNoG2jwpwShrO
message: MediaWiki\Revision\RevisionStore::constructSlotRecords: Main slot of revision 1316093 not found in database. See T212428.
mwversion: 1.36.0-wmf.1
reqId: 30d52c41-ee69-4991-a1ba-40ceed794137
revid: 1316093

From the database table:
$ select * from slots join slot_roles on role_id = slot_role_id join content on content_id = slot_content_id where slot_revision_id = 1316093;

slot_revision_idslot_role_idslot_content_idslot_originrole_idrole_namecontent_idcontent_sizecontent_sha1content_modelcontent_address
13160931128652813160931main128652831979zx2cqynjxesuh5835hhv6637x7kca81tt:1295133

So the main slot is there.

I assume the revision was just created right before we again try to read it. So I can see how it would be missing from a replica, but
the fact that it's not found even after falling back to the master database is a mystery to me. Maybe it's due to transaction behavior? Perhaps a DBA has an idea...

Note that the importer code was changed in a way that is likely to aggravate this issue: previously, we were looking on at the data in the revision table, now we are also looking at data in the slots table (T220525). This however makes this more of a mystery. If the problem was that we are seeing old data in the database query, why are we getting an up to date view of the revision table, but a stale view of the slots table (or the content table, which is joined in)?

The only thing I could think of is that _all_ slaves were lagging a bit (which would be really weird) and due to semi-sync replication the master didn't commit the transaction until at least one replica reported the transaction as committed. But I think the odds of that happening would be very low.

Looking at other log messages from the same request, it seems that replication lag was high at the time:

_id: AXN7hz7_MQ_08tQavmlc
message: Wikimedia\Rdbms\LoadBalancer::waitForMasterPos: timed out waiting on 10.64.16.7 pos 0-171966669-4075108480,171966669-171966669-4196523483,171974792-171974792-378345284,171978787-171978787-1768880104,180363367-180363367-134174373 [3.2E-5s]

Could be related.

I just realized that the fallback to master happens in constructSlotRecords(), and only applies to loading slot blobs. If we already failed to fetch the slot rows, this will not help. We'd need the same kind of logic in loadSlotRecords()! So that explains why the fallback doesn't help.

Looking at other log messages from the same request, it seems that replication lag was high at the time:

_id: AXN7hz7_MQ_08tQavmlc
message: Wikimedia\Rdbms\LoadBalancer::waitForMasterPos: timed out waiting on 10.64.16.7 pos 0-171966669-4075108480,171966669-171966669-4196523483,171974792-171974792-378345284,171978787-171978787-1768880104,180363367-180363367-134174373 [3.2E-5s]

Could be related.

That could be the false positive we get logged on MW quite often
I doubt all the slaves were lagging but happy to double check! do you have a timestamp so I can check all of them? Which wiki was this?

Change 615753 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] RevisionSTore: fall back to master if no revision rows were found

https://gerrit.wikimedia.org/r/615753

That could be the false positive we get logged on MW quite often
I doubt all the slaves were lagging but happy to double check! do you have a timestamp so I can check all of them? Which wiki was this?

Timestamp: 2020-07-23T11:55:26
Wiki: dewikivoyage
_id: AXN7hz7_MQ_08tQavmlc
reqId: 30d52c41-ee69-4991-a1ba-40ceed794137

Xover added a subscriber: Xover.Jul 23 2020, 3:49 PM

Note that T258666 looks like it could be an expression of this problem that was exacerbated by MediaWiki 1.36/wmf.1 (which is, I think, scheduled to hit the Wikipedias today). And as I commented there: I've never before seen this problem, and I've now seen 3 reports from 3 different projects in the last 24 hours. If so, this is not just a log spamming problem with the occasional user-visible weirdness any more.

daniel changed the task status from Stalled to Open.Jul 23 2020, 4:26 PM

One way this can happen is the issue I just fixed for T258666:

Revision rows are selected from the master DB, and then RevisionStore::newRevisionFromRow is used to create a RevisionObject.
In that situation, newRevisionFromRow() needs to be passed the READ_LATEST flag, so associated information in the slots and content tables is also read from the master database.
Otherwise, it will default to reading from a replica, which may not yet have the revision we found on the master db.

I didn't see any other instances of this in core, but we should watch out for this issue when analyzing stack traces related to this.

MF-Warburg added a comment.EditedJul 23 2020, 11:38 PM

A new case just happened on lij.wikisource at https://lij.wikisource.org/w/index.php?title=Laude_zeneize/19&action=history - Special:Import showed the error, but the page was still imported (though without a log entry).

Edit: Or rather, several cases - I was importing a file with pages numbered /1 to /25. It happened for every page after 18.

That could be the false positive we get logged on MW quite often
I doubt all the slaves were lagging but happy to double check! do you have a timestamp so I can check all of them? Which wiki was this?

Timestamp: 2020-07-23T11:55:26
Wiki: dewikivoyage
_id: AXN7hz7_MQ_08tQavmlc
reqId: 30d52c41-ee69-4991-a1ba-40ceed794137

No lag reported around those times

hashar removed a subscriber: hashar.Jul 27 2020, 10:21 AM

Change 615753 merged by jenkins-bot:
[mediawiki/core@master] RevisionStore: fall back to master if no revision rows were found.

https://gerrit.wikimedia.org/r/615753

Change 616608 had a related patch set uploaded (by Cicalese; owner: Daniel Kinzler):
[mediawiki/core@REL1_35] RevisionStore: fall back to master if no revision rows were found.

https://gerrit.wikimedia.org/r/616608

Change 616608 merged by jenkins-bot:
[mediawiki/core@REL1_35] RevisionStore: fall back to master if no revision rows were found.

https://gerrit.wikimedia.org/r/616608

BPirkle removed BPirkle as the assignee of this task.Aug 18 2020, 8:14 PM
BPirkle added a subscriber: BPirkle.

Removing myself as assignee as I am not actively working on this. I'm happy to help, but others have been more active on this task.

daniel raised the priority of this task from Medium to High.Aug 26 2020, 10:04 AM

So this is still ongoing, but not very high frequency. I'm seeing about 100 incidents over the last week:
https://logstash.wikimedia.org/goto/89f0103da0f86a76470a3003a684f2d0

Bumping to "high", since we should at least understand when and why this is happening.

daniel added a comment.EditedSep 7 2020, 8:48 PM

Here's a logstash link for all RevisionStore errors: https://logstash.wikimedia.org/goto/57ff2c50e5c662f862b897e08720f3ee

The instances I find now all come from JobRunner, and go via Scribunto:

 	RevisionStore.php line 1322 calls wfBacktrace()
RevisionStore.php line 1254 calls MediaWiki\Revision\RevisionStore->constructSlotRecords()
RevisionStore.php line 1250 calls MediaWiki\Revision\RevisionStore->loadSlotRecords()
RevisionStore.php line 1365 calls MediaWiki\Revision\RevisionStore->loadSlotRecords()
- line - calls MediaWiki\Revision\RevisionStore->MediaWiki\Revision\{closure}()
RevisionSlots.php line 175 calls call_user_func()
RevisionSlots.php line 117 calls MediaWiki\Revision\RevisionSlots->getSlots()
RevisionRecord.php line 192 calls MediaWiki\Revision\RevisionSlots->getSlot()
RevisionRecord.php line 175 calls MediaWiki\Revision\RevisionRecord->getSlot()
Parser.php line 3641 calls MediaWiki\Revision\RevisionRecord->getContent()
Parser.php line 3526 calls Parser->statelessFetchTemplate()
ScribuntoEngineBase.php line 167 calls Parser->fetchTemplateAndTitle()
LuaEngine.php line 566 calls ScribuntoEngineBase->fetchModuleFromParser()
LuaSandboxCallback.php line 26 calls Scribunto_LuaEngine->loadPackage()
- line - calls Scribunto_LuaSandboxCallback->__call()
LuaSandboxInterpreter.php line 113 calls LuaSandboxFunction->call()
LuaEngine.php line 266 calls Scribunto_LuaSandboxInterpreter->callFunction()
LuaModule.php line 55 calls Scribunto_LuaEngine->executeModule()
Hooks.php line 128 calls Scribunto_LuaModule->invoke()
Parser.php line 3335 calls ScribuntoHooks::invokeHook()
Parser.php line 3042 calls Parser->callParserFunction()
PPFrame_Hash.php line 253 calls Parser->braceSubstitution()
Parser.php line 3220 calls PPFrame_Hash->expand()
PPFrame_Hash.php line 253 calls Parser->braceSubstitution()
Parser.php line 2882 calls PPFrame_Hash->expand()
Parser.php line 1555 calls Parser->replaceVariables()
Parser.php line 650 calls Parser->internalParse()
WikitextContent.php line 374 calls Parser->parse()
AbstractContent.php line 590 calls WikitextContent->fillParserOutput()
RenderedRevision.php line 266 calls AbstractContent->getParserOutput()
RenderedRevision.php line 235 calls MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached()
RevisionRenderer.php line 215 calls MediaWiki\Revision\RenderedRevision->getSlotParserOutput()
RevisionRenderer.php line 152 calls MediaWiki\Revision\RevisionRenderer->combineSlotOutput()
- line - calls MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}()
RenderedRevision.php line 197 calls call_user_func()
RefreshLinksJob.php line 259 calls MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
RefreshLinksJob.php line 179 calls RefreshLinksJob->getParserOutput()
RefreshLinksJob.php line 126 calls RefreshLinksJob->runForTitle()
JobExecutor.php line 79 calls RefreshLinksJob->run()
RunSingleJob.php line 76 calls MediaWiki\Extension\EventBus\JobExecutor->execute()

The solution is probably to introduce a fallback to master in constructSlotRecords(), in the same way we have done it in loadSlotRecords().

daniel added a comment.EditedSep 11 2020, 8:12 PM

Observation:
Some of the revisions for which this problem is being reported are missing from the labs replica. E.g.:
reqId: 61a4914d-39ef-4fcf-a169-19a4435c2a09
revid: 977912002
wiki: enwiki

wikiadmin@10.192.48.87(enwiki)> select * from revision where rev_id = 977912002;
Empty set (0.00 sec)

MariaDB [enwiki_p]> select * from slots join content on slot_content_id = content_id where slot_revision_id = 977912002 \G
Empty set (0.01 sec)

The revision is present master, and is not deleted:

> select * from revision where rev_id = 977912002 \G
*************************** 1. row ***************************
        rev_id: 977912002
      rev_page: 48255424
rev_comment_id: 0
     rev_actor: 0
 rev_timestamp: 20200911184757
rev_minor_edit: 0
   rev_deleted: 0
       rev_len: 1545
 rev_parent_id: 977911774
      rev_sha1: qqugtkzso9kanir9k8py66op3x4h1tv
1 row in set (0.00 sec)

wikiadmin@10.192.48.87(enwiki)> select * from slots join content on slot_content_id = content_id where slot_revision_id = 977912002 \G
*************************** 1. row ***************************
slot_revision_id: 977912002
    slot_role_id: 1
 slot_content_id: 968890405
     slot_origin: 977912002
      content_id: 968890405
    content_size: 1545
    content_sha1: qqugtkzso9kanir9k8py66op3x4h1tv
   content_model: 1
 content_address: tt:990025728
1 row in set (0.00 sec)

Why would this revision be filtered out?

Or is replication somehow flaky? Are some rows (revision, slots, or content) missing from some replicas? That would epxlain the errors.

JJMC89 added a subscriber: JJMC89.Sep 11 2020, 8:23 PM

Why would this revision be filtered out?

Or is replication somehow flaky?

T262239: enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged

AMooney assigned this task to daniel.Sep 15 2020, 8:58 PM
Reedy added a subscriber: Reedy.Sat, Oct 3, 8:01 PM

As we're referencing this task in error messages... The description should really continue something more... substantial on how people should be "fixing" it, rather than having to dig through the comments

Just like to note that I encountered this error while upgrading from 1.29.1 to 1.35.0, and I was able to work around it by running the maintenance script populateContentTables.php, which failed with a duplicate entry for key "PRIMARY" error but still allowed the update.php script to continue. It seems like the updater tried to work on the "slots" table before it was even populated, hence the slot not found error.

daniel added a comment.Mon, Oct 5, 8:52 AM

Just like to note that I encountered this error while upgrading from 1.29.1 to 1.35.0, and I was able to work around it by running the maintenance script populateContentTables.php, which failed with a duplicate entry for key "PRIMARY" error but still allowed the update.php script to continue. It seems like the updater tried to work on the "slots" table before it was even populated, hence the slot not found error.

That problem seems to be specific for updating from 1.29 to 1.35, and would have a different root casue. While the symptom is the same, it would probably be useful to have a separate task for it. Can you file it separately? Thank you!

daniel added a comment.Mon, Oct 5, 8:54 AM

As we're referencing this task in error messages... The description should really continue something more... substantial on how people should be "fixing" it, rather than having to dig through the comments

Do you know how to fix it? I don't even know what causes this. I can't reproduce it, and looking at the code, I can't think of any condition that would trigger this.

Aklapper removed a subscriber: Anomie.Fri, Oct 16, 5:01 PM