Page MenuHomePhabricator

Revision 386521405 belongs to M43864745 instead of expected M79277318
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error

MediaWiki version: 1.35.0-wmf.16

message
Revision 386521405 belongs to M43864745 instead of expected M79277318

Reproduction: https://commons.wikimedia.org/w/api.php?action=query&prop=description&titles=File:L%27Int%C3%A9rieur%20du%20Mus%C3%A9e%20des%20beaux-arts%20de%20Nancy.jpg

Impact

Notes

COVID-19 Deployment Criteria

  • Can you roll back this change without lasting impact?
    1. A recovery plan is required as this will help identify our capacity for recovering from the failure
    2. THIS IS A KEY QUESTION, if you can’t answer it, you shouldn’t deploy
  • Is specialized knowledge required to support this change in production? If so, are there multiple people with this knowledge?
  • Is there a way to increase confidence about the correctness of this change?
    1. Reviews (Design, Code, etc)
    2. Testing coverage (unit tests, integration tests)
    3. Manual testing (e.g. Beta, vagrant, docker)

Details

Request ID
XiXviQpAAD8AAI1KJCkAAADN
Request URL
https://commons.wikimedia.org/w/api.php
Stack Trace
exception.trace
#0 /srv/mediawiki/php-1.35.0-wmf.16/extensions/WikibaseMediaInfo/src/Services/MediaInfoPrefetchingTermLookup.php(59): Wikibase\Lib\Store\Sql\WikiPageEntityRevisionLookup->getEntityRevision(Wikibase\MediaInfo\DataModel\MediaInfoId)
#1 /srv/mediawiki/php-1.35.0-wmf.16/extensions/WikibaseMediaInfo/src/Services/MediaInfoPrefetchingTermLookup.php(108): Wikibase\MediaInfo\Services\MediaInfoPrefetchingTermLookup->prefetchTerm(Wikibase\MediaInfo\DataModel\MediaInfoId, array, array)
#2 /srv/mediawiki/php-1.35.0-wmf.16/extensions/Wikibase/data-access/src/ByTypeDispatchingPrefetchingTermLookup.php(52): Wikibase\MediaInfo\Services\MediaInfoPrefetchingTermLookup->prefetchTerms(array, array, array)
#3 /srv/mediawiki/php-1.35.0-wmf.16/extensions/Wikibase/data-access/src/ByTypeDispatchingPrefetchingTermLookup.php(52): Wikibase\DataAccess\ByTypeDispatchingPrefetchingTermLookup->prefetchTerms(array, array, array)
#4 /srv/mediawiki/php-1.35.0-wmf.16/extensions/Wikibase/client/includes/Store/DescriptionLookup.php(143): Wikibase\DataAccess\ByTypeDispatchingPrefetchingTermLookup->prefetchTerms(array, array, array)
#5 /srv/mediawiki/php-1.35.0-wmf.16/extensions/Wikibase/client/includes/Store/DescriptionLookup.php(73): Wikibase\Client\Store\DescriptionLookup->getCentralDescriptions(array)
#6 /srv/mediawiki/php-1.35.0-wmf.16/extensions/Wikibase/client/includes/Api/Description.php(72): Wikibase\Client\Store\DescriptionLookup->getDescriptions(array, array, array)
#7 /srv/mediawiki/php-1.35.0-wmf.16/includes/api/ApiQuery.php(255): Wikibase\Client\Api\Description->execute()
#8 /srv/mediawiki/php-1.35.0-wmf.16/includes/api/ApiMain.php(1603): ApiQuery->execute()
#9 /srv/mediawiki/php-1.35.0-wmf.16/includes/api/ApiMain.php(539): ApiMain->executeAction()
#10 /srv/mediawiki/php-1.35.0-wmf.16/includes/api/ApiMain.php(510): ApiMain->executeActionWithErrorHandling()
#11 /srv/mediawiki/php-1.35.0-wmf.16/api.php(78): ApiMain->execute()
#12 /srv/mediawiki/w/api.php(3): require(string)
#13 {main}

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
matthiasmullie subscribed.

This is similar to T231276.

This basically happens when a page's history gets split (https://commons.wikimedia.org/wiki/Commons:History_merging_and_splitting#History_splitting)
In this case: https://commons.wikimedia.org/?curid=79277318 & https://commons.wikimedia.org/?curid=43864745

The original page (and its slots) gets deleted.
It gets restored (along with the relevant slots) & moved somewhere in the middle of the history.
The rest of history (along with the relevant slots) gets restored.
And we now have 2 pages with MediaInfo slots pointing to 1 original source...

Change 573315 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] [WIP] Update entity data on undelete to match new page id

https://gerrit.wikimedia.org/r/573315

Change 573316 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] [WIP] Don't allow restoring conflicting structured data

https://gerrit.wikimedia.org/r/573316

Change 573316 abandoned by Matthias Mullie:
[WIP] Don't allow restoring conflicting structured data

Reason:
let's not do this

https://gerrit.wikimedia.org/r/573316

So @Addshore @thiemowmde @matthiasmullie what's the story with this https://github.com/wmde/WikibaseDataModel/pull/823 ? We don't seem to have a consensus on how to move forward

Linking T204541 also as reference to this task as it may be relevant.

I recently wrote some docs in Wikibase.git covering this element of mediawiki which Wikibase just avoids.
This patch can be found at https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/576393/

I also just wrote a rather lengthy response to some points in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikibaseMediaInfo/+/573315/ that I want to quote here for searchability etc!

I would like to argue that a statement GUID should never change, not even when this means the entity ID part in that ID doesn't match any more. It's called a globally unique identifier for a reason. If we start changing it, it's not a unique identifier any more.

It's a fair argument, but the "norm" is already that these "statement guids" do change when we talk about merges.

For example these edits:
https://test.wikidata.org/w/index.php?title=Q212044&diff=prev&oldid=527336
https://test.wikidata.org/w/index.php?title=Q212045&diff=527337&oldid=527334

Result in this guid change for the statement:
Q212044$c75a619d-4e17-a091-acbf-9496c67c7031
Q212045$ECEBF45B-DD57-4094-BC80-D4CDE946A5F0

I'm aware of a single place in the Wikibase codebase that tries to extract the entity ID from a statement GUID: the "wbsetclaim" as well as "wbsetclaimvalue" APIs allow to pass nothing but a GUID. In theory, it's possible to scan the entire database for this GUID. For performance reasons, this is not what happens, but the entity is found by extracting it's ID from the statement GUID.

The Entity type (that comes from the ID) seems to be used once in DispatchingEntityTypeStatementGrouper.
GetClaims, RemoveClaims, RemoveQualifiers, RemoveReferences, SetClaim, SetClaimValue, SetQualifier, SetReference uses the entity ID from the GUID if no entity ID is specified. Without this either the client must know which entity the GUID belongs to, or we must have an index of the current 1.1 billion statements (and future created ones too).
WikibaseQualityConstraints then also uses the entityid from statement guid (briefly discussed on https://phabricator.wikimedia.org/T204541#4601136)

But this is really just a performance optimization. Technically, there is no guarantee a statement GUID will always be constructed this way.

Right now it is an assumption enforced in code. I would call that somewhat of a guarantee right now.
The StatementGuid class also refers to:

A statement id consists of the entity id serialization

  • of the entity it belongs to (e.g. "Q1") and a randomly generated global unique identifier (GUID),
  • separated by a dollar sign.

If you are not using these APIs, there should be no need for you to touch this. The same can be achieved by using the wbeditentity API.

I believe some or most of these APIs are being used by mediainfo. And anyway, as said, the guid is specified as including the entity id that the statement belongs to, so, unless we change that definition, we should go removing the entityid from anywhere.

T204541 raised the idea of removing the entity ID before. If something like that is in peoples heads then perhaps we should focus the discussion there.
Otherwise it might make sense to keep the more fundamental discussion on T244207 for searchability in the future.

Current Wikibase "native" merging code can be found at https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/2d76a79e2be4833e8c32564bbf925fe570f41228/repo/includes/Merge/StatementsMerger.php#L44-L60
Here you can see that that statement guids are totally recreated at this point, (not needing to access the guid part of the old statement guid as it is not used)
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikibaseMediaInfo/+/573315/2/src/WikibaseMediaInfoHooks.php line 734 takes a different approach trying to reuse this guid part of the statement guid which is also why https://github.com/wmde/WikibaseDataModel/pull/823 has been created.

One way of looking at this is that no merge is happening here, in that case why should the statement guids or the smaller guids that make up part of the statement guid change?

The second way of looking at it is, these statements are indeed being moved from one entity to another, in which case the guids should probably change in the same way as when statements move in other places.

I lean toward this second option personally with everything that has so far surfaced.

A few questions to the mediainfo team that may not yet be explicitly stated anywhere:

  • How often is this history merge thing happening? And why? (Understanding the reason behind such merges might help out here a bit)
  • Is the code in the patch (keeping the guid part the same) intentional (I guess it is). And Why is that deemed necessary for the mediainfo usecase?

This circles back to the fact that probably the media info IDs should be more stable than they currently are.
Perhaps we should re asses the decision to keep these bound to the pageid of the file?

Change 585754 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/vendor@master] "wikibase/data-model": "9.4.0"

https://gerrit.wikimedia.org/r/585754

Change 585754 merged by jenkins-bot:
[mediawiki/vendor@master] "wikibase/data-model": "9.4.0"

https://gerrit.wikimedia.org/r/585754

Change 573315 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Update entity data on undelete to match new page id

https://gerrit.wikimedia.org/r/573315