Statements not editable on a File page whose id has changed
Open, Needs Triage, Public

Description

When a page is deleted and restored, its page_id may change (see https://www.mediawiki.org/wiki/Manual:Page_table#page_id).

MediaInfo data for a slot on a File page is stored in a JSON blob, and that blob contains an id for the MediaInfo item that is based on the page_id of the File page. For example, if a page with page_id=1234 has MediaInfo data in a slot, the MediaInfo item will contain the field "id": "M1234".

If a File page is deleted and restored, and its page_id changes, the JSON blob is not updated, so the MediaInfo item's id no longer corresponds to the page_id. The MediaInfo blob can still be retrieved from the db, because the query to retrieve it is based on the page_id of the File page and the table keys are updated during the page restore, but the "id" value in the JSON is wrong.
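The mismatch described above can be checked mechanically. The following is a minimal Python sketch, not MediaInfo code: `expected_entity_id` and `has_stale_id` are hypothetical helper names, the page ids are invented, and the blob is assumed to be the already-fetched JSON text of the slot.

```python
import json


def expected_entity_id(page_id: int) -> str:
    """Return the MediaInfo id that corresponds to a File page's page_id."""
    return f"M{page_id}"


def has_stale_id(page_id: int, blob: str) -> bool:
    """True if the id stored in the JSON blob does not match the page_id."""
    stored_id = json.loads(blob).get("id")
    return stored_id != expected_entity_id(page_id)


# Hypothetical case: a page created with page_id=1234, then deleted and
# restored with page_id=5678. The blob itself is unchanged by the restore.
blob = json.dumps({"type": "mediainfo", "id": "M1234", "statements": {}})

print(has_stale_id(1234, blob))  # False: id matched before the delete
print(has_stale_id(5678, blob))  # True: id is stale after the restore
```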

Until recently, this discrepancy caused a fatal error (see T231276). A patch has since replaced the fatal error with a warning.

The problem remains, however, that statements on the File page cannot be edited: an attempt to edit via the File page UI (which uses the API call wbsetclaim) fails.

Example request:

action	wbsetclaim
format	json
claim	{
	"type": "statement",
	"mainsnak": {
		"snaktype": "value",
		"property": "P737",
		"datavalue": {
			"type": "wikibase-entityid",
			"value": {
				"id": "Q72"
			}
		}
	},
	"id": "M5677$e3e5035d-4be7-fbf6-c032-e7363c017e58",
	"rank": "preferred"
}
bot	1
assertuser	Admin
token	0a08e306d7c92d93fd545e0317f2eb375d70f00c+\

Example response:

"error": {
	"code": "editconflict",
	"info": "Edit conflict. Could not patch the current revision.",
	"messages": [{
		"name": "wikibase-api-editconflict",
		"parameters": [],
		"html": {
			"*": "Edit conflict. Could not patch the current revision."
		}
	}, {
		"name": "edit-conflict",
		"parameters": [],
		"html": {
			"*": "Edit conflict."
		}
	}],
	"*": "See http://127.0.0.1:8080/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."
}

... and if baserevid is removed from the request params sent to wbsetclaim, there is no error, but the claim is added to the page whose page_id equals the numeric part of the MediaInfo id, instead of modifying the original MediaInfo item.

Event Timeline

The reason this is happening is the following code in MediaInfoHandler.php:

public function getTitleForId( EntityId $id ) {
	return Title::newFromID( $id->getNumericId() );
}

... so if you pass it M1234 you'll get the page with page_id 1234, regardless of which page actually holds this MediaInfo item in a slot.

And that's why you get the editconflict: the baserevid is a revision of the page that contains the slot, but the numeric part of the M-id points either to a non-existent page with no revision, or to a different page that doesn't have a revision matching baserevid.
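To make the failure mode concrete, here is a small Python model of the lookup. It is not the actual Wikibase code: the `PAGES` dict, the page and revision numbers, and the `resolve_page` and `set_claim` names are all made up for illustration. Only the idea of resolving the page from the numeric part of the M-id comes from the snippet above.

```python
# Toy model: page_id -> latest revision id. All numbers are invented.
PAGES = {
    5677: 900,  # an unrelated page that happens to have page_id 5677
    8001: 950,  # the restored File page; its blob still says "id": "M5677"
}


def resolve_page(entity_id: str):
    """Mimic getTitleForId: use only the numeric part of the M-id."""
    page_id = int(entity_id[1:])
    return page_id if page_id in PAGES else None


def set_claim(entity_id: str, baserevid=None) -> str:
    page_id = resolve_page(entity_id)
    # baserevid is a revision of the page that holds the slot, so it cannot
    # match the page (if any) that the stale id resolves to.
    if baserevid is not None and PAGES.get(page_id) != baserevid:
        return "editconflict"
    if page_id is None:
        # Placeholder: the task does not describe this case without baserevid.
        return "no such page"
    return f"claim written to page {page_id}"


print(set_claim("M5677", baserevid=950))  # editconflict
print(set_claim("M5677"))                 # claim written to page 5677
```

In this model, both symptoms from the task description fall out of the same lookup: with baserevid the revision check fails, and without it the write lands on whichever page owns the numeric id.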

As an example, the page https://commons.wikimedia.org/wiki/File:John_Carmack_-_The_Dawn_of_Mobile_VR_-_Game_Developer_Conference_2015.jpg contains the MediaInfo item

{
	"type": "mediainfo",
	"id": "M78910210",
	"labels": [],
	"descriptions": [],
	"statements": {
		"P180": [{
			"mainsnak": {
				"snaktype": "value",
				"property": "P180",
				"hash": "ce0cdc5a9dd96dd5deee2c39bbb73cd054c59cc1",
				"datavalue": {
					"value": {
						"entity-type": "item",
						"numeric-id": 92605,
						"id": "Q92605"
					},
					"type": "wikibase-entityid"
				}
			},
			"type": "statement",
			"id": "M43674259$9e3ad3bc-4aae-53a9-b26f-28e8cc2187d6",
			"rank": "normal"
		}]
	}
}

The id of the File page is 78910210; the id of the original page the statement was entered on is 43674259. The wbsetclaim code takes the GUID of the claim, gets page 43674259, then tries to match it against revision id 347013814 (the latest revision of the page), fails, and throws an exception.

And if you don't send baserevid, the wbsetclaim code takes the GUID of the claim, again gets page 43674259, and sets the claim on that page, which is https://commons.wikimedia.org/wiki/File:John_Carmack_-_The_Dawn_of_Mobile_VR_-_Game_Developer_Conference_2015_-_cropped.jpg instead.
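The mismatch in this example is visible from the JSON alone. As a rough Python sketch (the `guid_entity_id` and `mismatched_guids` helpers are hypothetical names, and it assumes statement GUIDs always have the form `<entity id>$<uuid>`, as in the item above):

```python
def guid_entity_id(guid: str) -> str:
    """Return the entity-id prefix of a statement GUID ('M123$uuid' -> 'M123')."""
    return guid.split("$", 1)[0]


def mismatched_guids(item: dict) -> list:
    """List statement GUIDs whose prefix differs from the item's own id."""
    # An empty statements field may be serialized as [] rather than {}.
    statements = item.get("statements") or {}
    return [
        statement["id"]
        for group in statements.values()
        for statement in group
        if guid_entity_id(statement["id"]) != item["id"]
    ]


# Trimmed copy of the MediaInfo item shown above.
item = {
    "type": "mediainfo",
    "id": "M78910210",
    "statements": {
        "P180": [
            {
                "type": "statement",
                "id": "M43674259$9e3ad3bc-4aae-53a9-b26f-28e8cc2187d6",
                "rank": "normal",
            }
        ]
    },
}

print(guid_entity_id(item["statements"]["P180"][0]["id"]))  # M43674259
print(mismatched_guids(item))
```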

Cparle updated the task description.

We had some discussions within the Wikidata team today and came to the conclusion that something like https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/573315 is probably the best thing.
This needs to be done on:

  • undelete (for when page_id might change, or for the case of page history merges?)
  • undo & restore (for the case where history has been merged and someone wants to restore a previous MediaInfo revision that still has the old MediaInfo id in it)
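As a rough illustration of what such a fix-up might compute, here is a Python sketch. It is not the code from the Gerrit change: the `rewrite_entity_id` name is hypothetical, and rewriting the statement GUID prefixes along with the top-level id is an assumption. The result would be saved as a new revision, leaving the old revision's data untouched.

```python
import copy


def rewrite_entity_id(item: dict, new_page_id: int) -> dict:
    """Return a copy of a MediaInfo item re-keyed to new_page_id."""
    new_id = f"M{new_page_id}"
    updated = copy.deepcopy(item)  # never mutate the old revision's data
    updated["id"] = new_id
    # An empty statements field may be serialized as [] rather than {}.
    statements = updated.get("statements") or {}
    for group in statements.values():
        for statement in group:
            # Assumption: GUIDs look like '<entity id>$<uuid>'.
            _, sep, suffix = statement["id"].partition("$")
            if sep:
                statement["id"] = f"{new_id}${suffix}"
    return updated


old = {
    "type": "mediainfo",
    "id": "M43674259",
    "statements": {
        "P180": [{"id": "M43674259$9e3ad3bc-4aae-53a9-b26f-28e8cc2187d6"}]
    },
}
new = rewrite_entity_id(old, 78910210)
print(new["id"])                           # M78910210
print(new["statements"]["P180"][0]["id"])  # M78910210$9e3ad3bc-...
print(old["id"])                           # M43674259 (unchanged)
```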

We said that:

  • Magically rewriting the content of the old revisions (with the wrong ids), either in storage or during presentation, would probably be bad, evil and misleading.
  • Ideally viewing history would just correctly render the old revisions etc.

We discussed the idea of removing the entity id from the JSON, and also from the start of statement GUIDs, when stored, and instead only showing these in the presentation layer.
This would probably be more work and risk than it is worth, and after lots of discussion we came to the conclusion that the extra revision and hook stuff would be better.

I would hard veto rewriting old revisions. That should never be done, unless we run a maintenance script to fix some bug and we announce it widely, and provide time for any downstream users (including dumps!) that rely on old revisions being unchanged, to update their data or scripts.

Creating a new revision with new slot record with the new page id seems right to me, at least in the short-to-midterm. I am very interested in how old revisions with the old page id get connected up to the new revision and page id, however.