Add support for foreign entities to EntityId
Closed, Resolved · Public

Assigned To

Authored By

	daniel
	Apr 22 2016, 12:10 PM

Description

Based on the discussion around T56085 and previously on T101752 and T73996, I propose to add support for "foreign" EntityIds[*] as described below:

[*let's reserve "external" for the external-id property data type]

EntityId gets an additional field repository, which contains the logical name of the repository that the ID refers to. This serves as a namespace mechanism for entity IDs. The name used to refer to a given repo is local, and can differ from client to client.
Repository names can be used to look up the repository's Site object, which in turn allows the ID to be mapped to the correct URI, URL, or API path for the repo it belongs to.
The empty repository name "" refers to the local wiki.
The serialization of the EntityId follows the pattern <repository>:<id>. If <repository> is empty, the leading ":" is optional. This format follows the convention set by XML namespaces, RDF prefixes, and MediaWiki interwiki links.
Repository names can be mapped during serialization and deserialization.
When reading data from another repo, repo names in entity IDs get mapped from the names used on the other repo to the names used on the local wiki. In particular, the empty prefix is mapped to the foreign repo's name while reading IDs.
When writing data to another repo, repo names in entity IDs get mapped from the names used on the local wiki to the names used on the foreign repo. In particular, the foreign wiki's name is mapped to the empty prefix.
The repo name in an EntityId object can always be assumed to match the local definition of that name. A repo name in serialized data however cannot be interpreted without knowing which repo it came from.
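
A minimal sketch of the <repository>:<id> pattern, in Python and for illustration only. The function names parse_entity_id and serialize_entity_id are hypothetical, and splitting on the last ":" (so that a chained prefix stays whole in the repository part) is an assumption, not something this proposal specifies.

```python
from typing import Tuple


def parse_entity_id(serialization: str) -> Tuple[str, str]:
    """Split '<repository>:<id>' into (repository, local id).

    Assumption: everything before the last ':' is the repository name.
    A missing prefix or a bare leading ':' both mean the local wiki ("").
    """
    repository, _, local_id = serialization.rpartition(":")
    if not local_id:
        raise ValueError(f"no local id in {serialization!r}")
    return repository, local_id


def serialize_entity_id(repository: str, local_id: str) -> str:
    """Join the parts again; the optional leading ':' is omitted for local IDs."""
    return f"{repository}:{local_id}" if repository else local_id
```

With these helpers, "Q5" and ":Q5" both parse to the local repository, and a plain local ID serializes exactly as it did before, which is what keeps the format backwards compatible.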

When receiving prefixed entities from another repo, prefixes are "chained", like interwiki prefixes:

When reading d:Q5 from repo foo, this is turned into foo:d:Q5, meaning "Q5 at the repo that repo foo calls d".
The local repo may have mappings defined for the prefixes used by other wikis, e.g. "foo:d" would be known to be the same as the local prefix "wd" for Wikidata.
When deserializing data from another repo, we always add the name of the source repo as a prefix, and then resolve any known mappings: d:Q5-from-foo becomes foo:d:Q5 and then wd:Q5.
If no mapping is known, we store the "chained" version of the ID (foo:d:Q5) locally. There's really nothing else we could do. If we send this kind of ID to yet another repo, that may result in longer "chains" of prefixes, like xyz:foo:d:Q5.
When deserializing data from the local database, we don't add any prefix, but we still resolve any known mappings. If foo:d:Q5 was stored earlier because there was no mapping defined for foo:d then, but there is a mapping now, that mapping is resolved when loading the item.
Note that foo:bar:Q5 and bar:foo:Q5 may mean different things (or the same thing), depending on the mappings defined in the foo and the bar repo.
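
The reading rules above can be sketched roughly as follows, in Python and only as an illustration. PREFIX_MAPPINGS, resolve_mappings and read_from_repo are hypothetical names, and matching the longest known leading chain of prefixes is an assumption about how mappings would be looked up.

```python
from typing import Dict

# Hypothetical local configuration: chained prefix -> local repository name.
PREFIX_MAPPINGS: Dict[str, str] = {"foo:d": "wd"}


def resolve_mappings(serialization: str, mappings: Dict[str, str]) -> str:
    """Replace the longest known leading chain of prefixes with its local name."""
    *prefixes, local_id = serialization.split(":")
    prefixes = [p for p in prefixes if p]  # drop an optional leading ':'
    # Try the longest leading chain first (assumption).
    for end in range(len(prefixes), 0, -1):
        chain = ":".join(prefixes[:end])
        if chain in mappings:
            target = mappings[chain]
            prefixes = ([target] if target else []) + prefixes[end:]
            break
    return ":".join(prefixes + [local_id])


def read_from_repo(source_repo: str, serialization: str,
                   mappings: Dict[str, str]) -> str:
    """Prefix an ID received from source_repo, then resolve known mappings."""
    chained = f"{source_repo}:{serialization.lstrip(':')}"
    return resolve_mappings(chained, mappings)
```

Under these assumptions, read_from_repo("foo", "d:Q5", PREFIX_MAPPINGS) yields wd:Q5, while the same call with no mappings keeps the chained form foo:d:Q5. Loading from the local database would correspond to calling resolve_mappings directly, without adding a prefix.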

This ensures that an EntityId object always "knows" which repo it belongs to, and that it always reflects the currently defined mappings. This implies, however, that an ID used in an old revision can change its effective serialization later: it will look different when the mappings change. The ID would still mean the same, however, since it still references the same entity (provided the mappings were defined correctly).

Rationale:

Fully backwards compatible
Compact yet readable
Transparent for most repo and client code

Related Objects

Status	Subtype	Assigned	Task
Declined		dchen	T118706 Conduct heuristic evaluation of image upload and insert flow in VisualEditor
Open		None	T115858 Design improvements for mw.ForeignStructuredUpload.BookletLayout
Open		None	T115865 Insert image in content immediately after it's uploaded, skipping the "General settings" step
Duplicate		None	T115864 Figure out if the description of the image can be used as the caption on-wiki
Open	Feature	None	T53032 When inserting an image, set its caption by default to be the Commons image description
Open	Feature	None	T39534 Wikimedia Commons should support searching by color
Duplicate		None	T39535 Wikimedia Commons should support filtering by color
Resolved		None	T19503 Provide metadata support on Wikimedia Commons
Resolved		None	T51662 VisualEditor: Use Multimedia/Wikidata's proposed rich structured meta-data in the image insertion dialog
Resolved		None	T68108 [Epic] Store media information for files on Wikimedia Commons as structured data
Open		None	T109579 [Epic] Give more sister projects access to Wikidata
Open		None	T187900 There is no way to reference a specific quote on Wikiquote
Stalled		None	T71753 [Story] Wikibase / Wikidata support on Wikiquote
Resolved		Lydia_Pintscher	T146637 Wikidata 2016 Q4 goals
Resolved		Lydia_Pintscher	T150179 Wikidata 2017 Q1 goals
Resolved		Lydia_Pintscher	T76007 [Epic] add ability to link/refer to foreign items and properties (federation)
Resolved		Lydia_Pintscher	T149580 Allow access to entity data stored on "foreign" repositories (federation)
Resolved		daniel	T133381 Add support for foreign entities to EntityId
Resolved		Jakob_WMDE	T145516 Add repository field to EntityId
Resolved		WMDE-leszek	T146030 Add support for repo prefixes to DispatchingEntityIdParser
Resolved		WMDE-leszek	T146274 Create an EntityIdParser that maps foreign repo prefixes

Event Timeline

daniel created this task. Apr 22 2016, 12:10 PM

Restricted Application added a subscriber: Aklapper. Apr 22 2016, 12:10 PM

daniel mentioned this in T101752: [RFC] Introduce ExternalEntityId. Apr 22 2016, 12:10 PM

daniel updated the task description.

daniel mentioned this in T73996: [RFC] Find a way to make global item ids.

daniel added a parent task: T76007: [Epic] add ability to link/refer to foreign items and properties (federation). Apr 22 2016, 12:13 PM

daniel added subscribers: JeroenDeDauw, aude.

daniel updated the task description. Apr 22 2016, 12:28 PM

jayvdb subscribed. Apr 22 2016, 12:31 PM

daniel added a project: Wikidata-Sprint-2016-04-26. Apr 22 2016, 12:34 PM

Lydia_Pintscher moved this task from incoming to consider for next sprint on the Wikidata board. Apr 25 2016, 10:00 AM

Ricordisamoa subscribed. Apr 27 2016, 4:37 AM

Lydia_Pintscher triaged this task as Medium priority. May 19 2016, 12:52 PM

Niharika mentioned this in T121731: Investigation: Assistance with structured data on Commons. Jul 19 2016, 3:07 PM

WMDE-leszek subscribed. Sep 8 2016, 12:31 PM

Jakob_WMDE subscribed. Sep 12 2016, 10:52 AM

When adding support for the repo prefix to EntityId, please note the following points:

the string returned by getSerialization() should include the prefix.
equals() should consider the prefix.

New methods to add:

getRepoName() (or getRepoPrefix, or getOrigin, or...); returns "" for the local repo.
getLocalPart() (this might not even be needed, but nice to have)
isForeign() (to check and fail in some critical code paths)
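
As a rough illustration of the points above, here is a sketch of such an ID class in Python. It is not the actual Wikibase code: the field names and the snake_case method names are assumptions that mirror getSerialization(), getRepoName(), getLocalPart() and isForeign().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityId:
    """Hypothetical value object; the generated equality compares both fields."""
    repo_name: str   # "" for the local repo
    local_part: str  # e.g. "Q5"

    def get_serialization(self) -> str:
        # The serialization includes the prefix whenever there is one.
        if self.repo_name:
            return f"{self.repo_name}:{self.local_part}"
        return self.local_part

    def get_repo_name(self) -> str:
        return self.repo_name

    def get_local_part(self) -> str:
        return self.local_part

    def is_foreign(self) -> bool:
        # Only IDs with a non-empty repo name are foreign.
        return self.repo_name != ""
```

Because the dataclass compares repo_name as well as local_part, EntityId("", "Q5") and EntityId("wd", "Q5") are not equal, which matches the requirement that equals() consider the prefix.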

Jakob_WMDE added a subtask: T145516: Add repository field to EntityId. Sep 13 2016, 2:25 PM

Esc3300 added a parent task: T91505: [Epic] Adding new datatypes to Wikidata (tracking). Sep 13 2016, 4:47 PM

daniel added a project: WMDE-TLA-Team. Sep 19 2016, 1:43 PM

daniel moved this task from proposed to accepted on the WMDE-TLA-Team board. Sep 19 2016, 1:46 PM

daniel moved this task from accepted to doing on the WMDE-TLA-Team board.

daniel claimed this task. Sep 19 2016, 1:51 PM

daniel created subtask T146030: Add support for repo prefixes to DispatchingEntityIdParser.

daniel moved this task from doing to tracking on the WMDE-TLA-Team board. Sep 19 2016, 1:57 PM

daniel updated the task description. Sep 21 2016, 9:43 AM

daniel updated the task description.

daniel updated the task description. Sep 21 2016, 9:46 AM

daniel added a subtask: T146274: Create an EntityIdParser that maps foreign repo prefixes. Sep 21 2016, 4:42 PM