
Add support for foreign entities to EntityId
Closed, Resolved (Public)

Description

Based on the discussion around T56085 and previously on T101752 and T73996, I propose to add support for "foreign" EntityIds[*] as described below:

[*] Let's reserve "external" for the external-id property data type.

  • EntityId gets an additional field repository, which contains the logical name of the repository that the ID refers to. This serves as a namespace mechanism for entity IDs. The name used to refer to a given repo is local, and can differ from client to client.
  • Repository names can be used to look up the repository's Site object, which in turn allows the ID to be mapped to the correct URI, URL, or API path for the repo it belongs to.
  • The empty repository name "" refers to the local wiki.
  • The serialization of the EntityId follows the pattern <repository>:<id>. If <repository> is empty, the leading ":" is optional. This format follows the convention set by XML namespaces, RDF prefixes, and MediaWiki interwiki links.
  • Repository names can be mapped during serialization and deserialization.
  • When reading data from another repo, repo names in entity IDs get mapped from the names used on the other repo to the names used on the local wiki. In particular, the empty prefix is mapped to the foreign repo's name while reading IDs.
  • When writing data to another repo, repo names in entity IDs get mapped from the names used on the local wiki to the names used on the foreign repo. In particular, the foreign wiki's name is mapped to the empty prefix.
  • The repo name in an EntityId object can always be assumed to match the local definition of that name. A repo name in serialized data, however, cannot be interpreted without knowing which repo it came from.
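
The serialization pattern and the write-side mapping described above could be sketched roughly as follows. This is only an illustration in Python, not Wikibase code: the function names, the names_on_target table, and the choice to treat everything before the last ":" as the repository name are assumptions made for the example.

```python
from typing import Dict, Tuple


def split_serialization(serialization: str) -> Tuple[str, str]:
    """Split '<repository>:<id>' into (repository, local id).

    Assumption: everything before the last ':' is the repository name,
    so 'Q5' and ':Q5' both yield the empty (local) repository.
    """
    repository, _, local_id = serialization.rpartition(":")
    if not local_id:
        raise ValueError(f"missing local ID in {serialization!r}")
    return repository, local_id


def join_serialization(repository: str, local_id: str) -> str:
    """Serialize an ID, omitting the optional leading ':' for the local repo."""
    return f"{repository}:{local_id}" if repository else local_id


def map_for_writing(serialization: str, target_repo: str,
                    names_on_target: Dict[str, str]) -> str:
    """Translate a locally named ID into the names used on target_repo.

    names_on_target is a hypothetical table: local repo name -> the name
    the target repo uses for that same repo.
    """
    repository, local_id = split_serialization(serialization)
    if repository == target_repo:
        mapped = ""  # the target's own name maps to the empty prefix
    else:
        mapped = names_on_target[repository]  # KeyError if no name is known
    return join_serialization(mapped, local_id)


print(split_serialization("wd:Q5"))             # ('wd', 'Q5')
print(split_serialization(":Q5"))               # ('', 'Q5')
print(map_for_writing("wd:Q5", "wd", {}))       # Q5
```

What should happen when the target repo has no name for a given repository is not specified in the proposal; the sketch simply fails with a KeyError in that case.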

When receiving prefixed entities from another repo, prefixes are "chained", like interwiki prefixes:

  • When reading d:Q5 from repo foo, this is turned into foo:d:Q5, meaning "Q5 at the repo that repo foo calls d".
  • The local repo may have mappings defined for the prefixes used by other wikis, e.g. "foo:d" would be known to be the same as the local prefix "wd" for Wikidata.
  • When deserializing data from another repo, we always add the name of the source repo as a prefix, and then resolve any known mappings: d:Q5-from-foo becomes foo:d:Q5 and then wd:Q5.
  • If no mapping is known, we store the "chained" version of the ID (foo:d:Q5) locally. There's really nothing else we could do. If we send this kind of ID to yet another repo, that may result in longer "chains" of prefixes, like xyz:foo:d:Q5.
  • When deserializing data from the local database, we don't add any prefix, but we still resolve any known mappings. If foo:d:Q5 was stored earlier because there was no mapping defined for foo:d then, but there is a mapping now, that mapping is resolved when loading the item.
  • Note that foo:bar:Q5 and bar:foo:Q5 may mean different things (or the same thing), depending on the mappings defined in the foo and the bar repo.
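
The chaining rules above can be illustrated with a short Python sketch. It is not the actual implementation: the function names and the flat mappings dictionary are hypothetical, and resolving only the longest known leading prefix chain in a single pass is an assumption, since the proposal only says that known mappings get resolved.

```python
from typing import Dict


def resolve_prefix(chain: str, mappings: Dict[str, str]) -> str:
    """Replace the longest known leading prefix chain by its local name.

    Assumption: one pass, longest match first; unknown chains are kept.
    """
    parts = chain.split(":") if chain else []
    for end in range(len(parts), 0, -1):
        head = ":".join(parts[:end])
        if head in mappings:
            resolved = [mappings[head], *parts[end:]]
            # Drop empty segments, e.g. when a chain maps to the local repo "".
            return ":".join(p for p in resolved if p)
    return chain


def read_foreign_id(serialization: str, source_repo: str,
                    mappings: Dict[str, str]) -> str:
    """Deserialize an ID received from source_repo: chain, then resolve."""
    prefix, _, local_id = serialization.rpartition(":")
    chained = f"{source_repo}:{prefix}" if prefix else source_repo
    resolved = resolve_prefix(chained, mappings)
    return f"{resolved}:{local_id}" if resolved else local_id


def read_local_id(serialization: str, mappings: Dict[str, str]) -> str:
    """Deserialize an ID from the local database: resolve only, no chaining."""
    prefix, _, local_id = serialization.rpartition(":")
    resolved = resolve_prefix(prefix, mappings)
    return f"{resolved}:{local_id}" if resolved else local_id


mappings = {"foo:d": "wd"}  # "what foo calls d" is known locally as wd
print(read_foreign_id("d:Q5", "foo", mappings))   # wd:Q5
print(read_foreign_id("d:Q5", "foo", {}))         # foo:d:Q5
print(read_foreign_id("foo:d:Q5", "xyz", {}))     # xyz:foo:d:Q5
print(read_local_id("foo:d:Q5", mappings))        # wd:Q5
```

The last call shows the case of an ID that was stored in chained form before a mapping existed and is resolved on load once the mapping has been defined.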

This ensures that an EntityId object always "knows" which repo it belongs to, and that it always reflects the currently defined mappings. This implies, however, that an ID used in an old revision can change its effective serialization later - it will look different when the mappings change. The ID would still mean the same, however, since it still references the same entity (provided the mappings were defined correctly).

Rationale:

  • Fully backwards compatible
  • Compact yet readable
  • Transparent for most repo and client code

Event Timeline

When adding support for the repo prefix to EntityId, please note the following points:

  • the string returned by getSerialization() should include the prefix.
  • equals() should consider the prefix.

New methods to add:

  • getRepoName() (or getRepoPrefix, or getOrigin, or...); returns "" for the local repo.
  • getLocalPart() (this might not even be needed, but nice to have)
  • isForeign() (to check and fail in some critical code paths)
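
As a rough illustration of the points above, a value object with these methods might look like the following Python sketch. The real EntityId is a PHP class; the snake_case method names merely mirror the ones proposed here, and the parse() helper and the field names are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # generated __eq__ compares repository AND local part
class EntityId:
    repository: str  # "" for the local repo
    local_part: str  # e.g. "Q5"

    @classmethod
    def parse(cls, serialization: str) -> "EntityId":
        # Hypothetical helper: everything before the last ':' is the repo name.
        repository, _, local_part = serialization.rpartition(":")
        if not local_part:
            raise ValueError(f"missing local ID in {serialization!r}")
        return cls(repository, local_part)

    def get_serialization(self) -> str:
        # Includes the prefix; the leading ':' is omitted for the local repo.
        if self.repository:
            return f"{self.repository}:{self.local_part}"
        return self.local_part

    def get_repo_name(self) -> str:
        return self.repository

    def get_local_part(self) -> str:
        return self.local_part

    def is_foreign(self) -> bool:
        return self.repository != ""


local = EntityId.parse("Q5")
foreign = EntityId.parse("wd:Q5")
print(foreign.get_serialization())      # wd:Q5
print(local == EntityId.parse(":Q5"))   # True
print(local == foreign)                 # False
print(foreign.is_foreign())             # True
```

Code paths that only support local entities could call is_foreign() up front and raise an error for foreign IDs.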

daniel moved this task from accepted to doing on the WMDE-TLA-Team board.
daniel updated the task description.

Status: the data model now supports this, but the necessary patches for the ID parser are not yet merged, see: