[RFC] Introduce ExternalEntityId
Closed, Declined, Public

Description

3rd party clients need to be able to refer to Wikidata items as well as to their own local items, which also have ids of the form Q1234. To solve this, we need a way to represent "external" entities in our data model:

  • implement EntityId as a wrapper around a URI (plus the entity type id).
  • the serialization would be the full URI
  • an external item would have the same type string, but not the same id class, as an internal item.
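A minimal sketch of the proposal, written in Python for brevity (Wikibase itself is PHP). The class and method names (`EntityId`, `ItemId`, `ExternalEntityId`, `get_entity_type`, `get_serialization`) are illustrative assumptions that mirror the description, not the actual Wikibase DataModel API. It covers the three points above: the external id wraps a URI plus the entity type, serializes to the full URI, and reports the same type string as a local item while being a different class.

```python
from abc import ABC, abstractmethod


class EntityId(ABC):
    """Opaque entity identifier (hypothetical minimal interface)."""

    @abstractmethod
    def get_entity_type(self) -> str: ...

    @abstractmethod
    def get_serialization(self) -> str: ...

    def __eq__(self, other: object) -> bool:
        # Ids are equal only if they are the same class AND serialize identically.
        return type(self) is type(other) and self.get_serialization() == other.get_serialization()

    def __hash__(self) -> int:
        return hash((type(self), self.get_serialization()))


class ItemId(EntityId):
    """Local item id such as "Q1234"."""

    def __init__(self, serialization: str) -> None:
        self._serialization = serialization

    def get_entity_type(self) -> str:
        return "item"

    def get_serialization(self) -> str:
        return self._serialization


class ExternalEntityId(EntityId):
    """Entity on another repository: the full URI plus the entity type id."""

    def __init__(self, uri: str, entity_type: str) -> None:
        self._uri = uri
        self._entity_type = entity_type

    def get_entity_type(self) -> str:
        # Same type string as a local item ("item"), but a different class.
        return self._entity_type

    def get_serialization(self) -> str:
        # The serialization is the full URI.
        return self._uri
```

With this equality rule, `ItemId("Q1234")` and `ExternalEntityId("http://www.wikidata.org/entity/Q1234", "item")` both report the type "item" but are never equal, so a local item and a Wikidata item with the same number cannot be confused.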
daniel created this task. Jun 8 2015, 6:59 PM
daniel updated the task description.
daniel raised the priority of this task to Needs Triage.
daniel added a subscriber: daniel.
Restricted Application added a subscriber: Aklapper. Jun 8 2015, 6:59 PM
daniel set Security to None.

This is something that has come up a few times over the last few years as something that "we will probably need at some point". What are the concrete use cases we have for this now? Without those, it is rather hard to say which approach is best.

Are you suggesting to add a URI to EntityId or to create a derivative that has one?

My main concern with adding such functionality is that it will very likely be a breaking change that we should be quite careful with. It'll be easy to make in Wikibase DataModel, though it needs caution wherever EntityId is used, due to the change in its contract.

daniel added a comment. Jun 9 2015, 4:25 PM

What are the concrete use cases we have for this now?

3rd party wikis using Wikidata items as unit identifiers.

Are you suggesting to add a URI to EntityId or to create a derivative that has one?

A derivative, e.g. ExternalEntityId or EntityUri.

My main concern with adding such functionality is that it will very likely be a breaking change that we should be quite careful with. It'll be easy to make in Wikibase DataModel, though it needs caution wherever EntityId is used, due to the change in its contract.

I see no breaking change or contract change. That could happen if we allowed loading full entities from an external source. But for now, this could simply be a new type of EntityId.

JeroenDeDauw added a comment. Edited Jun 9 2015, 9:20 PM

3rd party wikis using Wikidata items as unit identifiers.

Can you be more specific? I'm guessing you are talking about third-party wikis running a Wikibase Repository that have items with entityid datavalues pointing to Wikidata?

I see no breaking change nor a contract change.

Will all code that currently works with an EntityId do the correct thing when it gets such an instance of this new derivative? I don't think so.

Lydia_Pintscher renamed this task from Introduce ExternaEntityId to Introduce ExternalEntityId. Jun 10 2015, 6:35 AM

I think this is a useful change if you want Wikibase sites to be able to refer to other Wikibase sites. In WDTK, all of our EntityId objects are "external", of course. A lesson we learned was that knowing the base URI is not always enough. You sometimes need URLs for the API, file paths, and page paths in addition to the plain URI. MediaWiki already has a solution for this in the form of the sites table. I would suggest using it: store pairs <sitekey,localEntityId> and keep the URI prefix in the sites table. That's cleaner than storing the actual URI string (which might change if an external site is reconfigured!) in the values on the page.
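A sketch of the sitekey-based alternative, under stated assumptions: the `SITES` dict stands in for MediaWiki's sites table, and its field names, the "wikidata" key, and the `SiteEntityId` class are hypothetical. Only the (sitekey, local id) pair is stored; the concept URI and page URL are derived at read time, so reconfiguring a site means changing one configuration entry instead of every stored value.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the sites table: sitekey -> per-site configuration.
SITES = {
    "wikidata": {
        "concept_base_uri": "http://www.wikidata.org/entity/",
        "page_path": "https://www.wikidata.org/wiki/$1",
    },
}


@dataclass(frozen=True)
class SiteEntityId:
    """Entity reference stored as (sitekey, local id) rather than as a full URI."""

    sitekey: str
    local_id: str

    def concept_uri(self, sites: dict) -> str:
        # The URI prefix is looked up in the site configuration at read time.
        return sites[self.sitekey]["concept_base_uri"] + self.local_id

    def page_url(self, sites: dict) -> str:
        # Simplified: ignores namespace prefixes such as "Property:".
        return sites[self.sitekey]["page_path"].replace("$1", self.local_id)
```

Passing a different configuration (for example, one where the concept base has moved) changes the derived URI without touching the stored `SiteEntityId`.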

A strategy for introducing this without breaking much is to keep the local wiki's sitekey as the default in all cases. Callers who are not aware of external site support can then keep sending "local" ids and will get the right thing. Only when they read data will they have to mind the new information (but that's always the case if you enable linking to external entities).
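One way the default-sitekey strategy could look, assuming a hypothetical `sitekey:Q42` text form for external references (no such serialization is defined in this task) and a placeholder local sitekey. Plain ids keep parsing to local entities, and local references still serialize to the bare form that existing readers expect.

```python
from typing import Tuple

LOCAL_SITEKEY = "local"  # hypothetical key for the wiki's own repository


def parse_entity_ref(serialization: str) -> Tuple[str, str]:
    """Split "sitekey:Q42" into (sitekey, local id); bare ids default to the local wiki."""
    sitekey, sep, local_id = serialization.partition(":")
    if not sep:
        # Callers unaware of external sites keep sending plain ids and get local entities.
        return LOCAL_SITEKEY, serialization
    return sitekey, local_id


def format_entity_ref(sitekey: str, local_id: str) -> str:
    """Inverse of parse_entity_ref; local references stay in their existing bare form."""
    if sitekey == LOCAL_SITEKEY:
        return local_id
    return f"{sitekey}:{local_id}"
```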

But overall, I think it would be good to make this change. Commons will want to link to Wikidata, for example, and many Wikibase instances outside of WMF will also benefit from being able to link to Wikidata content.

3rd party wikis using Wikidata items as unit identifiers.

Can you be more specific? I'm guessing you are talking about third-party wikis running a Wikibase Repository that have items with entityid datavalues pointing to Wikidata?

Yes, in EntityIdValues, but also as unit URI in QuantityValue, calendar URI in TimeValue, and globe URI in GlobeCoordinateValue.
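For the URI-valued fields mentioned here, a hypothetical helper could map a URI back to a repository reference by matching known concept base URIs. The `CONCEPT_BASES` mapping and `unit_to_entity_ref` are assumptions for illustration; "1" is the marker Wikibase uses for unitless quantities, and URIs from unknown sources are not treated as entity references (returned as None here). Calendar model and globe URIs could be handled the same way.

```python
from typing import Optional, Tuple

# Hypothetical mapping from concept base URIs to site keys.
CONCEPT_BASES = {
    "http://www.wikidata.org/entity/": "wikidata",
}


def unit_to_entity_ref(unit: str) -> Optional[Tuple[str, str]]:
    """Interpret a QuantityValue unit string as a (sitekey, local id) reference.

    Returns None for the unitless marker "1" and for URIs that do not belong
    to a known repository.
    """
    if unit == "1":
        return None
    for base, sitekey in CONCEPT_BASES.items():
        if unit.startswith(base):
            return sitekey, unit[len(base):]
    return None
```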

I see no breaking change nor a contract change.

Will all code that currently works with an EntityId do the correct thing when it gets such an instance of this new derivative? I don't think so.

Code that uses getNumericId will break. The assumption that getNumericId exists is invalid, but it is still made in some parts of the code.
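To make this failure mode concrete, a hypothetical Python sketch (Wikibase itself is PHP; `get_numeric_id`, `legacy_sort_key` and `safe_sort_key` are made-up names standing in for getNumericId and its callers). Code that calls the numeric accessor on any id fails as soon as it meets an id that wraps a URI, while code that checks the id class first keeps working.

```python
class ItemId:
    """Local item id such as "Q42"; exposes a numeric part."""

    def __init__(self, serialization: str) -> None:
        self.serialization = serialization

    def get_numeric_id(self) -> int:
        return int(self.serialization[1:])


class ExternalEntityId:
    """Id wrapping a full URI; deliberately has no get_numeric_id()."""

    def __init__(self, uri: str, entity_type: str) -> None:
        self.serialization = uri
        self.entity_type = entity_type


def legacy_sort_key(entity_id) -> int:
    # Assumes every id has a numeric part: raises AttributeError for ExternalEntityId.
    return entity_id.get_numeric_id()


def safe_sort_key(entity_id) -> tuple:
    # Checks the class first: local ids sort numerically, external ids after them by URI.
    if isinstance(entity_id, ItemId):
        return (0, entity_id.get_numeric_id(), "")
    return (1, 0, entity_id.serialization)
```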

Other than that, I can't think of any issues. EntityId is completely opaque.

I would suggest using it: store pairs <sitekey,localEntityId> and keep the URI prefix in the sites table. That's cleaner than storing the actual URI string (which might change if an external site is reconfigured!) in the values on the page.

Cool URIs never change :)

Yes, I have been thinking about this too. It may be an option; I'm not quite sure yet.

A strategy for introducing this without breaking much is to keep the local wiki's sitekey as the default in all cases. Callers who are not aware of external site support can then keep sending "local" ids and will get the right thing. Only when they read data will they have to mind the new information (but that's always the case if you enable linking to external entities).

That was the idea, yes.

Other than that, I can't think of any issues. EntityId is completely opaque.

While EntityId might not explicitly provide certain guarantees, many users of it can still make assumptions that would be violated by this new derivative. If the URL is included, then the maximum length is presumably different. I just did a quick search of the Wikibase.git codebase, and there are over a thousand places using EntityId. The first type hint I looked at was RepoLinker::getEntityUrl. From a quick look, it is not at all obvious to me that this code will behave correctly when given the new EntityId derivative. In fact, it looks like it will not. I'd hate for us to have to go through all these EntityId usages to verify that they handle the new one correctly.
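A hypothetical illustration of the RepoLinker concern, in Python rather than Wikibase's PHP. `naive_entity_url` and `entity_url` are made-up helpers, not the real RepoLinker::getEntityUrl, and the repository URL is a placeholder. The risk shown is not a crash but a silently malformed link, which is harder to spot than an exception.

```python
from dataclasses import dataclass

REPO_ARTICLE_PATH = "https://repo.example.org/wiki/"  # hypothetical local repository URL


@dataclass(frozen=True)
class ItemId:
    serialization: str


@dataclass(frozen=True)
class ExternalEntityId:
    uri: str
    entity_type: str

    @property
    def serialization(self) -> str:
        return self.uri


def naive_entity_url(entity_id) -> str:
    # Written when every id was local: silently builds a broken link for external ids.
    return REPO_ARTICLE_PATH + entity_id.serialization


def entity_url(entity_id) -> str:
    # Dispatches on the id class; external ids link to their own URI (an assumption).
    if isinstance(entity_id, ExternalEntityId):
        return entity_id.uri
    return REPO_ARTICLE_PATH + entity_id.serialization
```

Returning the bare URI for external ids is itself a simplification; the sites-table approach suggested earlier in the thread would allow building a proper page URL instead.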

Lydia_Pintscher triaged this task as Normal priority. Jun 12 2015, 1:16 PM
jayvdb added a subscriber: jayvdb. Jun 17 2015, 4:02 AM

It would be lovely to have some supported way to distinguish between items on Wikidata and items on the local Wikibase. To work around this, I set up a private Wikibase that is a subset of Wikidata, synchronised periodically with a manual verification process, and changed the Q number generation for local (private) items to start at an extremely high number. Not the cleanest solution, but it worked. My first idea was to use a different letter prefix or namespace for the local items, but those approaches were not feasible in the time I had to get it working.
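A rough sketch of the offset-numbering workaround; the comment does not give the actual starting number, so `LOCAL_ID_OFFSET` and the generator class are hypothetical. It also shows the weakness of the approach: telling local ids apart from mirrored Wikidata ids relies purely on the numeric range.

```python
LOCAL_ID_OFFSET = 1_000_000_000  # hypothetical "extremely high" starting number


class LocalItemIdGenerator:
    """Hands out local item ids above a large offset so they don't collide with mirrored ids."""

    def __init__(self, last_issued: int = 0) -> None:
        # Resume after the last issued id, but never below the offset.
        self._next = max(last_issued + 1, LOCAL_ID_OFFSET)

    def next_id(self) -> str:
        numeric = self._next
        self._next += 1
        return f"Q{numeric}"


def is_local(serialization: str) -> bool:
    # Pure numbering convention: an upstream id above the offset would be misclassified.
    return int(serialization[1:]) >= LOCAL_ID_OFFSET
```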

On the issue of foreign sites, please work with the core team to create a single API for sites. See T85153 and T89540.

thiemowmde renamed this task from Introduce ExternalEntityId to [RFC] Introduce ExternalEntityId. Sep 10 2015, 4:23 PM
thiemowmde added a subscriber: thiemowmde.
Bene added a subscriber: Bene. Mar 8 2016, 10:26 AM