
[ES-M2]: Define canonical URI for EntitySchemas
Closed, Resolved · Public · 3 Estimated Story Points

Description

As a Wikidata reuser, I want to have a canonical URI scheme for EntitySchemas in order to identify them as concepts and access them in various formats, e.g., RDF.

Problem:
EntitySchemas at Wikidata reside at the URL https://www.wikidata.org/wiki/EntitySchema:EXX, but this is not a canonical URI.

In order to identify EntitySchemas as concepts in RDF etc., we should have a canonical URI (prefix), just as we have for Items, Properties, and Lexemes.

/entity/ is currently used for Items, Properties, and Lexemes. To keep the URI scheme predictable, it is also the preferred prefix for EntitySchemas.

Once the URI has been defined, it should redirect to the actual content, as it does for Items, Properties, and Lexemes.
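The following is a minimal sketch of the intended mapping from concept URI to content. It rests on three assumptions: EntitySchema IDs are `E` followed by a positive integer, the prefix is `http://www.wikidata.org/entity/`, and the content lives at `https://www.wikidata.org/wiki/EntitySchema:<ID>`. The function names are hypothetical and not part of the EntitySchema extension or the server configuration.

```python
import re

# Assumption: EntitySchema IDs are "E" followed by a positive integer (e.g. E10).
ENTITY_SCHEMA_ID = re.compile(r"E[1-9][0-9]*")

CONCEPT_PREFIX = "http://www.wikidata.org/entity/"            # canonical concept URI prefix
PAGE_PREFIX = "https://www.wikidata.org/wiki/EntitySchema:"   # where the content lives


def concept_uri(schema_id: str) -> str:
    """Return the canonical concept URI for an EntitySchema ID (hypothetical helper)."""
    if not ENTITY_SCHEMA_ID.fullmatch(schema_id):
        raise ValueError(f"not an EntitySchema ID: {schema_id!r}")
    return CONCEPT_PREFIX + schema_id


def redirect_target(uri: str) -> str:
    """Map an EntitySchema concept URI to the wiki page it should redirect to."""
    if not uri.startswith(CONCEPT_PREFIX):
        raise ValueError(f"not a concept URI: {uri!r}")
    schema_id = uri[len(CONCEPT_PREFIX):]
    # Items (Q...), Properties (P...) and Lexemes (L...) are not handled here.
    if not ENTITY_SCHEMA_ID.fullmatch(schema_id):
        raise ValueError(f"not an EntitySchema ID: {schema_id!r}")
    return PAGE_PREFIX + schema_id
```

For example, `concept_uri("E10")` yields `http://www.wikidata.org/entity/E10`, and passing that to `redirect_target` yields `https://www.wikidata.org/wiki/EntitySchema:E10`.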

BDD
GIVEN an EntitySchema
AND an RDF export for an Item or Lexeme (once EntitySchemas are able to be referenced in statements)
WHEN an EntitySchema is referenced in a statement
THEN the reference is a concept URI
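For the BDD scenario, here is a sketch of what "the reference is a concept URI" means in serialized form: the object of the triple is `http://www.wikidata.org/entity/E10`, not the `/wiki/EntitySchema:E10` page URL. It emits a single N-Triples line using the standard Wikidata `http://www.wikidata.org/prop/direct/` namespace for truthy statements. The property ID is a caller-supplied placeholder because the task does not name one, and `P9999` in the usage below is purely hypothetical.

```python
import re

ENTITY = "http://www.wikidata.org/entity/"       # concept URI prefix
DIRECT = "http://www.wikidata.org/prop/direct/"  # truthy statement predicates
# Assumption: EntitySchema IDs are "E" followed by a positive integer.
SCHEMA_ID = re.compile(r"E[1-9][0-9]*")


def statement_triple(item_id: str, property_id: str, schema_id: str) -> str:
    """Serialize 'item --property--> EntitySchema' as one N-Triples line.

    The object is the EntitySchema's concept URI, never its /wiki/ page URL.
    """
    if not SCHEMA_ID.fullmatch(schema_id):
        raise ValueError(f"not an EntitySchema ID: {schema_id!r}")
    return f"<{ENTITY}{item_id}> <{DIRECT}{property_id}> <{ENTITY}{schema_id}> ."
```

With the hypothetical property, `statement_triple("Q42", "P9999", "E10")` returns `<http://www.wikidata.org/entity/Q42> <http://www.wikidata.org/prop/direct/P9999> <http://www.wikidata.org/entity/E10> .`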

Acceptance criteria:

  • EntitySchemas have a defined concept URI
  • The EntitySchema concept URI redirects to the EntitySchema URL (if HTML representation is requested)

Open questions:

Should the prefix start with http or https?
Decided to move forward with http in T333657

Notes
https://www.wikidata.org/wiki/Wikidata:Data_access#Linked_Data_Interface_(URI)

There is a way to represent EntitySchemas as RDF, but this will be addressed in the future.

Original ticket

ShEx schemas at Wikidata reside at https://www.wikidata.org/wiki/EntitySchema:Exxx, but this is not a canonical URI. We should have a canonical URI (prefix), and I guess http://www.wikidata.org/entity/ would do, just as for Items, Properties, and Lexemes.

Once a URI prefix has been found:

  • it should be documented wherever necessary
  • http://www.wikidata.org/entity/Exxx (or whatever prefix is chosen) should redirect to actual content, as for Items, Properties, and Lexemes.

A canonical URI would likely be required for T225701, and the shex-simple tool could probably make use of it as well for relative URI resolution to a predictable absolute URI.
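To illustrate the relative-resolution point, here is a sketch using Python's standard `urllib.parse.urljoin`: with the proposed prefix as the base, a bare ID resolves to a predictable absolute URI. This is not how shex-simple works internally; it only shows the standard relative-reference resolution a tool could rely on. `resolve` and `BASE` are hypothetical names.

```python
from urllib.parse import urljoin

# Assumed base: the canonical prefix proposed in the original ticket.
BASE = "http://www.wikidata.org/entity/"


def resolve(ref: str, base: str = BASE) -> str:
    """Resolve a (possibly relative) schema reference against the canonical prefix."""
    return urljoin(base, ref)
```

`resolve("E10")` gives `http://www.wikidata.org/entity/E10`, while an absolute reference such as `https://example.org/shapes/S1` is returned unchanged. The trailing slash on the base matters: `urljoin("http://www.wikidata.org/entity", "E10")` drops the last path segment and yields `http://www.wikidata.org/E10`.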

Event Timeline


EntitySchema does not rely on Wikibase, so it should not use Wikibase's Special:EntityData. We should probably use www.wikidata.org/entityschema/Exxx instead.

Arian_Bozorg renamed this task from Define canonical URI for EntitySchemas to ES - M2: Define canonical URI for EntitySchemas. Mar 2 2023, 11:04 AM
Arian_Bozorg renamed this task from ES - M2: Define canonical URI for EntitySchemas to [ES-M2]: Define canonical URI for EntitySchemas. Mar 9 2023, 2:47 PM

Task Triage Notes:

  • /entity is to be understood in the sense of a semantic entity within the context of linked data (rather than Wikibase per se)
  • As a first priority, we would like to focus our efforts on providing an RDF representation from the canonical URL. This would probably require us to define a simple set of triples that describe an EntitySchema as an RDF document, for example (in Turtle):
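@prefix data: <https://www.wikidata.org/wiki/Special:EntityData/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
data:E10 a schema:Dataset ;
	schema:about wd:E10 ;
	cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
	schema:softwareVersion "1.0.0" ;
	schema:version "1850119224"^^xsd:integer ;
	schema:dateModified "2023-03-12T02:07:31Z"^^xsd:dateTime .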
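Here is a sketch that renders the Turtle from the triage note for an arbitrary EntitySchema, assuming the caller supplies the revision ID and a timezone-aware last-modified timestamp. `schema:softwareVersion "1.0.0"` is copied from the sample, and `dataset_description` is a hypothetical helper, not an existing EntitySchema API.

```python
from datetime import datetime, timezone

PREFIXES = """\
@prefix data: <https://www.wikidata.org/wiki/Special:EntityData/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def dataset_description(schema_id: str, revision: int, modified: datetime) -> str:
    """Render Turtle describing the RDF document (data:) about one EntitySchema (wd:)."""
    if modified.tzinfo is None:
        # A naive datetime would be silently interpreted as local time.
        raise ValueError("modified must be timezone-aware")
    stamp = modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return PREFIXES + (
        f"data:{schema_id} a schema:Dataset ;\n"
        f"\tschema:about wd:{schema_id} ;\n"
        "\tcc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;\n"
        '\tschema:softwareVersion "1.0.0" ;\n'
        f'\tschema:version "{revision}"^^xsd:integer ;\n'
        f'\tschema:dateModified "{stamp}"^^xsd:dateTime .\n'  # "." closes the statement
    )
```

Called with `"E10"`, `1850119224` and `datetime(2023, 3, 12, 2, 7, 31, tzinfo=timezone.utc)`, it reproduces the sample above.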

Sprint 6 Planning Notes

  • We will need to decide whether to align with the current status quo and go with http, or to use https.
  • This decision should be documented; an ADR would likely suffice.
  • This should involve @ItamarWMDE.

Task Break Notes

  • There is the Special:EntityData page, which allows linking to entity data in various formats (see https://www.wikidata.org/wiki/Special:EntityData/Q42). Entities from extensions, such as Lexemes, are also served by it. Looking at how other extensions do this might point us in the right direction.
  • The question of http vs https identifiers was raised as a potential security risk in this task and was discussed at length in the task linked from it. At this point, we do not see this decision affecting the complexity of the work needed for this task.

Further discussion was postponed to a later time.

@Arian_Bozorg We still need some clarification here. There are three components to this user story that might correspond to three different use cases:

  1. For the RDF representation use case, we can configure the /entity endpoint, since that is what the wd prefix for linked data entities in Wikidata is defined as: @prefix wd: <http://www.wikidata.org/entity/> . However, do we still need to use /wiki/Special:EntityData for the data prefix (@prefix data: <https://www.wikidata.org/wiki/Special:EntityData/> .)?
  2. As for the JSON representation of EntitySchemas, we still need more definition, and it seems unrelated to making EntitySchemas addressable from a Wikibase entity (Items, Properties): the JSON representation of an EntitySchema in Items will be ensured by registering a data type in task T332139: [ES-M2]: Create a new EntitySchema data type
  3. Regarding the concept URI sidebar link, is this required for making EntitySchemas addressable? It appears to be a separate feature request. We included it in a separate task to be triaged: T333655: [ES-M2]: Add link to the Concept URI to the EntitySchema sidebar
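To make the distinction in point 1 concrete, here is a small sketch that expands prefixed names using the two prefixes quoted there. Note that the concept URI uses `http` while the data document uses `https`. The `data:E10` expansion only shows what such a URI would look like; whether Special:EntityData should serve EntitySchemas at all is exactly the open question. `expand` is a hypothetical helper.

```python
# Prefix map as quoted in the discussion.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",                       # concept URIs
    "data": "https://www.wikidata.org/wiki/Special:EntityData/",   # data documents
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'wd:E10' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local
```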

In light of these three separate use cases, is it really necessary to use Special:EntityData for content negotiation (that is, to have this link cover all three use cases), especially since we prioritized use case (1) as the main goal for this task in prior discussions?

ItamarWMDE changed the task status from Open to Stalled. Apr 3 2023, 9:53 AM

As I said at T225778#5391086, Special:EntityData is provided by Wikibase, which EntitySchema does not depend on. It may not be ideal for a canonical URI to rely on another extension that is not a dependency.

  1. I'll need a day just to decide on this one
  2. Sorry, JSON shouldn't have been included in this ticket and has been removed
  3. Also happy to remove the concept URI sidebar link AC and triage T333655

After some discussion with Lydia, we agreed that we do not need https://www.wikidata.org/wiki/Special:EntityData at this point, as we are only working on the HTML representation of EntitySchemas. I will update the task description accordingly.

In my opinion, the term "entity" should be reserved for things served by Wikibase or a Wikibase-based extension. For example, while WikiLambda defines pages with Wikibase-like labels and descriptions, they are not called entities.

Creating another prefix would also make the server redirect configuration easier.

So does that mean that /entity/E1 redirects directly to /wiki/EntitySchema:E1? And then we later… change it when we add RDF support?

And what happens if someone requests /entity/E1 with e.g. Accept: application/json or Accept: text/turtle? Do they get the HTML page back anyways?

Yes, so for this ticket /entity/E10 will redirect to the HTML representation at /wiki/EntitySchema:E10.

For now, requests with those Accept headers will return a "bad request" response.

And when we begin to support other formats (RDF, JSON, etc.), we will redirect to /wiki/Special:EntityData to stay consistent with the other Semantic Entities.
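The behavior described in this exchange can be modeled as a small decision function: HTML requests are redirected to the wiki page, and other formats get a "bad request" for now. This is a sketch of the discussion, not of the Apache rules that were actually merged. Several parts are assumptions: the 303 status code, treating a missing Accept header or `*/*` as HTML, and ignoring q-values. `negotiate` and `media_types` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Assumption: EntitySchema concept URIs look like /entity/E<positive integer>.
ENTITY_SCHEMA_PATH = re.compile(r"/entity/(E[1-9][0-9]*)")


def media_types(accept: str) -> set:
    """Media types listed in an Accept header (parameters and q-values are ignored)."""
    return {part.split(";")[0].strip().lower() for part in accept.split(",") if part.strip()}


def negotiate(path: str, accept: str) -> Optional[Tuple[int, Optional[str]]]:
    """Decide how to answer a request for an EntitySchema concept URI.

    Returns None for paths this rule does not handle (e.g. /entity/Q42, which
    stays with the existing Wikibase handling), otherwise (status, location).
    """
    match = ENTITY_SCHEMA_PATH.fullmatch(path)
    if match is None:
        return None
    types = media_types(accept)
    # A missing Accept header, text/html, or */* is treated as a request for HTML.
    if not types or "text/html" in types or "*/*" in types:
        # 303 See Other is an assumed status code; the task does not specify one.
        return 303, "/wiki/EntitySchema:" + match.group(1)
    # Other formats (RDF, JSON, ...) are not served yet.
    return 400, None
```

Once other formats are supported, the 400 branch would be where a redirect to /wiki/Special:EntityData goes, as described in the comment above.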

We have defined ubiquitous language so that there is consistency in the way we are discussing "entities".

In this case, we are aligning EntitySchemas with other Semantic Entities (Items, Lexemes, Properties).

And we are working on building consistency across these Semantic Entities to make things as easy as possible for users and reusers.

You can see the mapping of these Wikibase Domain concepts here:
https://www.mediawiki.org/wiki/Wikibase/DataModel/DomainConceptsMapping

ItamarWMDE changed the task status from Stalled to Open. Apr 26 2023, 2:16 PM

After a discussion with @dcausse, we agreed to try to provide a sample .ttl of an Item that includes a triple with an EntitySchema value, to test this with the query service.

Change 912326 had a related patch set uploaded (by Michael Große; author: Michael Große):

[operations/deployment-charts@master] Handle Canonical URL for EntitySchemas

https://gerrit.wikimedia.org/r/912326

Change 912327 had a related patch set uploaded (by Michael Große; author: Michael Große):

[operations/puppet@production] Handle Canonical URL for EntitySchemas

https://gerrit.wikimedia.org/r/912327

Sprint 8 Review: the code was copy-pasted into two locations. We should (eventually) figure out why, and whether this is something we can avoid.

I'm not fully sure who would be reviewing and deploying these changes. Maybe someone from the SRE team?

EDIT: from looking at the SRE tag, I would speculate that it is probably serviceops and related to Wikimedia-Apache-configuration, and maybe Prod-Kubernetes or MW-on-K8s ?

The overall functionality expected after this is deployed can be tested by going to <domain>/entity/E123 and checking that you are redirected to <domain>/wiki/EntitySchema:E123.
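Here is a smoke-test sketch of that check, using only the Python standard library. `http.client` does not follow redirects, so the first response's status and `Location` header can be inspected directly. The sketch makes three assumptions: the request goes over HTTPS so that a possible HTTP-to-HTTPS hop does not mask the redirect under test, the target is expected on `https://<domain>`, and any 3xx status is accepted because the task does not fix one. The helper names are hypothetical.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urljoin

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def expected_target(domain: str, schema_id: str) -> str:
    return f"https://{domain}/wiki/EntitySchema:{schema_id}"


def is_expected_redirect(status: int, location: Optional[str], domain: str, schema_id: str) -> bool:
    """True if a response is a redirect from /entity/<id> to the EntitySchema page."""
    if status not in REDIRECT_STATUSES or not location:
        return False
    # Location may be relative; resolve it against the URL that was requested.
    absolute = urljoin(f"https://{domain}/entity/{schema_id}", location)
    return absolute == expected_target(domain, schema_id)


def fetch(domain: str, schema_id: str, accept: str = "text/html") -> Tuple[int, Optional[str]]:
    """Request the concept URI once and return (status, Location header)."""
    conn = http.client.HTTPSConnection(domain, timeout=10)
    try:
        conn.request("GET", f"/entity/{schema_id}", headers={"Accept": accept})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Usage: `status, location = fetch("www.wikidata.org", "E123")`, then `is_expected_redirect(status, location, "www.wikidata.org", "E123")`.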

So, specifically,

That being said, there are two changes here. Both should probably be deployed, and I'm unsure what effect each change has on its own, but the above is the overall effect we are looking for.

Change 912327 merged by RLazarus:

[operations/puppet@production] Handle Canonical URL for EntitySchemas

https://gerrit.wikimedia.org/r/912327

Mentioned in SAL (#wikimedia-operations) [2023-05-09T17:11:42Z] <rzl> rolling restart apache on codfw appservers T225778

Mentioned in SAL (#wikimedia-operations) [2023-05-09T17:17:58Z] <rzl> rolling restart apache on eqiad appservers T225778

Change 912326 merged by jenkins-bot:

[operations/deployment-charts@master] Handle Canonical URL for EntitySchemas

https://gerrit.wikimedia.org/r/912326

Looks good to me!

Thanks so much :)