
linking Schemas in statements
Open, High · Public · 5 Story Points

Description

As an editor I want to refer to existing Schemas in statements in order to make it easier to find and classify them. We need a new datatype to link to them from statements.

Example:
Q5: corresponding EntitySchema:E1234

BDD
GIVEN an Item
AND a Schema covering the same topic
THEN a new statement can be added that links the Item and the Schema

Acceptance criteria:

  • a new datatype exists that allows linking to Schemas
  • Only existing Entity Schemas can be selected as values
  • Link text should be the ID of the entity schema
  • The value of that link should be exported in RDF as a URI (schema is to be decided by Engineers)
  • "What Links Here" on an EntitySchema instance should list all entities linking to it through the new datatype

Notes:

  • In the search field, the schema ID is to be used (e.g. no search by label etc)

Details

Related Gerrit Patches:
mediawiki/extensions/Wikibase : master · Add remaining i18n messages for EntitySchema data type
mediawiki/extensions/Wikibase : master · Add EntitySchema data type

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jan 29 2019, 8:50 AM
Lydia_Pintscher renamed this task from Wikidata Ticket Template to linking Schemas in statements. · Jan 29 2019, 8:51 AM
Lydia_Pintscher triaged this task as Medium priority.
Lydia_Pintscher raised the priority of this task from Medium to High. · May 29 2019, 8:14 AM

We used to have a URL property that was suitable for that; not sure where it went.

ericP added a subscriber: ericP. · Jun 3 2019, 7:03 AM

You might not want to add it to the data itself, as people pursuing different use cases will want to assert that the item is linked to different schemas. When I go to test whether a schema matches <SomeSpecialWidget>, I won't be interested in whether it matches everything else it's ever been used for. So, at the very least, properties linking data to schemas are advisory and not required for the validation infrastructure.

Yeah but it'd be fine to add links to several Schemas to one Item or the same Schema to several Items.

Lydia_Pintscher moved this task from Incoming to Ready to estimate on the Wikidata-Campsite board.
WMDE-leszek updated the task description. · Jun 4 2019, 1:02 PM
WMDE-leszek updated the task description. · Jun 4 2019, 1:07 PM
WMDE-leszek updated the task description.
WMDE-leszek updated the task description.
WMDE-leszek updated the task description. · Jun 4 2019, 1:18 PM
WMDE-leszek set the point value for this task to 5.

@alaa_wmde You mentioned that there exists a checklist for adding a new datatype. Could you link it here?

Simple solution could be to use items instead, see https://www.wikidata.org/wiki/Q64335281

Jheald added a subscriber: Jheald. · Jun 14 2019, 9:45 AM

The community has now given the thumbs-up to Wikidata:Property proposal/Shape Expression for class, to link a class item to the Shape Expression that members of it should conform to.

It is now ready to go, as soon as it becomes possible to add statements with datatype 'EntitySchema'.

It would have been so much easier if EntitySchemas were entities; we could just have reused the entity datatype boilerplate.

We could use the https://www.wikidata.org/wiki/Q64335281 approach. Also solves the problem of structured data on EntitySchemas.

We could use the https://www.wikidata.org/wiki/Q64335281 approach. Also solves the problem of structured data on EntitySchemas.

That is a hack. If we need to have structured data on EntitySchemas, then entity schemas need to be entities as well.

Why is there such a "need"?

It's the standard approach for everything else in the WMF-wikiverse.

@alaa_wmde You mentioned that there exists a checklist for adding a new datatype. Could you link it here?

here's the doc about it in the repository:
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Wikibase/+/refs/heads/master/docs/datatypes.wiki

We could use the https://www.wikidata.org/wiki/Q64335281 approach. Also solves the problem of structured data on EntitySchemas.

That is a hack. If we need to have structured data on EntitySchemas, then entity schemas need to be entities as well.

I see your point. But I think too much generalization can be at fault here and would lead to undesirable coupling on a conceptual level between schemas and entities.

In this task, we only need to link to them so far. There's no clear need or requirement for having structured data on EntitySchemas within Wikidata itself. That could change of course, and even then it would probably be cleaner for entity schemas to keep their own domain that does not inherit from the Wikibase domain. Then Wikidata can represent those schemas with items to have some structured data on them. This will allow entity schemas to be created, stored and developed without being bound to the Wikibase DataModel (and therefore, pragmatically, its software tech stack & choices).

Jheald added a comment. · Edited · Jun 25 2019, 10:14 AM

Agree with @alaa_wmde that no requirement's yet been made out for having structured data on EntitySchemas within Wikidata. But it would be useful to have the EntitySchemas themselves (not surrogate items for them) represented in WDQS, so that they can be queried. That is the subject of ticket T225701 "Add EntitySchemas to the Query Service". A federated service, rather than putting them in the main WDQS triplestore, would probably satisfy this.

Change 523980 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] [WIP] Add EntitySchema data type

https://gerrit.wikimedia.org/r/523980

Restricted Application added a project: User-Ladsgroup. · Jul 17 2019, 6:00 PM

Some open questions for me:

  • What should the RDF output look like? Should we skip it for the MVP and add it later? It could be treated as "StringValue" RDF output as well (for now).
  • What should the formatting look like? Right now, after saving, you get "EntitySchema:E45". Should we show the title instead? That would be really hard to implement.
  • The same goes for the suggester: should it search based on the title? Is that needed for the first iteration? It might also be complex, but not as complex as the one above.

@Lydia_Pintscher ^

Some open questions for me:

  • What should the RDF output look like? Should we skip it for the MVP and add it later? It could be treated as "StringValue" RDF output as well (for now).

@Lucas_Werkmeister_WMDE ^ thoughts? Otherwise go with StringValue.

  • What should the formatting look like? Right now, after saving, you get "EntitySchema:E45". Should we show the title instead? That would be really hard to implement.

For now E45 and later the title. We also need the title to show it in listings etc later.

  • The same goes for the suggester: should it search based on the title? Is that needed for the first iteration? It might also be complex, but not as complex as the one above.

"E45" as input needs to work in the first version. Title would be good in addition but can be done in a next step.

  • What should the RDF output look like? Should we skip it for the MVP and add it later? It could be treated as "StringValue" RDF output as well (for now).

My first thought was to use the schema's URL as a concept URI here, but I didn't look in detail at whether that would work or make sense.

  • What should the formatting look like? Right now, after saving, you get "EntitySchema:E45". Should we show the title instead? That would be really hard to implement.

There's an AC for this: "Link text should be the ID of the entity schema". We should use the ID for now, so EntitySchema:E45 is totally fine.

  • The same goes for the suggester: should it search based on the title? Is that needed for the first iteration? It might also be complex, but not as complex as the one above.

Going with the ID is probably okay for the MVP, right @Lydia_Pintscher? But if searching by label/description/alias is a tiny effort (<2 hours), then why not; I'd do it.

  • What should the RDF output look like? Should we skip it for the MVP and add it later? It could be treated as "StringValue" RDF output as well (for now).

My first thought was to use the schema's URL as a concept URI here, but I didn't look in detail at whether that would work or make sense.

If schemas aren’t entities, we shouldn’t pretend otherwise in the RDF output either. Also, the Wikibase tradition is that /wiki/Q42 is not the URI of the entity, but only of its HTML representation. If we wanted to emit entity schemas as entities, we would need extra URLs analogous to Wikibase’s /entity/ and /wiki/Special:EntityData/.

Probably best to go with StringValue for now.
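To make the two options concrete, here is a minimal sketch of how a single statement might be serialized as an N-Triples line in each case. It is not the Wikibase RDF builder. The function names, the property ID P9999 and the example.org schema URI are hypothetical placeholders; only the wd: and wdt: namespaces are the usual Wikidata ones.

```python
# Standard Wikidata namespaces for entities (wd:) and direct claims (wdt:).
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


def triple_with_literal(item_id: str, property_id: str, schema_id: str) -> str:
    """StringValue-style output: the schema ID is a plain string literal."""
    return f'<{WD}{item_id}> <{WDT}{property_id}> "{schema_id}" .'


def triple_with_iri(item_id: str, property_id: str, schema_iri: str) -> str:
    """URI-style output: the object is an IRI, so the caller must supply a
    canonical URI for the schema (which has not been decided yet)."""
    return f'<{WD}{item_id}> <{WDT}{property_id}> <{schema_iri}> .'


# Hypothetical property ID and schema URI, for illustration only.
print(triple_with_literal("Q5", "P9999", "E45"))
print(triple_with_iri("Q5", "P9999", "https://example.org/schema/E45"))
```

With the literal form, nothing has to be decided about schema URIs up front, but a query cannot follow the value as a link. The IRI form needs a canonical URI first.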

  • What should the formatting look like? Right now, after saving, you get "EntitySchema:E45". Should we show the title instead? That would be really hard to implement.

There's an AC for this: "Link text should be the ID of the entity schema". We should use the ID for now, so EntitySchema:E45 is totally fine.

The ID would be E45, not EntitySchema:E45.
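A small sketch of that distinction between the bare ID and the page title, in case it helps. It assumes that IDs are "E" followed by a positive integer and that the page title is the ID with an "EntitySchema:" namespace prefix. The function names are made up for illustration and are not part of the extension.

```python
import re

# Assumption: an EntitySchema ID is "E" followed by a positive integer.
SCHEMA_ID = re.compile(r"E[1-9][0-9]*")
NAMESPACE_PREFIX = "EntitySchema:"


def schema_id_from_input(text: str) -> str:
    """Return the bare ID for input such as 'E45' or 'EntitySchema:E45'."""
    candidate = text.strip()
    if candidate.startswith(NAMESPACE_PREFIX):
        candidate = candidate[len(NAMESPACE_PREFIX):]
    if not SCHEMA_ID.fullmatch(candidate):
        raise ValueError(f"not an EntitySchema ID: {text!r}")
    return candidate


def page_title(schema_id: str) -> str:
    """Build the wiki page title (link target) from a bare ID."""
    return NAMESPACE_PREFIX + schema_id
```

Under these assumptions the link text would be schema_id_from_input(...) and the link target would be page_title(...) of the same ID.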

If schemas aren’t entities, we shouldn’t pretend otherwise in the RDF output either. Also, the Wikibase tradition is that /wiki/Q42 is not the URI of the entity, but only of its HTML representation. If we wanted to emit entity schemas as entities, we would need extra URLs analogous to Wikibase’s /entity/ and /wiki/Special:EntityData/.

They are not entities, but they can still be concepts in the "ontology" the RDF is trying to describe, can't they? But yeah, string value is probably better to start with, as it imposes fewer requirements on us for maintaining new URLs for them for now.

I tested this locally. It all works, although I noticed that there's no search functionality as described in the description. When I type E1, that does not trigger any search. But since users can only enter entities through their IDs, search here is of little value.

Change 523980 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add EntitySchema data type

https://gerrit.wikimedia.org/r/523980

ericP added a comment. · Jul 30 2019, 8:36 PM

ShEx schemas have an RDF representation, which round-trips everything except comments and whitespace. For example, compare 3circRefPlus1.shex to 3circRefPlus1.ttl or 1dotIMPORT1dot.shex to 1dotIMPORT1dot.ttl in https://github.com/shexSpec/shexTest/tree/master/schemas.

Tried it \o/
Two remaining issues I found:

Change 526770 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Add remaining i18n messages for EntitySchema data type

https://gerrit.wikimedia.org/r/526770

Change 526770 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add remaining i18n messages for EntitySchema data type

https://gerrit.wikimedia.org/r/526770

Jheald added a comment. · Edited · Aug 5 2019, 7:19 PM

A couple of issues:

On the beta system, Pages that link to "EntitySchema:E1" isn't showing anything, despite Entity schema (P253078) having it as a value of property P253078

Secondly, can we confirm that, on WDQS, a property Pxxxxx with data type 'EntitySchema' (e.g. this proposal: Wikidata:Property proposal/Shape Expression for class) will lead to statements of the form

wd:Q5  wdt:Pxxxxx  wd:E10

with the wd:E10 being appropriately linked by the UI and turned into an appropriate URL in the download?

If this is not the case, then the datatype is not ready to go. (Also, are these a reflection of a fundamental issue that the internal type ought to be "Wikibase entity id", not "string"?)

Ok. Based on the comments here and in project chat let's flip the config switch for this to off and see what we can do about the raised points.

wd:Q5  wdt:Pxxxxx  wd:E10

I don't support reusing the wd: prefix (T225778#5391086). We should define a new prefix like wdes:E123.

@Bugreporter : T225778 "Define canonical URI for EntitySchemas" was opened a few weeks ago as a specific ticket for what the canonical URI should be, so I've added that in as a subtask for this ticket.

But why is it that you object to wd:E10 (i.e. http://www.wikidata.org/entity/E10), similar to e.g. wd:L20 (i.e. http://www.wikidata.org/entity/L20) for a Lexeme?

It refers to https://www.wikidata.org/wiki/Special:EntityData/E10, and Special:EntityData is served by Wikibase, but EntitySchema is not based on Wikibase.
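For readers less familiar with RDF prefixes, the disagreement above can be sketched as a prefix-expansion table. The wd: namespace is the one quoted in this thread. The namespace shown for wdes: is a made-up placeholder, since no URI for it has been proposed here.

```python
# Prefix table: wd: is the real Wikidata entity namespace; the wdes: value
# is a hypothetical placeholder, NOT a decided or proposed URI.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdes": "https://example.org/entity-schema/",
}


def expand(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'wd:E10' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


print(expand("wd:E10"))     # same namespace as items and lexemes
print(expand("wdes:E123"))  # separate namespace, placeholder URI
```

Reusing wd: puts schemas in the same URI space as items and lexemes, which implies that Special:EntityData can serve them. A separate prefix avoids that implication, at the cost of defining and maintaining a new namespace.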

Waiting on discussion regarding which direction to go with this.

Deadline added for the decision, 02.09.2019 end of work day.

@Lydia_Pintscher I added two entries to the AC for RDF URI output and for What Links Here functionality to be fixed. If that looks good on your end for releasing this feature, let's move it to Ready to Estimate.

TODO: Just wanted to highlight that once decisions are made, please make sure to update the Glossary item! Currently it reads:

EntitySchema is a special type of Wikidata page containing a document in ShEx format, and related metadata. Although it may have labels, descriptions and aliases similar to items, it is not a type of entity, nor powered by Wikibase. Entities may be validated against an EntitySchema using a tool.

As a Data Architect in real life working with databases & entities, I actually appreciate the fact that EntitySchema is not a type of entity. As @alaa_wmde states, it decouples concepts and allows flexibility with multiple viewpoints from around the world. It also allows external publishers to express their own views and later link and validate them. (Not every entity/thing has to be stored in Wikibase, but allowing conceptual linking helps the world, so a canonical URI is a "good thing", and I agree with @Jheald.)