
Wikibase Request for Comment: Wikibase bootstrap properties set
Closed, Resolved · Public

Description

Goal

Reduce the entry barrier to Wikibase by providing a first minimal ontology as a «preload» before the main project ontology.

Context

I'm finding that many WB instances need to create properties (and probably a few items) as a first step before uploading or developing the ontology for their case. I'm not an ontology expert, and I would need help identifying a very minimal set from what is available in Wikibase projects elsewhere.

Here we provide a first draft as a reference. Consider that it may be very wrong and subject to modification.

First draft

| id | Label (en) | Data type | Description (en) | sameAs on Wikidata | formatter URL |
|---|---|---|---|---|---|
| P1 | instance of | item | | P31 | |
| P2 | subclass of | item | | P279 | |
| P3 | usage instructions | wb:String | text describing how to use a property or item | P2559 | |
| P4 | sameAs | item | owl:sameAs | P2888 | |
| P5 | sameAs on Wikidata | WikibaseExternalId | owl:sameAs for a Wikidata item | n/a | http://www.wikidata.org/entity/$1 |
| P6 | formatter URL | wb:String | web page URL; URI template from which "$1" can be automatically replaced with the effective property value on items. If the site goes offline, set it to deprecated rank. If the formatter URL changes, add a new statement with preferred rank | P1630 | |
| P7 | formatter URI for RDF resource | wb:String | formatter URL for RDF resource: URI template from which "$1" can be automatically replaced with the effective property value on items (it is the URI of the resource, not the URI of the RDF file describing it) | P1921 | |
| P8 | equivalentProperty | wb:URL | owl:equivalentProperty | P1628 | |
| P9 | equivalentProperty in Wikidata | WikibaseExternalId | owl:equivalentProperty for Wikidata | n/a | http://www.wikidata.org/entity/$1 |
| P10 | media in Commons | wb:CommonsMedia | identifier for a resource hosted in Wikimedia Commons | P18, P10 | |
| P11 | wikimedia language code | wb:String | identifier for a language or variant as used by Wikimedia projects | P424 | |
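As a sketch, the draft set can be held in a plain data structure so that formatter URLs can be expanded mechanically. The dictionary layout and the `expand_formatter` helper are illustrative assumptions for this example, not part of any Wikibase API:

```python
# Illustrative sketch only: the dictionary layout and helper name are
# assumptions for this example, not part of any Wikibase API.
BOOTSTRAP_DRAFT = {
    "P1": {"label": "instance of", "datatype": "wikibase-item", "wikidata": "P31"},
    "P2": {"label": "subclass of", "datatype": "wikibase-item", "wikidata": "P279"},
    "P5": {"label": "sameAs on Wikidata", "datatype": "external-id",
           "formatter_url": "http://www.wikidata.org/entity/$1"},
}

def expand_formatter(template: str, value: str) -> str:
    """Replace the "$1" placeholder of a formatter URL with the property value."""
    return template.replace("$1", value)

print(expand_formatter(BOOTSTRAP_DRAFT["P5"]["formatter_url"], "Q42"))
# → http://www.wikidata.org/entity/Q42
```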

Event Timeline

I would request the following additions: (1) an addition/extension for WikibaseQualityConstraint items/properties; (2) something akin to the SSSOM (https://mapping-commons.github.io/sssom/spec-intro/) for better mapping specifications, and (3) some basic SKOS properties (https://www.w3.org/TR/skos-reference/).

One thing to consider, for importing/exporting ontologies developed using Wikibase, is how Wikidata makes use of punning to have simultaneous instance/class entities (which would make any exports only compliant with OWL 2 DL, assuming that a conversion process happens between the base Wikibase ontology and more typical OWL ontologies; otherwise there is some wonkiness). For example, in lgbtDB, I am attempting a build with no punning at all, which often leads to splitting up Wikidata entities across multiple mappings.

@Superraptor123, could you please provide more details for each case? If you find it appropriate, please create new tickets for them.

Thanks!

About importing/exporting ontologies, I agree it is a very important issue, but, personally, it's currently outside my skills. Maybe it also deserves its own ticket, with all the technical details you could provide.

One thing to keep in mind wrt. the property IDs: I think there's been a nice convention to make your P1 mean "sameAs on Wikidata" (or "Wikidata mapping") because after this one assumption, the rest of the correspondences can be deduced automatically based on it: https://lists.wikimedia.org/hyperkitty/list/wikibaseug@lists.wikimedia.org/message/J7VMGZ7XQBJ4SR25WQYY3Q35R7EI5DSN/
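A hypothetical sketch of the convention above: once local entities carry a "sameAs on Wikidata" statement (P1 in that convention), the rest of the correspondence table can be deduced from those statements alone. The ids and the `LOCAL_SAMEAS` mapping are made up for illustration:

```python
from typing import Optional

# Hypothetical sketch: local property id -> Wikidata property id, as would be
# read from the "sameAs on Wikidata" (P1) statements. Ids are made up.
LOCAL_SAMEAS = {
    "P2": "P31",      # local "instance of"  -> Wikidata "instance of"
    "P3": "P279",     # local "subclass of"  -> Wikidata "subclass of"
}

def wikidata_uri(local_id: str) -> Optional[str]:
    """Resolve a local id to its Wikidata concept URI, if a mapping exists."""
    wd = LOCAL_SAMEAS.get(local_id)
    return f"http://www.wikidata.org/entity/{wd}" if wd is not None else None
```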

Another thing you see in the link is that it can be preferred to use the datatype URL instead of ExternalId. I suppose there are upsides and downsides that we could evaluate.

Maybe making it the first property makes more sense. No problem with that.

About the datatype, to me it's clearly an external ID. But I would like to hear from people with more experience in LOD and OWL than me about best practices.


In terms of the ontology import/export, I have a proof-of-concept up and running here: https://github.com/Superraptor/wikiodk (granted there are still some things to work out, but it works ok!).

I think the reason that the mappings are kind of an issue is because the "ontology" of Wikibase is nested within another ontology already (the Wikibase ontology, https://www.mediawiki.org/wiki/Wikibase/DataModel) which means, for example, there are no real classes in Wikibase, rather everything is an instance (for more information, I think this paper outlines the differences well: https://ceur-ws.org/Vol-3262/paper6.pdf).

This means that certain equivalences using OWL can lead to unintended consequences. Using SSSOM (or another meta-mapping) can assist in making sure OWL logics don't lead to any funny business. Wikidata also often mixes SKOS/OWL logic, which is still an open area of debate amongst ontologists (even if the debate hasn't really reached consensus since 2006: https://www.w3.org/2006/07/SWD/SKOS/skos-and-owl/master.html).

@Superraptor123, uff, it's a lot of stuff to fully understand :-|

About SSSOM, do you know a practical case to learn from how to use?

Looking at mp-hp-exact-0.0.1.sssom.tsv, it looks as reasonable as the ad hoc solutions people are using out there. Are you familiar with the SSSOM tools, and are they ready to be used with Wikibase?

> One thing to keep in mind wrt. the property IDs: I think there's been a nice convention to make your P1 mean "sameAs on Wikidata" (or "Wikidata mapping")

Related: Why should a Wikibase’s first property be "SameAs"?

Related: T234943, bootstrap Wikibase installation with basic ontology.


@Olea so so sorry for taking so long to understand. Academia in the States is... complex right now.

Regardless; yes! I have been using the SSSOM tentatively on my Wikibase.Cloud instance lgbtDB. For several examples, see: https://lgbtdb.wikibase.cloud/wiki/Item:Q20660, and for some preliminary documentation see: https://lgbtdb.wikibase.cloud/wiki/Project:Policies_specific_to_mappings. It's still definitely a work-in-progress, but I think it fits best to use references to contain the mapping metadata. Currently, on Wikidata, there is very little mapping metadata and individual items have varying degrees of "fuzziness": for example, The Advocate page (https://www.wikidata.org/wiki/Q752361) is simultaneously the magazine, the organization, and the website. In lgbtDB, I split those entities into different pages, so a simple "sameAs" to map to Wikidata wouldn't really work.

As another example, the Library of Congress uses some ontology-weirdness to map concepts to the real-world objects that are represented. So, for example, there is the "concept" of Havelock Ellis (https://id.loc.gov/authorities/n78087633) and the real-world person that was Havelock Ellis (http://id.loc.gov/rwo/agents/n78087633). This isn't even to mention the BIBFRAME ontology and how it doesn't align with FRBR and other European vocabularies to represent library materials (but that's an aside). For my use, it is important to make sure mappings are clarified so that external reasoners (https://en.wikipedia.org/wiki/Semantic_reasoner) do not break when trying to infer logical consequences from the ontology.

Unfortunately, I'm really only scratching the surface here, as there is the ontology as represented to the user in a Wikibase instance, and the actual behind-the-scenes ontology representation (https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/docs/ontology.owl), which you can only see when exporting as OWL. This essentially means that the ontology the user sees is a "meta-ontology" of instances (there are no "real" classes in Wikibase instances, on the back-end at least), which is both a blessing (nothing a user on the front-end can do can "break" the ontology or make it impossible to dump) and a curse (mappings to/from Wikibase instances are necessarily lossy).

Currently, the Library of Congress gets around this issue by only using skos:closeMatch to map to Wikidata, but Wikidata uses "exact match" somewhat loosely where it shouldn't be. SSSOM at least adds a little bit of crucial metadata to understand what mappings are being made and how, so that if external reasoners want to use mappings in Wikidata, they can parse through them somewhat more logically to avoid breakage. More on the SSSOM has been published here: https://doi.org/10.1093/database/baac035, but I think the abstract makes a major use case regarding any medical information: "relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction)".

@Superraptor123 nothing to be sorry about :)

About SSSOM, I've found there is a python package including a cli tool. I'm interested (T363820) in the workflow tsv -> ttl? -> Wikibase. Did you do it like this?
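As a minimal sketch of the tsv side of that workflow, an SSSOM TSV can be read with the standard library alone: the '#'-prefixed YAML metadata header is skipped and the remainder is plain TSV. The column names follow the SSSOM spec; the sample rows are made up:

```python
import csv
import io

# Made-up sample in SSSOM TSV shape: '#'-prefixed metadata header, then TSV.
SAMPLE = (
    "#curie_map:\n"
    "#  skos: http://www.w3.org/2004/02/skos/core#\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "MP:0000001\tskos:exactMatch\tHP:0000118\n"
)

def read_sssom(text: str) -> list:
    """Drop '#' metadata lines, then parse the remainder as TSV rows."""
    body = "\n".join(l for l in text.splitlines() if not l.startswith("#"))
    return list(csv.DictReader(io.StringIO(body), delimiter="\t"))

mappings = read_sssom(SAMPLE)
```

From rows like these, each mapping could then be turned into a Wikibase statement (e.g. with wikibase-integrator), which is the part the thread discusses automating.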

@Olea that is definitely a possibility-- at this stage I am more focused on the formatting of the various SSSOM-property-equivalents in Wikibase rather than their export, and currently I load the mappings manually. But it would absolutely be possible to automate the process using TTL --> Wikibase, would just require a Python script, likely using wikibase-integrator. Could definitely build it on top of the SSSOM Python package. I know Nico as a professional colleague (the primary designer/maintainer of the SSSOM) so it's possible to ask for additional documentation/features if we need!

My thoughts on the initial proposal; I've done some soul-searching regarding "minimalist" and have slimmed down to what I think works:

"Secondary" potential entities (outside of "minimalist")

Maybe an additional CIDOC CRM mapping at some point in the future for GLAM institutions since it's so widely utilized (see: https://cidoc-crm.org/html/cidoc_crm_v7.1.3.html).

@Superraptor123 what name should we use for this selection? WB bootstrap property set? Something better?

The last revision in table format:

| Label (en) | Data type | Description | skos:exactMatch on Wikidata | formatter URL |
|---|---|---|---|---|
| instance of | item | | P31 | |
| subclass of | item | | P279 | |
| subproperty of | item | - | P1647 | |
| exact match | item | skos:exactMatch | P2888 | |
| Wikidata Entity ID | WikibaseExternalId | used to indicate a mapping (of some undefined nature, but can be further specified by individual Wikibases) from a Wikibase Item, Property, Lexeme, etc. to Wikidata | n/a | http://www.wikidata.org/entity/$1 |
| formatter URL | wb:String | web page URL; URI template from which "$1" can be automatically replaced with the effective property value on items. If the site goes offline, set it to deprecated rank. If the formatter URL changes, add a new statement with preferred rank | P1630 | |
Olea renamed this task from "Wikibase Request for Comment: essential minimalist ontology" to "Wikibase Request for Comment: Wikibase bootstrap properties set". May 4 2025, 11:42 AM

@TuukkaH would you dare to format the proposal as an SSSOM tsv file?

@Superraptor123 would this mapping be correct?

  • instance of → rdf:type
  • subclass of → rdfs:subClassOf
  • subproperty of → rdfs:subPropertyOf

Update:

| Label (en) | Data type | Description | skos:exactMatch on Wikidata | formatter URL |
|---|---|---|---|---|
| instance of | wb:Item | rdf:type | P31 | |
| subclass of | wb:Item | rdfs:subClassOf | P279 | |
| subproperty of | wb:Item | rdfs:subPropertyOf | P1647 | |
| exact match | wb:Item | skos:exactMatch | P2888 | |
| Wikidata Entity ID | wb:ExternalId | used to indicate a mapping (of some undefined nature, but can be further specified by individual Wikibases) from a Wikibase Item, Property, Lexeme, etc. to Wikidata | n/a | http://www.wikidata.org/entity/$1 |
| formatter URL | wb:String | web page URL; URI template from which "$1" can be automatically replaced with the effective property value on items. If the site goes offline, set it to deprecated rank. If the formatter URL changes, add a new statement with preferred rank | P1630 | |

Open question: should we add a «formatter URI for RDF» property? This is the practice @DL2204 suggests (example).

Olea moved this task from Backlog to Ongoing on the VerySmallGLAM board.

Some comments:

  • I agree that skos:exactMatch is more appropriate than owl:sameAs for expressing that two URIs describe the same entity, because owl:sameAs entails that the statements made about the two URIs are also interchangeable (hold true for both, which is most often not the case), while skos:exactMatch "only" asserts that the two URIs describe the same entity. Moreover, owl:sameAs is used in Wikibase for "merged" entity redirects.
  • I agree that one Wikidata entity property is enough for matching any kind of Wikibase entity to Wikidata (no need for separate mapping properties for item, property, lexeme, sense, form). However, the formatter URL for that property has to be http://www.wikidata.org/entity/$1 (that is, pointing to the concept URI as if it were a formatter URI for RDF, and not to the human-readable page URL), since page URLs have different formats depending on the entity type (e.g. wiki/Item:, wiki/Lexeme:, wiki/Property:).
  • formatter URI for RDF (P1921) is nice to have in addition to formatter URL (so that you can get the full URI easily using the equivalent of the wdtn prefix on a Wikibase).
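The second point above can be illustrated in a few lines: human-readable page URLs depend on the entity type, while the concept URI format is uniform. The base URL and the namespace mapping below are illustrative for a generic Wikibase, not any specific instance:

```python
# Illustrative namespace mapping for a generic Wikibase (example values).
PAGE_NAMESPACE = {"Q": "Item", "P": "Property", "L": "Lexeme"}

def page_url(entity_id: str) -> str:
    """Human-readable page URL: its shape depends on the entity type."""
    ns = PAGE_NAMESPACE[entity_id[0]]
    return f"https://example.wikibase.cloud/wiki/{ns}:{entity_id}"

def concept_uri(entity_id: str) -> str:
    """Concept URI: the same shape for every entity type."""
    return f"https://example.wikibase.cloud/entity/{entity_id}"
```

This is why a single "Wikidata Entity ID" property can use one formatter, http://www.wikidata.org/entity/$1, for items, properties, and lexemes alike.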

Update:

| Label (en) | Data type | Description | skos:exactMatch on Wikidata | formatter URL |
|---|---|---|---|---|
| instance of | wb:Item | rdf:type | P31 | |
| subclass of | wb:Item | rdfs:subClassOf | P279 | |
| part of | wb:Item | schema:isPartOf | P361 | |
| subproperty of | wb:Item | rdfs:subPropertyOf | P1647 | |
| exact match | wb:Item | skos:exactMatch | P2888 | |
| Wikidata Entity ID | wb:ExternalId | used to indicate a mapping (of some undefined nature, but can be further specified by individual Wikibases) from a Wikibase Item, Property, Lexeme, etc. to Wikidata | n/a | http://www.wikidata.org/entity/$1 |
| formatter URI | wb:String | web page URI; URI template from which "$1" can be automatically replaced with the effective property value on items. If the site goes offline, set it to deprecated rank. If the formatter URI changes, add a new statement with preferred rank | P1630 | |
| formatter URI for RDF | wb:String | formatter URL for RDF resource: URI template from which "$1" can be automatically replaced with the effective property value on items (it is the URI of the resource, not the URI of the RDF file describing it) | P1921 | |

@DL2204: added schema:isPartOf and «formatter URI for RDF».

If there are no other opinions, I would consider the selection done.

About this table formalism, do you see any problem with it?

For the record; WikidataCon CFP ends 1st of September.

@Addshore, one question: how could this set be implemented for the WikibaseManifest?

Update:

I strongly suggest including the local entity ID in the recommendation. The main reason is practical: the OpenRefine Reconciliation Service for Wikibase, particularly its Docker image, expects P1 and P2 to be rdf:type and rdfs:subClassOf. This can be interpreted as an arbitrary implementation decision, but here is my reasoning:

  • standardization simplifies Wikibase adoption;
  • OpenRefine is a key tool for many data-intensive users;
  • and this numbering looks very reasonable.

How do you see it? Do you know of other cases where the property ID would be a hard requirement like this?

The table would look like this:

| entity ID | Label (en) | Data type | Description | skos:exactMatch on Wikidata | formatter URL |
|---|---|---|---|---|---|
| P1 | instance of | wb:Item | rdf:type | P31 | |
| P2 | subclass of | wb:Item | rdfs:subClassOf | P279 | |
| P3 | part of | wb:Item | schema:isPartOf | P361 | |
| P4 | subproperty of | wb:Item | rdfs:subPropertyOf | P1647 | |
| P5 | exact match | wb:Item | skos:exactMatch | P2888 | |
| P6 | Wikidata Entity ID | wb:ExternalId | used to indicate a mapping (of some undefined nature, but can be further specified by individual Wikibases) from a Wikibase Item, Property, Lexeme, etc. to Wikidata | n/a | http://www.wikidata.org/entity/$1 |
| P7 | formatter URI | wb:String | web page URI; URI template from which "$1" can be automatically replaced with the effective property value on items. If the site goes offline, set it to deprecated rank. If the formatter URI changes, add a new statement with preferred rank | P1630 | |
| P8 | formatter URI for RDF | wb:String | formatter URL for RDF resource: URI template from which "$1" can be automatically replaced with the effective property value on items (it is the URI of the resource, not the URI of the RDF file describing it) | P1921 | |
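The fixed numbering can be checked mechanically. A sketch pairing the proposed IDs with their RDF equivalents from the table (P6–P8 list no direct RDF equivalent), plus a check for the OpenRefine expectation described above; the function name is made up for this example:

```python
# The fixed numbering proposed above, with RDF equivalents from the table.
BOOTSTRAP_IDS = {
    "P1": "rdf:type",
    "P2": "rdfs:subClassOf",
    "P3": "schema:isPartOf",
    "P4": "rdfs:subPropertyOf",
    "P5": "skos:exactMatch",
    # P6-P8 (Wikidata Entity ID, formatter URI, formatter URI for RDF)
    # have no direct RDF equivalent listed.
}

def satisfies_openrefine(props: dict) -> bool:
    """Check the hard requirement discussed above: the OpenRefine
    reconciliation service expects P1 = rdf:type and P2 = rdfs:subClassOf."""
    return props.get("P1") == "rdf:type" and props.get("P2") == "rdfs:subClassOf"
```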

I'm creating a first draft of the OWL spec for the bootstrap set:

Could you please check it for inconsistencies?

In this draft you'll find skos:note values that look weird. This is because I'm using a particular toolchain for later importing into Wikibase. So don't worry about this.

Thanks!

For the record, here is where wbstack implements the feature «Import base entities» from Wikidata into a new wb.c instance: https://github.com/wbstack/ui/blob/main/src/backend/api.js#L101

This could be an excellent application for the wikibase-bootstrap.

In this last table, I see two issues:

  1. The datatype of P5 "exact match" (Wikidata P2888) is URL, not Item.
  2. The name of P7 should be "formatter URL", not "URI" (as for Wikidata P1630).

> In this last table, I see two issues:
>
> 1. The datatype of P5 "exact match" (Wikidata P2888) is URL, not Item.

Oh, I see. OK.

> 2. The name of P7 should be "formatter URL", not "URI" (as for Wikidata P1630).

Well, I'm more of the «Uniform Resource Identifier» school here 😇

The table would look like this:

| entity ID | Label (en) | Data type | Description | skos:exactMatch on Wikidata | formatter URL |
|---|---|---|---|---|---|
| P1 | instance of | wikibase:Item | rdf:type | P31 | |
| P2 | subclass of | wikibase:Item | rdfs:subClassOf | P279 | |
| P3 | part of | wikibase:Item | schema:isPartOf | P361 | |
| P4 | subproperty of | wikibase:Item | rdfs:subPropertyOf | P1647 | |
| P5 | exact match | wikibase:Url | skos:exactMatch | P2888 | |
| P6 | Wikidata Entity ID | wikibase:ExternalId | used to indicate a mapping (of some undefined nature, but can be further specified by individual Wikibases) from a Wikibase Item, Property, Lexeme, etc. to Wikidata | n/a | http://www.wikidata.org/entity/$1 |
| P7 | formatter URI | wikibase:String | web page URI; URI template from which "$1" can be automatically replaced with the effective property value on items. If the site goes offline, set it to deprecated rank. If the formatter URI changes, add a new statement with preferred rank | P1630 | |
| P8 | formatter URI for RDF | wikibase:String | formatter URL for RDF resource: URI template from which "$1" can be automatically replaced with the effective property value on items (it is the URI of the resource, not the URI of the RDF file describing it) | P1921 | |

The new draft:

For the record, it seems the full workflow works, and ttl2wb.py is able to upload the bootstrap set into a clean WB (example). You can try it yourself if you wish.

For the record: this is the draft proposal for WikidataCon.

For the record: the proposal was not selected :-/

For the record:

  • I'm doing practical testing of the implementation of CIDOC-CRM on Wikibase, and it's going nicely.

Well, now I can say I have a method/workflow for importing OWL ontologies to Wikibase using the tools and conventions made for the Wikibase Bootstrap. The test case is implementing CRM and the code is here: https://gitlab.wikimedia.org/olea/crm4wb

As a first piece of feedback, I found the need to include the owl:inverseOf property in the bootstrap set, since CRM makes extensive use of it.
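A hypothetical sketch of why owl:inverseOf is useful in practice: when a property declares an inverse, the inverse statement can be materialized mechanically. The property ids and the (subject, property, object) triple layout below are made up for illustration:

```python
# Hypothetical example: a forward/inverse property pair, as CRM uses
# extensively. Ids are made up; the triple layout is illustrative.
INVERSE_OF = {"P10": "P11"}

def materialize_inverses(statements, inverse_of):
    """Yield the inverse statement for every statement whose property
    declares an owl:inverseOf counterpart."""
    for s, p, o in statements:
        if p in inverse_of:
            yield (o, inverse_of[p], s)

inverses = list(materialize_inverses([("Q1", "P10", "Q2")], INVERSE_OF))
# → [("Q2", "P11", "Q1")]
```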

In the next days I'll be extending the CRM ontology with other vocabularies to support our use case, so, along the way, this will help me continue validating the concept.

At this point I think the bootstrap set can be considered stable, with the extension under open development.