Make decision on RDF ontology prefix
Closed, Resolved · Public

Description

We need to decide which URL to use in our ontology:

  1. http://www.wikidata.org/ontology#rank (no version)
  2. http://www.wikidata.org/ontology-1#rank (major version)
  3. http://www.wikidata.org/ontology-1.0#rank (minor version)
  4. http://www.wikidata.org/ontology-1.0.0#rank (patch version)

(Note that following the SemVer convention, the meaning of 0.x versions is "shifted", so 0.2 is a major/breaking version, 0.2.1 is a minor/feature release, and 0.2.1.3 is a patch release).
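
To make the four options concrete, here is a minimal Python sketch of how each one would derive the term URI from a release number. The helper `ontology_term` and its `parts` parameter are hypothetical names introduced only for illustration; nothing like them exists in the export code.

```python
BASE = "http://www.wikidata.org/ontology"

def ontology_term(local_name: str, version: str = "", parts: int = 0) -> str:
    """Build a term URI that keeps the first `parts` components of `version`."""
    kept = version.split(".")[:parts]
    suffix = "-" + ".".join(kept) if kept else ""
    return f"{BASE}{suffix}#{local_name}"

# Options 1 to 4 from the list above, for release 1.0.0.
for parts in range(4):
    print(ontology_term("rank", "1.0.0", parts))

# Which options change the URI when only the patch number moves (1.0.0 -> 1.0.1)?
changed_by_patch = [
    parts
    for parts in range(4)
    if ontology_term("rank", "1.0.0", parts) != ontology_term("rank", "1.0.1", parts)
]
print(changed_by_patch)
```

Under these assumptions only the patch-level option changes the URI on a bug fix release, which is the trade-off to weigh: finer-grained versions in the URI mean consumers have to update their prefixes more often.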

Let's discuss and arrive at a definite conclusion so we can move forward towards finalizing the format.

Event Timeline

Smalyshev created this task. Apr 1 2015, 8:09 PM
Smalyshev raised the priority of this task to High.
Smalyshev updated the task description.
Restricted Application added a subscriber: Aklapper. Apr 1 2015, 8:09 PM
daniel updated the task description. Apr 2 2015, 9:23 PM
daniel set Security to None.
daniel added a comment. Apr 2 2015, 9:28 PM

I believe that when we need to make a breaking change to the ontology, the URIs of the affected resources should change. Including the major version id in the URI would make this easy, and would force consumers to pay attention to the change.

I don't expect us to make breaking changes often, if ever, after the 1.0 release. But before that, we may need several 0.x iterations before things settle down. When they settle down, we should not miss the right time to make the step to 1.0.

I see no value in changing ontology URIs for feature or bug fix releases. That would only cause needless hassle.

We could also omit the version ID for now, and if we ever need to make a breaking change, add it only then. That would work just as well, but including the major version ID from the start would perhaps make this less surprising for consumers.

daniel updated the task description. Apr 2 2015, 9:29 PM
Smalyshev updated the task description. Apr 3 2015, 12:15 AM
Smalyshev added a comment. Edited Apr 3 2015, 12:18 AM

@daniel while I agree on the changing URLs point, note that this does not mean we necessarily have to have versions. That depends on how often we're going to do this. If we're going to do this once or twice, we could just change the main URL. If we foresee this happening with regularity, then we need version numbers.

I like the idea of not adding a number now and making it ontology2 or something like that if we break it. If nobody objects to it then https://gerrit.wikimedia.org/r/#/c/200274/ implements it.

I agree with the proposal of @Smalyshev.

@Smalyshev: You said "If we're going to do this once or twice, we could just change main URL. If we foresee this happening with regularity, then we need version numbers."

I don't understand the difference between these two options. Including the (major) version is just a special case of changing the "main" URI prefix.

I'm fine with not including the version now; but we need to add a version (or otherwise change the URI) as soon as we make any breaking change. Since we are still experimenting, we make a breaking change every week, and there is no indication in the URIs of if or when the schema can be considered stable, with us committing to compatibility from that point on.

So, if we don't want to have 0.1, 0.2, etc in the URL while experimenting, we could at least add "-beta" for now, to make it clear that this is not yet a stable schema. "-beta" would also give a strong reminder to officially end the experimental phase at some point, and not get stuck on 0.2.5 or something.

@daniel Changing the base URIs does not work as a way to communicate breaking changes to users of RDF. You can change them, but there is no way to make users notice this change, and it will just break a few more queries. It's just not how RDF works. Most of our test queries do not even mention any wikibase ontology URI, yet they are likely to be broken by changes to come. If you think that we need a way to warn users of such changes, you need to think of another way of doing this.

Here is how I would do ontology and RDF model versioning:

  • Ontology URIs are never changed. They are all based on the same base URI (prefix).
  • If an update would change the meaning of an ontology element in fundamental ways, a new URI (new local name) is used rather than redefining an existing URI.
  • The ontology needs to have a file (with OWL property and class declarations, some labels, etc.). The file name of this file (and thus its URL) should include a version number.
  • The ontology URIs should redirect to the most recent version of the ontology file. An improved setup would use content negotiation to redirect to an HTML documentation or to the ontology file. One can use URL fragments for local names in both cases.
  • The versioned ontology file is imported into the data dump using an OWL import statement.
  • In addition to the ontology import triple, the data contains a triple that defines the version of the RDF model used. This triple uses an annotation property to the dataset ontology element (i.e., the subject of the import triple, not the Wikibase ontology file).

Then users can query for the RDF model version to check compatibility. Like in Daniel's proposal, there is no warning if they don't do this check, but in contrast to Daniel's proposal, users have a way to find out which version of the model is used, and the versioning can refer to the whole RDF model (not just to the Wikibase ontology, which most of our current example queries are not even referring to).
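
As a rough illustration of that check, the sketch below models a dump header as plain (subject, predicate, object) tuples and looks up the model version on the subject of the import triple. The property URI `MODEL_VERSION`, the dataset URI, the version string, and the "same major version" policy are all assumptions made for the example; the proposal above does not name a specific property or policy. In practice this would more likely be a SPARQL query against the endpoint.

```python
from typing import Iterable, Optional, Tuple

Triple = Tuple[str, str, str]

OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
# Hypothetical annotation property; the proposal does not name one.
MODEL_VERSION = "http://example.org/rdfModelVersion"

def model_version(triples: Iterable[Triple]) -> Optional[str]:
    """Return the version attached to the subject of the owl:imports triple."""
    triples = list(triples)
    datasets = {s for s, p, _ in triples if p == OWL_IMPORTS}
    for s, p, o in triples:
        if p == MODEL_VERSION and s in datasets:
            return o
    return None

def is_compatible(found: Optional[str], supported_major: int) -> bool:
    """Assumed policy: accept only the major version the client was written for."""
    if found is None:
        return False
    return int(found.split(".")[0]) == supported_major

# Illustrative header; the dataset URI and version value are made up.
header = [
    ("http://example.org/dump", OWL_IMPORTS, "http://www.wikidata.org/ontology-1.0.owl"),
    ("http://example.org/dump", MODEL_VERSION, "1.2.0"),
]
print(is_compatible(model_version(header), supported_major=1))
```

A missing version triple is treated as incompatible here, so a client that does run the check never proceeds on data whose model version is unknown.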

Let me see if I get this right, there's some new stuff for me.

So in this model, we keep the URL we're using for wikibase: as http://www.wikidata.org/ontology# - right? The URL http://www.wikidata.org/ontology would redirect on the server to something like http://www.wikidata.org/ontology-1.0.owl (or maybe with an option for HTML depending on headers, etc.) where 1.0 is replaced with the latest version.

The dump would use something like:

wikibase:Dump a schema:Dataset ;
    schema:softwareVersion "0.0.1" ;
    owl:imports <http://www.wikidata.org/ontology-1.0.owl> .

Not sure if I used the right property and URL here.

"In addition to the ontology import triple, the data contains a triple that defines the version of the RDF model used. This triple uses an annotation property to the dataset ontology element (i.e., the subject of the import triple, not the Wikibase ontology file)."

This part I'm not sure how to represent.

I agree that we should have the software version as well as the OWL schema URL in the dump, and that the base URI should redirect to the appropriate schema (the unversioned URI should redirect to the latest schema).

@mkroetzsch: You said "Changing the base URIs is not working as a way to communicate breaking changes to users of RDF. You can change them, but there is no way to make users notice this change, and it will just break a few more queries. It's just not how RDF works."

I don't follow - it's true that there is no way to make all queries involving the old URIs fail, but I think clients will notice quickly in most cases, see my comment on https://phabricator.wikimedia.org/T93207#1189031

You said further: "Most of our test queries do not even mention any wikibase ontology URI, yet they are likely to be broken by changes to come. If you think that we need a way to warn users of such changes, you need to think of another way of doing this."

If the query doesn't use the namespace in question, and neither does the result, why would the ontology change be relevant to that query? Or do you mean the case when the query doesn't, but the result is expected to be using the "old" URIs, but nothing of that sort would be found? I expect that client code would fail because their assumptions about the URIs to be found in the result would fail, quickly and explicitly.
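
The two positions can be compared with a small Python sketch. It simulates one result row after a prefix change and reads it with two hypothetical clients: one that looks up the old URI leniently and one that insists on finding it. `NEW_RANK`, the row contents, and both helper functions are made up for the example.

```python
OLD_RANK = "http://www.wikidata.org/ontology#rank"
# Hypothetical renamed URI, used only to simulate a prefix change.
NEW_RANK = "http://www.wikidata.org/ontology-2#rank"

def lenient_rank(row: dict):
    """Return None when the expected predicate is missing (silent miss)."""
    return row.get(OLD_RANK)

def strict_rank(row: dict):
    """Raise as soon as the expected predicate is missing (explicit failure)."""
    if OLD_RANK not in row:
        raise KeyError(f"expected {OLD_RANK}, got {sorted(row)}")
    return row[OLD_RANK]

# One result row as it might look after the prefix changed.
row_after_change = {NEW_RANK: "preferred"}

print(lenient_rank(row_after_change))
try:
    strict_rank(row_after_change)
except KeyError as err:
    print("client failed fast:", err)
```

Whether a URI change is noticed "quickly and explicitly" therefore depends on how the client is written: the strict reader fails immediately, while the lenient one just sees no rank.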

I'm very worried about subtle changes that go unnoticed by clients. They can introduce data corruption. Importing dates with negative years from multiple sources, without any indication whether each source uses XSD 1.0 or XSD 1.1, seems a very good example of what can go wrong, and why it is horrible.
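
To spell out the date example: XSD 1.0 has no year 0000 and reads -0001 as 1 BCE, while XSD 1.1 allows 0000 (meaning 1 BCE) and reads -0001 as 2 BCE. The Python sketch below maps a lexical year to an astronomical year (where 0 is 1 BCE) under each reading; the function names are hypothetical and the sketch handles only the year component.

```python
def astronomical_year(lexical: str, xsd_version: str) -> int:
    """Map an XSD lexical year to an astronomical year (0 == 1 BCE)."""
    year = int(lexical)
    if xsd_version == "1.1":
        # XSD 1.1: 0000 is 1 BCE, -0001 is 2 BCE.
        return year
    if xsd_version == "1.0":
        if year == 0:
            raise ValueError("0000 is not a valid year in XSD 1.0")
        # XSD 1.0: -0001 is 1 BCE, so negative years shift by one.
        return year + 1 if year < 0 else year
    raise ValueError(f"unknown XSD version: {xsd_version}")

def era_label(astro_year: int) -> str:
    """Render an astronomical year as a BCE/CE label."""
    return f"{1 - astro_year} BCE" if astro_year <= 0 else f"{astro_year} CE"

# The same literal denotes two different years depending on the source.
for version in ("1.0", "1.1"):
    print(version, era_label(astronomical_year("-0044", version)))
```

The same literal comes out as 44 BCE under XSD 1.0 and 45 BCE under XSD 1.1, so merging sources without knowing which convention each one uses shifts dates by a year with no error raised anywhere.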

OK, we discussed this some more. If there is no strong disagreement, let's go with the following:

  • We will use http://www.wikidata.org/ontology-beta#rank now until we are reasonably sure the ontology is as we want it to be.
  • We will use http://www.wikidata.org/ontology#rank after this.
  • We will evaluate on a case-by-case basis whether we need to version the whole ontology or just parts of it if we have a relevant breaking change in the future. (The assumption being that we should lean towards not doing it and, if we do it, use the year as a version.)
daniel closed this task as Resolved. Apr 9 2015, 2:35 PM
daniel claimed this task.