
Decide which domain to use for the wikibase ontology URI
Closed, Resolved, Public

Description

The ontology uri we use with the rdf mapping of wikibase is http://www.wikidata.org/ontology#. Using the wikidata.org domain here wrongly implies that this is an ontology for the content of wikidata, while in reality it's a low level ontology for the scaffolding we use for our rdf mapping.

We, or someone, may actually provide an ontology for wikidata at some point. And we should make clear that the ontology applies to all wikibase sites.

Some options I can think of:

Other options? Thoughts?

I believe we should fix this, and we should fix it now, before we start providing official rdf dumps. The old URI could resolve to a redirect to the new one - though it currently doesn't resolve to anything.

Event Timeline

daniel created this task. Mar 19 2015, 3:36 PM
daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added a subscriber: daniel.
Restricted Application added a subscriber: Aklapper. Mar 19 2015, 3:36 PM
daniel set Security to None.
daniel added subscribers: Lydia_Pintscher, Denny, mkroetzsch.
daniel added subscribers: Smalyshev, Manybubbles.
mkroetzsch added a comment. Edited Mar 19 2015, 3:53 PM

Hi Daniel.

Good point, I agree that this should change. A URL based on wikiba.se seems to be the best. I don't think we need to worry about domain ownership here (why would anybody sell this domain? Is it not WMF-owned?)

I think it is not a good idea to change ontology URLs based on the version of the ontology. External tools will depend on the exact URI, and you will thus soon be locked into the version string for all time. You can see this in FOAF, which has always been at "http://xmlns.com/foaf/0.1/" although it's not at "v0.1" these days. Basically, putting the version into your URIs is like putting the version number into every class name in your program. I would suggest maintaining versioned ontology files (somewhere) and defining the version in the ontology header, while keeping the identifiers as they were.

P.S. Also, the easiest way to coin URIs is to use the # pattern that is used now. Switching to a slash pattern requires you to implement content negotiation on the server where the thing is hosted, but does not seem to have any benefit (note that you could also implement content negotiation with the # if you ever wanted; the difference is that it's still correct even if you don't).
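A minimal Python sketch of why the # pattern needs no server-side logic. The term name `Item` and the slash-style URI are assumptions made up for illustration. A client drops the fragment before it sends a request, so every term in a hash namespace resolves to the same ontology document, whereas each slash-style term is a separate URL that the server has to handle.

```python
from urllib.parse import urldefrag

# Hypothetical term URIs, used only for illustration.
HASH_TERM = "http://www.wikidata.org/ontology#Item"
SLASH_TERM = "http://www.wikidata.org/ontology/Item"


def document_url(term_uri):
    """Return the URL a client actually requests when dereferencing a term URI."""
    url, _fragment = urldefrag(term_uri)
    return url


print(document_url(HASH_TERM))   # http://www.wikidata.org/ontology
print(document_url(SLASH_TERM))  # http://www.wikidata.org/ontology/Item
```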

The domain wikiba.se is owned by WMDE, I think.

daniel added a comment. Edited Mar 19 2015, 7:33 PM

Regarding the version in the ontology URI: I think this is actually very important, since it allows us to make breaking changes to the ontology. People checking for the exact URI is exactly what we want.

Fun fact: XSD uses the same URI for 1.0 and 1.1, but *completely* changed the interpretation of negative year numbers in xsd:date (shifting it by one year). Any consumer not aware of what version the file is actually using is going to get broken data. I would *really* like to avoid that situation.

However, this is only relevant for breaking changes. So perhaps we should only include *major* version numbers in the URI.

Regarding the version in the ontology URI: I think this is actually very important, since it allows us to make breaking changes to the ontology. People checking for the exact URI is exactly what we want.
Fun fact: XSD uses the same URI for 1.0 and 1.1, but *completely* changed the interpretation of negative year numbers in xsd:date (shifting it by one year). Any consumer not aware of what version the file is actually using is going to get broken data. I would *really* like to avoid that situation.
However, this is only relevant for breaking changes. So perhaps we should only include *major* version numbers in the URI.

+1

@daniel: Have you wondered why XML Schema decided against changing their URIs? It is by far the most disruptive thing that you could possibly do. Ontologies don't work like software libraries where you download a new version and build your tool against it, changing identifiers as required. Changing all URIs of an ontology (even if only on a major version increment) will break third-party applications and established usage patterns in a single step. There is no mechanism in place to do this smoothly. You never want to do this. Even changing a single URI can be very costly, and is probably not what you want if the breaking change affects only a diminishing part of your users (How many BCE dates are there in XSD files? How many of those were already assuming the ISO reading anyway?).

Having a version number in URLs does not solve the problem of versioning: it just creates an obligation for you to change *all* URIs whenever you update the ontology. If you really ever want to make a breaking change to one URI, then you just create a new URI for this purpose and keep the old one defined as it was. Then you stop using the old one in the data. Clean, easy, and usually without any disruption to 99% of the users (depending on which URI you changed of course ;-). This introduction of new names is completely independent of your ontology document version.
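The difference between the two approaches can be sketched in a few lines of Python. The base URI, the term names and the `timeValue2` replacement are hypothetical and not taken from the actual ontology. The sketch only counts how many of the URIs that consumers already rely on disappear from newly generated data under each approach.

```python
# Hypothetical vocabulary, for illustration only.
BASE = "http://example.org/ontology#"
TERMS = ["Item", "Property", "Statement", "rank", "timeValue"]


def uris(base, terms):
    """Build the set of full term URIs for a base URI."""
    return {base + term for term in terms}


in_use_before = uris(BASE, TERMS)

# Option 1: put a new version into the base URI; every term URI changes.
versioned = uris("http://example.org/ontology-2#", TERMS)

# Option 2: mint one new URI for the term whose meaning changed and
# keep using all the others unchanged.
per_term = uris(BASE, ["Item", "Property", "Statement", "rank", "timeValue2"])

broken_by_versioning = in_use_before - versioned
broken_by_new_term = in_use_before - per_term

print(len(broken_by_versioning))   # 5
print(sorted(broken_by_new_term))  # ['http://example.org/ontology#timeValue']
```

Under option 2 the old `timeValue` URI stays defined in the ontology; it simply no longer appears in new data.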

Besides all this, breaking changes are extremely rare. The example you gave (changing the meaning of an XML Schema datatype) does not apply to us, since we cannot do such things in our ontology. In essence, our ontology is just a declaration of technical vocabulary. Most changes you could make do not cause any incompatibility -- the Semantic Web is built on an open-world assumption so that additions of information to the ontology never are breaking anything. The only potentially breaking change to an ontology is when you delete some information, but even there it is hard to see how it should break a specific application.

Summing up, the only breaking change to an ontology is to change an important URI that many people rely on. The current proposal is to introduce a mechanism for doing exactly this.

@daniel: Have you wondered why XML Schema decided against changing their URIs? It is by far the most disruptive thing that you could possibly do.

Besides a breaking change in the interpretation, the meaning of a URI.

Ontologies don't work like software libraries where you download a new version and build your tool against it, changing identifiers as required. Changing all URIs of an ontology (even if only on a major version increment) will break third-party applications and established usage patterns in a single step. There is no mechanism in place to do this smoothly.

If you make a breaking change to the data model, third-party applications will break anyway when they try to consume the data. Better give them a way to tell.

You never want to do this. Even changing a single URI can be very costly, and is probably not what you want if the breaking change affects only a diminishing part of your users (How many BCE dates are there in XSD files? How many of those were already assuming the ISO reading anyway?).

If the affected group is very small, maybe. As to the ISO reading: in ISO 8601:1988, -1 is 1 BCE; in ISO 8601:2004, -1 is 2 BCE, see https://en.wikipedia.org/wiki/0_%28year%29#ISO_8601. And who is using the old interpretation anyway? Well, the Java standard libraries, for example, at least according to the docs: https://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/datatype/XMLGregorianCalendar.html says it's 1.0. That's also what BlazeGraph uses.

Sure, there will be far fewer negative dates out there than positive ones. But this change broke all of them, without any way to know which interpretation is intended when you see an xsd:date in a document.

So, which spec should we use for generating our output: xsd 1.0 / ISO 8601:1988 or xsd 1.1 / ISO 8601:2004?
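The one-year shift can be made concrete with a small Python sketch. The function name `bce_year` and the convention labels "1.0" and "1.1" are made up for this illustration; the mapping follows the two readings described above (no year zero versus astronomical numbering, where 0 is 1 BCE).

```python
def bce_year(lexical_year, convention):
    """Convert a year as written in the data to a BCE year number.

    convention "1.0": no year zero, so -1 is 1 BCE.
    convention "1.1": astronomical numbering, so 0 is 1 BCE and -1 is 2 BCE.
    """
    if convention == "1.0":
        if lexical_year >= 0:
            raise ValueError("not a BCE year under the 1.0 reading")
        return -lexical_year
    if convention == "1.1":
        if lexical_year > 0:
            raise ValueError("not a BCE year under the 1.1 reading")
        return 1 - lexical_year
    raise ValueError("unknown convention: " + convention)


print(bce_year(-1, "1.0"))  # 1
print(bce_year(-1, "1.1"))  # 2
```

The same lexical value yields two different years, and nothing in the value itself says which reading the producer intended.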

Having a version number in URLs does not solve the problem of versioning: it just creates an obligation for you to change *all* URIs whenever you update the ontology.

Well, you change the base URI, yea.

If you really ever want to make a breaking change to one URI, then you just create a new URI for this purpose and keep the old one defined as it was.
Then you stop using the old one in the data. Clean, easy, and usually without any disruption to 99% of the users (depending on which URI you changed of course ;-).

That's exactly what having a version number in the URI does.

I would have preferred if XSD had just introduced a new data type, xsd:astroDate or something. That would have avoided the breaking change.

This introduction of new names is completely independent of your ontology document version.

Now I'm confused. To me the introduction of new names is the new ontology version.

Besides all this, breaking changes are extremely rare. The example you gave (changing the meaning of an XML Schema datatype) does not apply to us, since we cannot do such things in our ontology.

We don't? We may very well define our own data type for dates, since even for the gregorian calendar model, xsd:date doesn't cover all our needs.

In essence, our ontology is just a declaration of technical vocabulary. Most changes you could make do not cause any incompatibility -- the Semantic Web is built on an open-world assumption so that additions of information to the ontology never are breaking anything.
The only potentially breaking change to an ontology is when you delete some information, but even there it is hard to see how it should break a specific application.

Well, you can introduce a contradiction, that would "break" the ontology internally.

But worse, you can change the interpretation of the terms in the ontology, which will break applications if producer and consumer of the data do not agree on the interpretation.

Summing up, the only breaking change to an ontology is to change an important URI that many people rely on. The current proposal is to introduce a mechanism for doing exactly this.

There are two discussions here:

  1. change the ontology URI to reflect the fact that it is a wikibase ontology, not a wikidata ontology.
  2. include a version number in the ontology uri.

This ticket is about (1). Your arguments seem to be concerned mostly with (2); perhaps we should have a separate ticket for that. Do I understand you correctly that you agree that we should change the base URI to refer to wikibase, and do it very soon?

Lydia_Pintscher triaged this task as Normal priority. Mar 20 2015, 12:55 PM

@daniel It makes sense to use wikibase rather than wikidata, but I don't think it matters very much at all. We should just define it sooner rather than later.

As for the versioning, I don't see how to convince you. Four more attempts:

  • Try to apply your proposal to the MediaWiki API: "Every API action should contain the MW version number." I think from your experience with MW it should be easier for you to see why this would be a bad idea. RDF is the same, but it affects a lot more APIs.
  • Another argument is that, of course, changing URIs does not give users any warnings about the change either. Their queries will just return different results, but there won't be any error message or the like. This behaviour is exactly the same as for other kinds of breaking changes. You just add a new kind of breaking change that is sure to break everybody's usage (not just that of the users who use BCE dates, to stay in your example), but the breakage is still subtle and hard to notice in a running system. URI versioning does not implement any kind of "fail fast" principle that you would want for announcing breaking changes. There is no standard way of announcing breaking changes via an RDF or SPARQL API; you need to work on your community communication to get this done (e.g., one could send notes about breaking changes well in advance to wikidata-tech and gather feedback).
  • You gave an example where a well-informed group of experts decided against your recommendation. I know of many other examples where URIs were initially created to contain a version number that was then never changed even after major updates (FOAF for example), again because experts in the field deemed that this was a sensible way to go. I would also claim some expertise in this area. Your view is natural for somebody who has not worked much with ontologies. Many smart people thought similarly ten years ago (you can see a lot of "0.1" and "1.0" version numbers in vocabulary URIs; even the SMW ontology includes a version number in the URIs, and of course it also never changed). Experience shows that the case where you would ever want to do such a drastic thing is the case where you use completely new URIs anyway (and probably give the project another name, too).
  • You could always decide to do the versioning later on if you must. There is no problem going from URIs http://wikiba.se/ontology#... to URIs http://wikiba.se/ontology-2.0.0#.... There is no standard way of encoding version information in URIs and you would not write a SPARQL query to extract it from there. However, in most cases, if you really change the meaning of one URI, you would rather use a new URI for this one thing only and keep all the other URIs as they are.

@mkroetzsch I think I see your point about "another breaking change" now: If a dataset used to contain objects using URIs based on http://acme.test/onto1, and that changes to http://acme.test/onto2, the same query against the dataset would return objects with completely different URIs, probably unrecognizable for the consuming system. If the URI had stayed http://acme.test/onto1, the consumer would get URIs it understands, though the objects may look a little different.

My worry is about systems that interpret RDF URIs directly while ingesting the data. Let's say the triple store knows the XSD data types, and uses that knowledge for efficient indexing. How does such a system know how to interpret negative year numbers in xsd:date? There is *no* way for it to tell which interpretation is intended. And ingesting xsd:date values from different sources that use different versions of xsd would lead to incorrect query results (xsd:date values that refer to the same date would not be equal, because they use a different convention for BCE years).

I find that quite scary, and I still do not understand why that well-informed group of experts went that way, instead of introducing new data types that explicitly use the astronomical numbering. To me, silently producing wrong results in 1% of the cases may be quite a bit worse than breaking compatibility. It depends on the use case, I suppose.

  • Try to apply your proposal to the MediaWiki API: "Every API action should contain the MW version number." I think from your experience with MW it should be easier for you to see why this would be a bad idea. RDF is the same, but it affects a lot more APIs.

As long as discoverability is guaranteed, this makes a lot of sense to me. Actually, many APIs do it that way.

@daniel Changing URIs of the ontology vocabulary is "silently producing wrong results" as well. I understand the problems you are trying to solve. I am just saying that changing the URIs does not actually solve them.

@adrianheine You are right. My example was less suitable than I had thought. The reason is that for an API, you can report an error if somebody uses the wrong actions. This is exactly what you cannot do in RDF. If someone uses the wrong URIs in a SPARQL query, he will just get the wrong results (or maybe, coincidentally, the correct results) without any warning.

@mkroetzsch: You say that changing the URIs does not solve the problem of breaking changes to an ontology. If I understand you correctly, you are saying that if I run a SPARQL query using the base URI http://acme.com/onto1, but the triple store now uses http://acme.com/onto2, I would get the wrong results without warning. While I agree that this is technically correct, I disagree with the conclusion.

You are right that I would not get an explicit error message like I would when trying to use a now undefined class name in a programming language. I would most likely simply get no result. That's however still better than getting a *wrong* result for a query using http://acme.com/onto1, because the ontology's definition changed, and so my code's assumptions about it are now wrong.

Also, when combining information using http://acme.com/onto1 and http://acme.com/onto2 from different sources in the same triple store, no contradictions arise, the structures can coexist nicely. If however I have info using old style http://acme.com/onto1, and another using new style http://acme.com/onto1, and mix them, I may very well get contradictions or other kinds of breakage, due to the changes to the ontology.

Finally, consider clients that use RDF data not via SPARQL, but directly as linked data. If they request RDF expecting URIs based on http://acme.com/onto1, but would get URIs based on http://acme.com/onto2, they would very quickly realize, investigate, and adapt. If they however would get http://acme.com/onto1 URIs with slightly modified meaning or structure, that will introduce hard to find bugs, and may lead to data corruption.
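The two failure modes discussed above can be compared in a toy Python sketch. The in-memory list of triples, the `year` predicate and the `ex:event` subject are hypothetical; only the acme.com base URIs come from the discussion. The consumer was written against the old definition, where -1 means 1 BCE, and the producer has since switched to astronomical numbering, so an event in 2 BCE is now written as -1.

```python
OLD = "http://acme.com/onto1#"
NEW = "http://acme.com/onto2#"


def query(triples, predicate):
    """Return all (subject, object) pairs for the given predicate."""
    return [(s, o) for (s, p, o) in triples if p == predicate]


def consumer_reads_bce(triples):
    """Consumer built against the old definition: year -n means n BCE."""
    return [(s, -year) for (s, year) in query(triples, OLD + "year")]


# The producer describes an event in 2 BCE after the breaking change.
renamed = [("ex:event", NEW + "year", -1)]   # new base URI
same_uri = [("ex:event", OLD + "year", -1)]  # old URI, new meaning

print(consumer_reads_bce(renamed))   # []
print(consumer_reads_bce(same_uri))  # [('ex:event', 1)]
```

Neither case raises an error, which is mkroetzsch's point. The difference daniel describes is that the renamed URI gives an empty result, while the reused URI gives a result that is off by one year.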

Am I missing a point somewhere?

This comment was removed by daniel.

Since T94747 has been decided now, I'll narrow the scope of this ticket to be just about the domain to use for the wikibase ontology.

daniel renamed this task from Better namespace URI for the wikibase ontology to Decide which domain to use for the wikibase ontology URI. Apr 9 2015, 12:23 PM
daniel updated the task description.
Dzahn added a subscriber: Dzahn. Apr 9 2015, 5:57 PM

I would personally find wikibase.org more appropriate than wikiba.se. I know it's cool that it's short, etc., but it will also cause confusion in the long run. After all, it's an .org thing and not a Swedish site, and you just expect that ending for WMF-related domains. Just my 2 cents.

I agree with the above: while wikiba.se is nice and clever, I feel an .org URL would be more stable and appropriate for the task in the long run.

wikiba.se is the domain we actually have and use and will continue to use as the main website for the software.

Lydia_Pintscher raised the priority of this task from Normal to High. Apr 29 2015, 12:49 PM
daniel reassigned this task from daniel to Lydia_Pintscher. May 10 2015, 2:35 PM

So, have we settled on wikiba.se, then? If so, we should make a ticket for changing the URI in the RDF generator code. Assigning to Lydia for confirmation.

Yes. Let's do it.

daniel closed this task as Resolved. May 11 2015, 2:44 PM

So, to summarize: