
wikibase:geoGlobe IRI included in simple value geo:wktLiteral for non-Earth coordinates
Closed, Declined · Public

Description

See http://tinyurl.com/grkd7qw for an example query that returns the coordinates for Olympus Mons, a Martian volcano.

Raw Result

{
  "head" : {
    "vars" : [ "o" ]
  },
  "results" : {
    "bindings" : [ {
      "o" : {
        "datatype" : "http://www.opengis.net/ont/geosparql#wktLiteral",
        "type" : "literal",
        "value" : "<http://www.wikidata.org/entity/Q111> Point(18.4 226)"
      }
    } ]
  }
}

Event Timeline

Restricted Application added a project: Discovery. Mar 7 2016, 12:50 PM
Restricted Application added a subscriber: Aklapper.
Smalyshev closed this task as Declined. Mar 7 2016, 5:29 PM
Smalyshev added a subscriber: Smalyshev.

Yes, this is intentional.

Intentional or not, it is wrong. Why is it necessary? The problem is that it breaks parsing of GeoSPARQL literals. For example, if I ask for instances of volcanoes, I have to make exceptions for weird non-Earth coordinates.

It is not wrong; it is in accordance with the standard. Please see: https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/repo/includes/Rdf/Values/GlobeCoordinateRdfBuilder.php#L57

Yes, when parsing literals you need to account for the fact that some coordinates are not on Earth. This is exactly the point of it - non-Earth coordinates are different from Earth coordinates, and we should not pretend they are the same.

Christopher added a comment. Edited · Mar 7 2016, 7:02 PM

Thanks for the clarification. However, Req 10 of the geoSPARQL specification seems to be at odds with the definition of a "literal value" (according to https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal). The way that I read this specification is that a literal is a string defined by an IRI datatype. Concatenating the geoGlobe IRI to the coordinate string is very weird, and I do not think that there are very many implementations that can declare that an IRI/string is actually a "literal".

A literal value is not a URI, that is correct, and that value is a literal value. However, a literal value can have type-defined semantics, and those semantics can - and in the case of this value do - have internal components, one of which can be a URI. These semantics are external to RDF, because RDF cannot really represent such rich semantics in one value, but this is not uncommon - in fact, even without URIs, WKT has semantics of its own outside RDF.

Christopher added a comment. Edited · Mar 7 2016, 8:03 PM

Eh, http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf#wktLiteral is an RDFS Datatype so the semantics are defined by the RDF schema, right? But, I found this http://docs.opengeospatial.org/is/12-063r5/12-063r5.html that demonstrates that the WKT CRS extends far beyond RDF. I suspect that the implementation of wktLiteral is bound to RDFS, regardless of the "rich semantics" of WKT.

I'm not sure I understand what you mean. As you can see for yourself in http://schemas.opengis.net/geosparql/1.0/geosparql_vocab_all.rdf, there are literals that contain rich semantics - WKT, GML, etc. Those are serializations of complex objects, recorded in RDF as literal strings. Point + optional globe is one such serialization, consistent with the OpenGIS standard. It's pretty simple to parse; you can see an example of parsing it here: https://gerrit.wikimedia.org/r/#/c/264854/4/blazegraph/src/main/java/org/wikidata/query/rdf/blazegraph/inline/literal/WikibaseGeoExtension.java
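As a rough illustration of the parsing described above, here is a minimal Python sketch. It is not the WikibaseGeoExtension code linked in the comment. It assumes the literal is an optional `<IRI>` prefix followed by a `Point(a b)` body, as in the raw result in the description. The function names `split_wkt_literal` and `parse_point` are hypothetical, and the axis order of the two numbers is deliberately not interpreted.

```python
import re

# Optional "<IRI>" prefix, optional whitespace, then the WKT body.
_WKT_LITERAL = re.compile(r"^\s*(?:<([^>]*)>\s*)?(.+?)\s*$", re.DOTALL)

# Plain decimal numbers only; exponent notation is not handled in this sketch.
_NUM = r"[-+]?\d+(?:\.\d+)?"
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(" + _NUM + r")\s+(" + _NUM + r")\s*\)\s*$",
    re.IGNORECASE,
)


def split_wkt_literal(value):
    """Split a wktLiteral value into (iri, wkt).

    iri is None when the literal carries no leading <IRI>.
    """
    match = _WKT_LITERAL.match(value)
    if match is None:
        raise ValueError("empty wktLiteral value: %r" % (value,))
    return match.group(1), match.group(2)


def parse_point(wkt):
    """Return the two numbers of a 2D Point, in the order they are written."""
    match = _POINT.match(wkt)
    if match is None:
        raise ValueError("not a 2D Point: %r" % (wkt,))
    return float(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    iri, wkt = split_wkt_literal(
        "<http://www.wikidata.org/entity/Q111> Point(18.4 226)"
    )
    print(iri, parse_point(wkt))
```

A caller that only wants the geometry can ignore the first element of the tuple; a caller that cares about the globe can compare it against the entity IRI it expects.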

@Smalyshev have you tried to read the updated WKT CRS specification http://docs.opengeospatial.org/is/12-063r5/12-063r5.html yet? From what I can interpret, they have now deprecated the 2012 "non-ISO compliant" concatenation of a URI form of CRS and geometry.

Instead, the CRS string semantics are specified in the section "WKT string form", which takes the form KEYWORD1[attribute1,KEYWORD2[attribute2,attribute3]]. Note the use of an UPPERCASE keyword.

So, a WDQS non-earth coordinate may be specified like: "IMAGECRS["crs name"], POINT[18.4 226]"^^ogc:wktLiteral. Also, this variation of the wktLiteral datatype could be in a different namespace from the geo:wktLiteral and could then be easily filtered by the map parser.

I suggest possibly considering this to fix T130428 and other globe variant issues.

I think those are two different things. The spec you are quoting is the description of coordinate systems in WKT. However, what we have is not a description of the coordinate system, but a description of a point in some coordinate system which we do not specify (and really don't want to - I have no idea what coordinate systems there are on Mars) but just name when we specify the point.

Please see "geoSPARQL CRS design is debatable" from the W3C Coordinate Reference System website.

Also, #7 here: "the conflation of CRS with the WKT in a literal has many undesirable effects"

From the 12-063r5 document:

The WKT representation of coordinate reference systems as defined in ISO 19125-1:2004 and OGC specification 01-009 is inconsistent with the terminology and 
technical provisions of ISO 19111:2007 and OGC Abstract Specification topic 2 (08-015r2), “Geographic information – Spatial referencing by coordinates”.

Is this clear? They are admitting that the previous design form of WKT is inconsistent with other specifications. While it says nothing directly about the geoSPARQL specification, from page 4 of that 2012 design spec, WKT is defined through direct reference to the deprecated standard:

"as it is specified in Well Known Text (as defined by Simple Features or ISO 19125)"

What is outlined in this new specification is a de facto "WKT string form". And this form should accommodate all of the semantics of the geo:wktLiteral string format, including geometries (that are not explicitly mentioned in the new specification). Admittedly, this is quite a difficult thing to sort out, and there are definitely politics and big money at work in the standards process. If you want something more concrete to build on, maybe I can ask them for a concrete geoSPARQL "best practice guideline".

Finally, I do not feel that it is accurate to use a Wikidata entity (e.g. Mars) as a CRS by the design definition of geo:wktLiteral and then not properly specify it. It is probably better to just omit it and use a different data type for non-earth coordinates than to provide an entity concept URI as a CRS identifier.

daniel added a subscriber: daniel. Edited · Mar 21 2016, 12:35 PM

@Christopher I don't see why we shouldn't use wikidata URIs as CRS identifiers. However, I agree that Q2 and Q111 are too unspecific to be used as a coordinate system. We would want to use something like Q11902211.

This points out a problem in our use of the "reference globe" in the GlobeCoordinate data type: the intention was to specify the exact datum used. But in practice, this is often not known to the people who enter the coordinates, and not stated in their source either. Also, when querying by location, we typically want to filter by celestial body (earth, moon, mars, etc), not by coordinate system.

I do not think we should use a different data type for non-earth coordinates. We would need to duplicate all coordinate properties (location-earth, location-mars, etc). We might however choose to allow the globe and the CRS to be specified separately in the GlobeCoordinate data value. That will make it even more difficult to represent the value meaningfully with a single literal for the simple mapping.

With regard to WKT: so they are completely changing the string form? And implementations fully support neither the old nor the new form? I'm getting more and more inclined to just ditch WKT.

Finally, I do not feel that it is accurate to use a Wikidata entity (e.g. Mars) as a CRS

It is not 100% accurate, as that URL does not (currently) contain a description of a CRS, but it is the best option we have. At least it does define which planetary body (and by inference, which CRS, if any is the customary one for that body) we refer to.

so they are completely changing the string form

I don't think so. The definition of a point and the definition of a CRS are two different things, which use different formats. We should not confuse them. See also: https://en.wikipedia.org/wiki/Well-known_text

Christopher added a comment. Edited · Mar 21 2016, 9:45 PM

@Smalyshev so, by stating that geometry and CRS are different, you then concur with the main arguments referenced above that they should not be conflated in a simple literal. @daniel I agree with the idea of specifying the CRS as an additional component of the GlobeCoordinate data value separately from the geometry.

I do not agree that a Wikidata entity can be inferred to be a CRS without it providing or pointing to a serialization that can be validated against a known CRS encoding (e.g. gml:GeodeticCRS). Stating that a CRS is an instance of a "geodetic reference system" is only a concept pointer, and does not provide the syntax of a CRS schema, which is necessary for software to understand the meaning of a geometry.

In summary, these are the reasons why a CRS should not be represented as a URI in a simple WKT literal string (that contains point geometry).

  1. Geometry and its CRS are just two separate things
  2. It becomes much harder to use the CRS as a filter in a SPARQL query
  3. It is not possible to assign multiple CRS specifications to a geometry
  4. This form of WKT literal string is no longer standard WKT
  5. The CRS is a URI, so it should be published as one
  6. It is not possible to assign a CRS to a collection of geometries (e.g. a dataset)
  7. Software libraries that handle WKT geometry do not expect a CRS at the start of the string

The current use of simple geo:wktLiteral for WGS84 points is fine, but if a Wikidata goal is to introduce more complex GIS spatial data (which I think would be very worthwhile), then the implementation should adhere to justifiable and reasonable standards for the data representation.

Smalyshev added a comment. Edited · Mar 21 2016, 10:19 PM

I think the WKT literal should specify the coordinate system, otherwise it becomes essentially useless - you cannot do any queries on it if the literal may mean something on Earth or something on the Moon. If you query "give me items within a 50-mile radius of Paris" and get a moon crater - this is useless. Thus, the literal should make it possible to distinguish between them. This has nothing to do with defining CRSes, in which we have no business, or with defining geometries, which we don't do either.

Proper software libraries that handle WKT should also be able to handle the requirements of OGC standards. If they do not, they can't handle non-Earth coordinates anyway, so it's better for them to fail to parse such literals than to parse them and think a moon crater is in Paris. Explicit failure is always better than garbage data.
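To make the "moon crater in Paris" concern concrete, here is a small Python sketch that keeps only Earth coordinates from SPARQL JSON result bindings before any distance calculation. It assumes, as a simplification of this thread, that a literal with no leading IRI is on Earth and that Earth is identified by the Wikidata entity Q2. The helper names `globe_of` and `earth_only` are hypothetical, and the Paris coordinates are sample data for illustration only.

```python
# Assumption: Earth is identified by the Wikidata entity IRI for Q2.
EARTH = "http://www.wikidata.org/entity/Q2"


def globe_of(value):
    """Return the globe IRI of a wktLiteral value.

    Assumption: a literal without a leading <IRI> is treated as on Earth.
    A malformed prefix raises instead of being guessed at.
    """
    value = value.strip()
    if value.startswith("<"):
        end = value.find(">")
        if end == -1:
            raise ValueError("unterminated globe IRI in %r" % (value,))
        return value[1:end]
    return EARTH


def earth_only(bindings, var):
    """Keep only the bindings whose coordinate in `var` is on Earth."""
    return [b for b in bindings if globe_of(b[var]["value"]) == EARTH]


if __name__ == "__main__":
    bindings = [
        {"o": {"value": "<http://www.wikidata.org/entity/Q111> Point(18.4 226)"}},
        {"o": {"value": "Point(2.35 48.85)"}},  # sample data, no globe prefix
    ]
    for binding in earth_only(bindings, "o"):
        print(binding["o"]["value"])
```

Filtering on the client like this avoids regular expressions inside the SPARQL query itself, at the cost of transferring rows that are then discarded.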

if a Wikidata goal is to introduce more complex GIS spatial data

We'll cross that bridge when we get to it.

Coincidentally, it seems that there are people who know a lot more about this than I do that have debated this issue at length in a long and very informative thread:
CRS specification (was: Re: ISA Core Location Vocabulary)

It is clearly more involved than just using "proper" software libraries and "handling requirements". The conflicting point that I see from your side is that introducing unneeded complexity is bad. And, also, that this was the best practical alternative available for handling "garbage data". Sure, I agree with that, but oversimplification of a problem is worse. I personally feel that the "grunt approach" of using regex in SPARQL to filter URIs from literals in a result set is not clean, and also quite costly.

The alternative that introduces subproperties for geometry values is definitely more complex and as indicated in the thread:

You need OWL 2 to formally define a complex class and then say that geometry consists of exactly two parts, one that contains the coordinate
sequence and another on that contain the CRS. You cannot make such statements using RDFS or OWL.

To assert that everything that needs to be said about a geometry value can be put into a standard RDFS string literal is obviously not true. The irregular form of geo:wktLiteral is a kind of "convenience method" that seems to work for most use cases, but definitely not for all, and I really doubt that it is functionally sustainable for complex geodata.