Coordinate order in RDF is wrong
Closed, Resolved, Public

Description

Currently, the coordinates in RDF are presented as a WKT literal in latitude-longitude order, e.g. for Seattle (https://www.wikidata.org/wiki/Q5083) the representation is:

wdt:P625 "Point(47.6 -122.31666666667)"^^geo:wktLiteral ;

However, multiple sources suggest the default order used in most WKT implementations is longitude-latitude, e.g.:

https://blogs.msdn.microsoft.com/isaac/2008/03/05/the-upcoming-geography-coordinate-order-swap-a-faq/
https://portal.opengeospatial.org/files/?artifact_id=47664 sec 8.5.1
http://stackoverflow.com/questions/22536467/how-to-retrieve-latitude-and-longitude-from-geosparqls-wktliteral
http://www.macwright.org/lonlat/

So we may want to switch the order. This will require a full DB reload and GUI fixes, so we may time it to coincide with the deployment of other geospatial features.
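As a rough illustration of what the switch means for a single literal, here is a minimal Python sketch that swaps the two coordinates of a 2D WKT point. The function name `swap_point_order` and the regular expression are made up for this example; this is not the Wikibase implementation.

```python
import re

# Matches a simple 2D WKT point such as "Point(47.6 -122.31666666667)".
# Assumption: only plain two-coordinate points, no Z/M values, no CRS prefix.
_POINT_RE = re.compile(
    r"^\s*Point\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE
)


def swap_point_order(wkt: str) -> str:
    """Return the same WKT point with its two coordinates swapped.

    The coordinates are kept as strings, so no precision is lost.
    """
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    first, second = match.groups()
    return f"Point({second} {first})"


# Current (latitude-longitude) literal for Seattle, taken from the description.
current = "Point(47.6 -122.31666666667)"
print(swap_point_order(current))
```

Applied to the Seattle literal, the longitude value ends up first, which is the order the linked sources describe as the WKT default.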

Event Timeline

Restricted Application added a subscriber: Aklapper.

OTOH, here:
https://en.wikipedia.org/wiki/ISO_6709

It says: "Latitude comes before longitude", and the same goes for the wikidata.org screens, which display coordinates in latitude-longitude order.

Oh wow. Really?! What the actual fuck!(*)

If this standard is so broken that it's unclear where latitude and longitude go, we should switch to a different standard. I don't think just swapping latitude and longitude in the literals, with no indication in the RDF which is which, will make things better. It will only cause more confusion. Is there an alternative standard we can use to represent coordinates?

(*) technical term.

@daniel Unfortunately, I haven't found a better one so far. WKT seems to be the way to go if you want it as a short string. http://www.macwright.org/lonlat/ has a kind of survey of it. It looks like a huge mess (and we didn't even get into different coordinate systems, this is one more can of worms we don't even want to get near). I was going to propose GeoJSON, but the damn thing uses arrays too!

I'd like more research before we change this. So far no one using the query service has complained about this, afaict, which makes me think it is fine the way it is.

The WKT standard is actually quite clear that order is longitude, latitude.
See http://www.geoapi.org/snapshot/javadoc/org/opengis/referencing/doc-files/WKT.html#AXIS
"If the optional AXIS terms are not present, then the default values are assumed:
Geographic Coordinate System (GEOGCS) AXIS["Lon",EAST],AXIS["Lat",NORTH] (λ, φ)"

All known software, libraries, standards building on WKT properly use this order, including GeoSPARQL.

I understand that this still causes confusion, as e.g. https://en.wikipedia.org/wiki/ISO_6709 uses latitude, longitude. However, that is a different standard.
I also understand that so far people have not complained, but that may be due to the fact that there is only limited use so far, or that people simply created their own workarounds without complaining.

If Wikidata uses WKT, it should use it in compliance with the standard.

Changing this would require a full DB reload, so I would like to do it (if we do it) at the same time as we enable geospatial indexing, which will require a full DB reload too, regardless. That probably means sometime in April if everything goes well, or later if we encounter problems. It also means we have to deploy the change before that, and so run for a certain time with mixed data. I'll make provisions on the WDQS side to enable that. Since we'll have to do a full reload anyway (and there's one more reason for it, since we need to install new disk space and reimage for better LVS), we may as well use this opportunity to clean it up. It will need to be done very carefully, and there may be some glitches while we are in transition, but if we are going to do maintenance anyway, I think it's a good opportunity.

I don't think many people use coordinate data from Wikidata dumps in WKT as is right now. However, if we claim WKT, we should do as WKT implementations do, or claim a different format. And the earlier we do it, the better, within the constraints described above. Either that, or we choose some other format. I know it's unfortunate we have to go through all this trouble, especially given that *right now* nobody cares, but if we have to do it later, when people start noticing, I think it would be worse.

So my plan would be to make a decision about switching ASAP (I recommend switching, but I think we need to reach consensus here), then implement and commit it before the end of March, so it can be deployed and new dumps can be generated before we do the maintenance.

Change 277927 had a related patch set uploaded (by Smalyshev):
Switch coordinate order in WKT

https://gerrit.wikimedia.org/r/277927

Change 279276 had a related patch set uploaded (by Smalyshev):
Prepare for coordinate order switch.

https://gerrit.wikimedia.org/r/279276

Change 279276 merged by jenkins-bot:
Prepare for coordinate order switch.

https://gerrit.wikimedia.org/r/279276

Smalyshev renamed this task from Coordinate order in RDF may be wrong to Coordinate order in RDF is wrong. (Apr 2 2016, 7:29 AM)
Smalyshev triaged this task as High priority.

The standard says: "If the optional AXIS terms are not present, then the default values are assumed". So is it possible to add these "optional axis terms" somehow? Can we do this instead of switching the numbers with no indication which is which? As @daniel said, this feels like a bad, confusing idea.

@thiemowmde I'm not sure what you want to add, could you provide an example?

"As @daniel said, this feels like a bad, confusing idea."

I am not sure I understand what exactly is bad and confusing. Established practice is to write WKT literals a certain way. Everybody is doing it this way. We are doing it the wrong way, because I neglected to check it when implementing the format (my bad, my fault, I should not have assumed it goes a certain way without actually checking). This needs to be fixed. What is confusing about it? Imagine the literals were already in the right order. Would that be confusing? If so, why? Would you request changing it to the current order?

@thiemowmde I agree that the situation sucks, but the sooner we fix this, the better. We are bumping the format version included in the dumps, so clients can check for the version and change processing accordingly, if they need to handle both the old and the new serialization. We have announced the change on the Wikidata list, with no reaction so far.
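A client that has to read both old and new dumps could branch on the format version along these lines. This is only a sketch: the version value in `FIRST_LON_LAT_VERSION` is a placeholder (the thread does not state the actual version numbers), and `parse_point` and `Coordinate` are hypothetical names.

```python
import re
from typing import NamedTuple, Tuple


class Coordinate(NamedTuple):
    latitude: float
    longitude: float


# HYPOTHETICAL placeholder: the first dump format version that writes
# longitude first. The real version number is not given in this thread.
FIRST_LON_LAT_VERSION = (0, 0, 2)

_POINT_RE = re.compile(
    r"^\s*Point\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE
)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "0.0.2" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def parse_point(wkt: str, format_version: str) -> Coordinate:
    """Parse a 2D WKT point, picking the axis order from the format version."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    first, second = float(match.group(1)), float(match.group(2))
    if parse_version(format_version) >= FIRST_LON_LAT_VERSION:
        # New serialization: longitude comes first.
        return Coordinate(latitude=second, longitude=first)
    # Old serialization: latitude comes first.
    return Coordinate(latitude=first, longitude=second)
```

With this shape, the same Seattle coordinate is recovered from either serialization, as long as the caller passes the version announced by the dump it is reading.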

@Smalyshev Regarding the format version bump: We should have a change log for this. Wikibase currently doesn't have release notes (since we don't do releases). But we should at least have a basic log for this kind of change. How about adding docs/rdf-binding, with basic change notes, and (for now) links to the documentation we have online? Eventually, it should be the other way around - the online docs should come from the documentation in the code repo.

@daniel excellent idea about the change log. I'll add a basic log and a comment in RdfVocabulary.php about it.

Let's get this done. Stas has already done the announcement. Together with the change log we're fine.

@Smalyshev what about Thiemo's suggestion to add the "optional axis terms" the spec speaks of? I have not actually read the spec in detail - would it be feasible to add something to the literal that would make it clear what the order is?

@daniel I'm not sure how to do this. I suspect we'd have to add a coordinate system definition to every literal, and that would not only more than triple their size and make them barely readable, but also require a more complex parser. And I don't think I know how to do this properly without messing it up further. If somebody does, I'd be happy to learn, but I'm not sure we should open this particular can of worms.

Given that the long-lat order is the default, that it is specifically spelled out in the spec, and that we intend to use it just like the spec says we should, I don't think we need to complicate matters further.
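To give a feel for the size cost mentioned above, here is a small Python sketch comparing a plain literal with one that names its coordinate reference system explicitly. It relies on the GeoSPARQL rule that a wktLiteral may start with an optional CRS IRI, with CRS84 (longitude-latitude) assumed when none is given. This is not the AXIS syntax quoted from the WKT CRS documentation, and nobody in this thread proposed it; the helper `with_explicit_crs` is hypothetical.

```python
# IRI of the default GeoSPARQL coordinate reference system (longitude, latitude).
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


def with_explicit_crs(wkt: str, crs_iri: str = CRS84) -> str:
    """Prefix a WKT geometry with a CRS IRI, as GeoSPARQL wktLiteral allows."""
    return f"<{crs_iri}> {wkt}"


plain = "Point(-122.31666666667 47.6)"
explicit = with_explicit_crs(plain)

print(plain)
print(explicit)
# Compare the lexical length of the two forms.
print(len(plain), len(explicit))
```

Both forms denote the same point under GeoSPARQL; the explicit one is only longer and needs a parser that understands the IRI prefix.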

@Smalyshev ok thanks. just making sure the concern has been considered.

Change 277927 merged by jenkins-bot:
Switch coordinate order in WKT

https://gerrit.wikimedia.org/r/277927

Smalyshev claimed this task.

Change 286373 had a related patch set uploaded (by Jonas Kress (WMDE)):
Fix / swap order of latitude and longitude in map

https://gerrit.wikimedia.org/r/286373

Change 286373 merged by jenkins-bot:
Fix / swap order of latitude and longitude in map

https://gerrit.wikimedia.org/r/286373