When coordinates are exported to RDF, they are represented with many more digits than their precision warrants. For example, the coordinates for https://www.wikidata.org/wiki/Q116746, whose precision is specified as "arcseconds" (about 31 m), are exported as Point(13.366666666667 41.766666666667): 12 decimal places, or sub-millimeter precision. They should be exported as Point(13.3667 41.7667) instead.
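A minimal sketch of precision-aware formatting, assuming the precision is given in degrees (an arcsecond being 1/3600°) and that the digit count is chosen as the smallest number of decimal places whose last digit is at least as fine as the precision. The function names `decimal_places` and `format_wkt_point` are hypothetical illustrations, not Wikibase code.

```python
import math


def decimal_places(precision_deg: float) -> int:
    """Smallest number of decimal places whose last digit is no coarser than the precision."""
    if precision_deg <= 0:
        raise ValueError("precision must be a positive number of degrees")
    return max(0, math.ceil(-math.log10(precision_deg)))


def format_wkt_point(lon: float, lat: float, precision_deg: float) -> str:
    """Format a WKT Point (longitude first), rounding both axes to the derived digit count.

    Trailing zeros are kept in this sketch, e.g. 13.3 at 4 places becomes "13.3000".
    """
    places = decimal_places(precision_deg)
    return f"Point({lon:.{places}f} {lat:.{places}f})"


ARCSECOND = 1 / 3600  # one arcsecond in degrees, roughly 31 m of latitude
print(format_wkt_point(13.366666666667, 41.766666666667, ARCSECOND))
```

With arcsecond precision, -log10(1/3600) ≈ 3.56, so four decimal places (about 11 m) are kept, which reproduces the Point(13.3667 41.7667) suggested above.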
mediawiki/extensions/Wikibase (master): Format coordinates with limited precision
I can see that this situation tends to be confusing and could use some improvement, especially UX-wise. But this issue is not specific to the RDF export or the Wikidata-Query-Service. These numbers are simply how the coordinates are stored internally, and I don't think we can, or even should, change anything about this. Most coordinates are submitted via the API. If the submitted coordinate was just 13.366666666667, why should we truncate it?
Mostly because most of these digits do not represent any real data; they are just junk produced by a decimal representation with excessive precision and by various conversions and calculations. We're dragging around meaningless characters that have no use and carry no information. Nobody actually measured that coordinate with micron precision and got 13.366666666667. Most probably it was measured in another system, then a calculation involving 40.1/3 (probably when converting degrees and minutes to decimal) produced 13.366666666667. If we then convert back, we get 40.100000000001: again, junk in the last 11 decimal places.
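The round-trip argument can be made concrete with Python's `decimal` module, which keeps binary floating-point error out of the picture. This sketch assumes, as the comment guesses, that the value came from 40.1/3 and was kept to 12 decimal places; it is an illustration, not a reconstruction of how this particular value was actually produced.

```python
from decimal import Decimal

# Assumed origin of the value: 40.1 divided by 3, computed at the default
# 28-significant-digit decimal context.
exact = Decimal("40.1") / 3

# Keep 12 decimal places, as in the exported coordinate.
stored = exact.quantize(Decimal("1e-12"))

# Converting back multiplies by 3 again; the 12th decimal place no longer cancels out.
round_trip = stored * 3

# Rounding to the digit count an arcsecond precision justifies drops the junk.
trimmed = stored.quantize(Decimal("0.0001"))

print(stored, round_trip, trimmed)
```

The stored value is 13.366666666667, the round trip yields 40.100000000001, and only the trimmed 13.3667 carries digits that correspond to the stated precision.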
I can follow all your arguments. It's just that I think the effect of this (actually well defined) behavior on users is really, really negligible. Most users are never going to see coordinates as numbers anyway, but as dots or shapes on maps.
And even if they do, which user will think of sub-millimeters when they see a value like 13.366666666667? Especially when the object is a city or some other large area. Most users don't even know how many meters or miles one degree corresponds to.
That said, I agree this could be improved, and even have an actual suggestion I want to implement some day, either in the RDF export or somewhere deeper in the Wikibase code base: Basically, cut off decimal places that do not have any effect on any of the output formats we support. This algorithm should consider all output formats, because when such an algorithm is applied we don't know which output format will be used.
But this idea requires coordinates to be stored as strings, which they are not. Basically, this requires a new datatype.
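A rough sketch of the "cut off digits that no output format can see" idea. For illustration only it operates on Python floats, although, as noted above, a real implementation would need string storage. The two formatters are hypothetical stand-ins for the supported output formats (decimal degrees to 4 places and degrees/minutes/whole arcseconds); the actual set of formats and their precisions are assumptions.

```python
from typing import Callable, Sequence

Formatter = Callable[[float], str]


def fmt_decimal(value: float) -> str:
    # Hypothetical output format: decimal degrees to 4 places.
    return f"{value:.4f}"


def fmt_dms(value: float) -> str:
    # Hypothetical output format: degrees, minutes and whole arcseconds.
    total_seconds = round(abs(value) * 3600)
    degrees, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    sign = "-" if value < 0 else ""
    return f"{sign}{degrees}°{minutes:02d}'{seconds:02d}\""


def shortest_equivalent(value: float, formatters: Sequence[Formatter], max_places: int = 15) -> float:
    """Round value to the fewest decimal places that leave every formatter's output unchanged."""
    reference = [fmt(value) for fmt in formatters]
    for places in range(max_places + 1):
        candidate = round(value, places)
        if [fmt(candidate) for fmt in formatters] == reference:
            return candidate
    return value  # no shorter form is indistinguishable; keep the original


print(shortest_equivalent(13.366666666667, [fmt_decimal, fmt_dms]))
```

Because the loop keeps the first rounding that every format renders identically, registering a finer output format automatically makes the algorithm keep more digits, which matches the requirement that all output formats be considered.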
I have filed T232984.
Can we please revert this change, or at least replace it with a solution that actually takes into account the precision instead of blindly rounding everything down to 4 decimal places?
It is relatively easy for data consumers to round the coordinate values themselves if they need to (including within SPARQL code), but practically impossible to recover the lost precision without going outside the WDQS once the values have been arbitrarily rounded.
As far as I understand the patch https://gerrit.wikimedia.org/r/521984, it truncates all coordinates to at most 4 decimal places, even without looking at the precision of the coordinate value. I'm very much concerned about this, as well as confused why this was done.
The frontend offers the user a few precisions. The smallest is "1/1000 of an arcsecond", or about 0.000000278 degrees. This requires at least 7 decimal places, better 8 to be safe.
The backend does not limit the precision of a coordinate value to anything. It can be as arbitrary as a user of the API wants. The frontend respects arbitrary precisions, displays them, and makes sure they don't get lost when such a value is edited.
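To illustrate the data-loss concern, here are two hypothetical coordinates that differ by 0.000001 degrees, which the backend accepts since it does not restrict precision. Python's fixed-point formatting (which rounds) stands in for the patch's 4-digit limit; this is an assumption for illustration, not the patch's actual code.

```python
# Two hypothetical latitudes that differ only in the 6th decimal place.
a = 50.123456
b = 50.123457


def fixed_four_places(value: float) -> str:
    # Stand-in for formatting every value with 4 decimal places, ignoring its precision.
    return f"{value:.4f}"


print(fixed_four_places(a), fixed_four_places(b))  # both render the same
print(f"{a:.6f}", f"{b:.6f}")  # still distinguishable at 6 places
```

Once both values are exported as the same string, a consumer of the query service has no way to tell them apart again.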
@WMDE-leszek, may I ask why the patch was merged? Was this in a sprint or discussed anywhere else within the Wikidata team? Also pinging @Lydia_Pintscher because I believe this has now been causing actual data loss (within the query service) for 8 weeks.
Personally, I think we should revert this change. I also don’t know why it was suddenly merged. If we do decide to do this:
- the implementation should take the specified precision of the coordinate value into account instead of hard-coding a certain number of digits,
- the change should be announced in advance, and
- we should make sure it applies to all coordinate values, not just those of items edited after the change is deployed. (In practice, that last point would probably require a full reload; I doubt that updating //all// coordinate value nodes is feasible.)
Thanks for the additional analysis. Another thing I realized later is that this might cause actual data loss in the Wikidata database. This can happen when a tool uses data it got from the query service to edit the original Wikidata entity. I believe this scenario should be rare, but wanted to mention it.
https://gerrit.wikimedia.org/r/521984 has been reverted, see https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/540402
Reverting does not mean dismissing the points @Smalyshev brought up here and in T232984. The revert is only meant to mitigate the issues reported by @seav in T232984. The issues raised by @Smalyshev still stand; they need a more appropriate solution than the one we tried in https://gerrit.wikimedia.org/r/521984.