Extra xsd:decimal triples for some(?) globe coordinate values
Closed, Resolved · Public


@Nikki noticed these extra triples for some globe coordinate values, e.g. the coordinate location of Kiel:

wdv:da11113b27c1aed60f99f8a18c4631b1 a wikibase:GlobecoordinateValue ;
        wikibase:geoLatitude "54.325277777778"^^xsd:decimal ;
        wikibase:geoLatitude "54.325277777778"^^xsd:double ;
        wikibase:geoLongitude "10.140555555556"^^xsd:decimal ;
        wikibase:geoLongitude "10.140555555556"^^xsd:double ;
        wikibase:geoPrecision "0.00027777777777778"^^xsd:decimal ;
        wikibase:geoPrecision "2.7777777777778E-4"^^xsd:double ;
        wikibase:geoGlobe wd:Q2 .

The data type for these three values was changed from xsd:decimal to xsd:double in I34983dffe9 / T179228, but apparently the old triples were never dropped.

I also notice that the xsd:decimal triple does not use scientific notation (which would only be legal for xsd:double), so I’m not sure what’s going on here anyways, given the context of T179228… does that mean that we weren’t actually producing illegal values, and some step in the pipeline transformed the illegal value into the legal version, and therefore the change to double wasn’t even necessary? Or are the xsd:decimal triples coming from somewhere else? I’m confused, sorry.
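
To gauge how widespread the duplication is, a query along the following lines could list value nodes that carry both an xsd:decimal and an xsd:double latitude. This is only a sketch: the prefixes are the standard Wikibase ones, the LIMIT is an arbitrary choice to keep the query cheap, and an unrestricted scan like this may still time out on the public endpoint.

```sparql
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# List a few globe coordinate value nodes that have the latitude
# stored twice: once as xsd:decimal and once as xsd:double.
SELECT ?value ?decimalLat ?doubleLat WHERE {
  ?value a wikibase:GlobecoordinateValue ;
         wikibase:geoLatitude ?decimalLat, ?doubleLat .
  FILTER(datatype(?decimalLat) = xsd:decimal)
  FILTER(datatype(?doubleLat) = xsd:double)
}
LIMIT 10
```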

Event Timeline

Restricted Application added a project: Discovery. Jun 25 2018, 11:18 AM
Restricted Application added a subscriber: Aklapper.
Smalyshev moved this task from Backlog to Next on the User-Smalyshev board. Jun 25 2018, 6:36 PM

Ohh, this is to be expected, and I forgot to address this unfortunately. The problem is that the hash of the value didn't change, but the content of the node did, so the update didn't work properly, because it assumes that nodes with the same hash have the same data. We'd probably need to manually drop the bad triples (should not be hard).

A problem, though, is that some values only exist in the old format, with xsd:decimal only, and we probably should not delete those, at least not without updating them. That makes the queries a bit harder. Still possible, but it will probably take a bit more time to fix.
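
A cleanup that respects that constraint might look roughly like the SPARQL Update below. It is a sketch under assumptions, not the statement that was actually run: it deletes an xsd:decimal triple only when the same value node already has an xsd:double triple for the same predicate, so nodes that exist only in the old format are left untouched.

```sparql
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Drop the stale xsd:decimal triple only where an xsd:double twin exists.
DELETE { ?value ?pred ?old }
WHERE {
  VALUES ?pred {
    wikibase:geoLatitude
    wikibase:geoLongitude
    wikibase:geoPrecision
  }
  ?value a wikibase:GlobecoordinateValue ;
         ?pred ?old .
  FILTER(datatype(?old) = xsd:decimal)
  # Keep decimal-only nodes: require a double for the same node and predicate.
  FILTER EXISTS {
    ?value ?pred ?new .
    FILTER(datatype(?new) = xsd:double)
  }
}
```

Values that exist only as xsd:decimal would still need a separate conversion step, which this sketch does not attempt.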

Smalyshev triaged this task as Medium priority. Jun 25 2018, 8:36 PM

A full reload would also fix this, right? I thought we had at least one reload (for unrelated reasons) since last November anyways, but I must be remembering something else if those triples are still there.

I don't remember the exact time, but it looks like the only place the wrong (and old-format) triples could have come from... Generally, with our data volume now, and 12 servers, a full reload is a much larger proposition than before (and yes, we'd need to think about whether we can make it better; I have some ideas, but that's for another task). How bad is it to have those extra triples? I can probably remove/convert them without a full reload, I just need to know how to prioritize this.

I'm not aware of anyone else noticing yet, so it wouldn't appear to be urgent.

The main issue from my point of view (it's the reason I noticed, at least) is that queries including things like wd:Q1707 p:P625/psv:P625 [ wikibase:geoLatitude ?lat ; wikibase:geoLongitude ?lon ] . unexpectedly and confusingly return more results than they should (4 rows per statement instead of 1, or 8 rows if you also select the precision), and it's not clear to me how to get the expected behaviour. DISTINCT doesn't help because the data types are different, and filtering using datatype() means some statements are missing, since some items only have decimal and some only have double. It's particularly confusing when looking at the results in the query service UI because the data type isn't shown there.
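
One possible workaround on the query side, sketched here under the assumption that the duplicates always come in decimal/double pairs as in the Kiel example, is to prefer the xsd:double pair and fall back to xsd:decimal only when the value node has no double latitude at all:

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?stat ?lat ?lon WHERE {
  wd:Q1707 p:P625 ?stat .
  ?stat psv:P625 ?value .
  ?value wikibase:geoLatitude ?lat ;
         wikibase:geoLongitude ?lon .
  # Discard the mixed decimal/double combinations.
  FILTER(datatype(?lat) = datatype(?lon))
  # Keep the double pair; keep the decimal pair only if no double exists.
  FILTER(
    datatype(?lat) = xsd:double ||
    NOT EXISTS {
      ?value wikibase:geoLatitude ?other .
      FILTER(datatype(?other) = xsd:double)
    }
  )
}
```

With both filters in place, a node holding both formats contributes one row instead of four, while decimal-only and double-only nodes still contribute their single row.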

Vvjjkkii renamed this task from Extra xsd:decimal triples for some(?) globe coordinate values to 5caaaaaaaa. Jul 1 2018, 1:02 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
JJMC89 renamed this task from 5caaaaaaaa to Extra xsd:decimal triples for some(?) globe coordinate values. Jul 1 2018, 4:11 AM
JJMC89 lowered the priority of this task from High to Medium.
JJMC89 updated the task description.
Smalyshev moved this task from Next to Doing on the User-Smalyshev board. Jul 25 2018, 12:32 AM
Smalyshev closed this task as Resolved. Jul 27 2018, 4:57 PM
Smalyshev claimed this task.

I've cleaned the database and it should not happen anymore. Please reopen if you find more instances.

@Smalyshev Please check out the single coords property at versus the two rows returned by this query, suggestive of 'hidden' triples lurking still.

SELECT ?item ?itemLabel ?stat ?lat ?long WHERE {
  VALUES ?item { wd:Q6522893 }
  ?item p:P625 ?stat .
  ?stat psv:P625 [ wikibase:geoLatitude ?lat ; wikibase:geoLongitude ?long ] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}


Per Laske, SPARQL for this issue in Germany:

SELECT ?item ?itemLabel ?stat ?lat ?long WITH {
  SELECT ?item ?stat (COUNT(?lat)+COUNT(?long) AS ?count) WHERE {
    VALUES ?countries { wd:Q183 }
    ?item wdt:P17 ?countries ;
          p:P625 ?stat .
    ?stat psv:P625 [ wikibase:geoLatitude ?lat ; wikibase:geoLongitude ?long ] .
  }
  GROUP BY ?item ?stat
} AS %i WHERE {
  INCLUDE %i
  ?stat psv:P625 [ wikibase:geoLatitude ?lat ; wikibase:geoLongitude ?long ] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?item

Tagishsimon reopened this task as Open. Nov 7 2019, 2:49 AM
Lea_Lacroix_WMDE removed Smalyshev as the assignee of this task. Nov 18 2019, 2:59 PM
Lea_Lacroix_WMDE added a subscriber: Gehel.
Gehel closed this task as Resolved. Jun 24 2020, 12:44 PM
Gehel claimed this task.

Full dataset has been reloaded, this should fix all similar issues. If you find other duplicated coordinates, please re-open!

Data has been reloaded, please reopen if you encounter the problem again.