
Wikidata Query Service is swapping items and properties
Closed, Resolved | Public | Bug report

Description

The Wikidata Query Service seems to have somehow swapped the triples of items and properties.

SELECT ?entity ?label WHERE {
  VALUES ?entity { wd:P31 wd:Q31 }
  ?entity rdfs:label ?label.
  FILTER(LANG(?label) = "en")
}
wd:P31    Belgium
wd:Q31    instance of

Only the wd: namespace appears to be affected; statements are still found under, e.g., wdt:P31 and not wdt:Q31.


Original report:


Steps to Reproduce:

Go to, e.g., https://tools.wmflabs.org/scholia/author/Q18618629

Actual Results:

Some of the panels that embed WDQS results report errors, e.g., "Unable to display result: Bar chart"

Expected Results:

The embedded visualization from WDQS should show.


One error is 'TypeError: "i is undefined"', raised at embed.wdqs.min.3f7096aef907206f2a35.js:1:17344.

Furthermore, https://query.wikidata.org/css/embed.style.min.fa3ff6a142279256ede4.css returns a 404.

Event Timeline


Probably my fault: I've deployed a new WDQS version with some URL refactoring, but it looks like something went wrong with the URI scheme. I will be rolling it back.

Mentioned in SAL (#wikimedia-operations) [2019-08-15T23:35:44Z] <smalyshev@deploy1001> Finished deploy [wdqs/wdqs@b4da6e4]: Rollback blazegraph due to T230588 (duration: 09m 48s)

Looks like Belgium is back to Q31 and cats are cats again. I will investigate what happened to the URIs. Most likely the order in the dictionary switched somehow because the underlying storage class changed, and I missed it because the content of the dictionary is still the same.
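To make the suspected mechanism concrete, here is a minimal sketch under a loud assumption: that entity URIs are stored inline as a pair of (index into an ordered list of ID initials, numeric ID). The names INITIALS_WRITE, INITIALS_READ, encode and decode are hypothetical and do not correspond to actual WDQS or Blazegraph classes. Both lists have the same content; only the order differs, which is enough to turn Q31 into P31 on read.

```python
# Hypothetical, simplified model of inline entity-URI compression: the
# namespace is shared, and the leading letter ("initial") is stored as an
# index into an ordered list instead of as text.
ENTITY_NS = "http://www.wikidata.org/entity/"

# Same content, different order: data written with one order is read
# back with the other.
INITIALS_WRITE = ["P", "Q"]
INITIALS_READ = ["Q", "P"]


def encode(uri: str, initials: list[str]) -> tuple[int, int]:
    """Compress an entity URI into (initial index, numeric ID)."""
    if not uri.startswith(ENTITY_NS):
        raise ValueError(f"not an entity URI: {uri}")
    local = uri[len(ENTITY_NS):]
    return initials.index(local[0]), int(local[1:])


def decode(code: tuple[int, int], initials: list[str]) -> str:
    """Expand (initial index, numeric ID) back into an entity URI."""
    index, number = code
    return f"{ENTITY_NS}{initials[index]}{number}"


stored = encode(ENTITY_NS + "Q31", INITIALS_WRITE)
print(decode(stored, INITIALS_READ))  # http://www.wikidata.org/entity/P31
```

In this model only URIs that pass through the entity encoding are affected. A predicate such as wdt:P31 lives in a different namespace, which would be consistent with the observation that wdt: was unaffected.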

Lucas_Werkmeister_WMDE lowered the priority of this task from High to Medium. Aug 15 2019, 11:50 PM

Scholia also looks fine again – lowering priority since only the investigation is missing. Thanks a lot for the quick response Stas!

Change 530463 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/rdf@master] Fix prefixes order - it was P, then Q

https://gerrit.wikimedia.org/r/530463


Maybe it's just the cache, but I still see the problem, albeit on different identifiers than those affected yesterday.

Ghuron added a subscriber: Ghuron. Aug 16 2019, 9:23 AM

Still getting a few dozen wdt:Pxxx results with this query:

SELECT * { ?obj wdt:P3083 [] FILTER(STRSTARTS(STR(?obj),"http://www.wikidata.org/entity/P")) }
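The FILTER in this query keeps only subjects whose URI starts with the property namespace. A minimal sketch of the same check applied offline to a list of result URIs; the values in `results` are hypothetical, and for an identifier property like P3083 a match is a strong hint of corruption rather than proof.

```python
ENTITY_NS = "http://www.wikidata.org/entity/"


def property_subjects(uris: list[str]) -> list[str]:
    """Keep only URIs in the property namespace (wd:P...).

    Mirrors FILTER(STRSTARTS(STR(?obj), "http://www.wikidata.org/entity/P")).
    """
    return [u for u in uris if u.startswith(ENTITY_NS + "P")]


# Hypothetical ?obj values taken from a query result.
results = [ENTITY_NS + "Q42", ENTITY_NS + "P42", ENTITY_NS + "Q7"]
print(property_subjects(results))  # ['http://www.wikidata.org/entity/P42']
```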

Mmarx added a subscriber: Mmarx. Aug 16 2019, 12:09 PM

Another example, which currently returns 101 results:

SELECT ?item WHERE {
  ?item wdt:P495 [] .
  {{ FILTER NOT EXISTS { ?item wdt:P161 [] }} UNION { FILTER NOT EXISTS { ?item wdt:P364 [] }}}
  FILTER(STRSTARTS(STR(?item), "http://www.wikidata.org/entity/P"))
}

Change 530463 merged by jenkins-bot:
[wikidata/query/rdf@master] Fix prefixes order - it was P, then Q

https://gerrit.wikimedia.org/r/530463

I think all the items that were updated during the breakage, and were not updated since then, would be broken now – their data was imported using the swapped Ps and Qs, and then after the fix it’s swapped the wrong way. We should be able to get a list of all potentially affected items from recentchanges (Quarry) – @Smalyshev is it possible to tell the updater to re-import those specific entities?
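A minimal sketch of the selection described above, assuming the edit history (e.g. exported from recentchanges via Quarry) is available as (entity ID, timestamp) pairs. The function name needs_reimport and the window bounds are hypothetical: the end is taken from the rollback in the SAL entry, while the start is a placeholder because the deploy time is not recorded in this task. The sketch also ignores updater lag, which could shift the effective window.

```python
from datetime import datetime, timezone


def needs_reimport(
    edits: list[tuple[str, datetime]],
    window_start: datetime,
    window_end: datetime,
) -> set[str]:
    """Return entities whose most recent edit lies inside the breakage window.

    An entity edited again after the window was re-imported with correct
    IDs, so only its latest edit matters.
    """
    latest: dict[str, datetime] = {}
    for entity, ts in edits:
        if entity not in latest or ts > latest[entity]:
            latest[entity] = ts
    return {e for e, ts in latest.items() if window_start <= ts <= window_end}


# Hypothetical window: placeholder start, end = rollback time from the SAL entry.
start = datetime(2019, 8, 15, 20, 0, tzinfo=timezone.utc)
end = datetime(2019, 8, 15, 23, 35, 44, tzinfo=timezone.utc)
edits = [
    ("Q31", datetime(2019, 8, 15, 21, 0, tzinfo=timezone.utc)),   # only edit is in the window
    ("Q146", datetime(2019, 8, 15, 22, 0, tzinfo=timezone.utc)),  # edited in the window...
    ("Q146", datetime(2019, 8, 16, 9, 0, tzinfo=timezone.utc)),   # ...and again afterwards
    ("Q42", datetime(2019, 8, 14, 12, 0, tzinfo=timezone.utc)),   # not edited in the window
]
print(sorted(needs_reimport(edits, start, end)))  # ['Q31']
```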

I can definitely re-import mine, but the existing wdt:Pxxx results make operating bots much more difficult than it should be.


Can confirm. By deleting and immediately re-creating value pairs I see the error disappear.

Ghuron added a comment. Edited Aug 18 2019, 3:12 PM

My wdt:P results have disappeared.

Smalyshev closed this task as Resolved. Aug 19 2019, 3:32 AM

Should be back to normal now.

Hmm, it looks like I updated all Qs but forgot about some Ps... These should be fine now.

Change 531985 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[wikidata/query/rdf@master] Fix initials order

https://gerrit.wikimedia.org/r/531985

Change 531985 merged by jenkins-bot:
[wikidata/query/rdf@master] Fix initials order

https://gerrit.wikimedia.org/r/531985

Lucas_Werkmeister_WMDE reopened this task as Open. Edited Sep 5 2019, 12:59 PM

It looks like there are still some properties hidden in wikibase:quantityUnit values (e.g. P199 instead of Q199, the item that the RDF export uses as the unit "1" for unitless quantities), as reported on project chat.

Edit: see T232212: QuantityValue quantityUnit contains both Q and P value in Wikidata Query Service - P value is wrong for a more detailed description of that issue (I’ve marked it as a duplicate of this one since I think it’s the same underlying cause).

agray added a subscriber: agray. Edited Sep 7 2019, 12:55 PM

I'm still getting this with "normal" properties (i.e. not just wikibase:quantityUnit) on https://www.wikidata.org/wiki/Q334443: it shows up as a duplicate, 'P334443', in P39-based queries that return the item, and I can reproduce this with a targeted query by identifier (https://w.wiki/825).

I've tried editing the item, or deleting and recreating some claims, but it doesn't seem to affect the query results.

On the plus side, this is the only one I've found in a test run of ten thousand or so, so it seems to be a rare problem...
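Properties can legitimately appear in results, so a P-URI on its own is not proof of corruption. A stronger signal is the pattern described here: the same numeric ID appearing as both wd:Q<n> and wd:P<n> in one result set. A minimal heuristic sketch for scanning a batch of results; swapped_twins is a hypothetical helper, and coincidental pairs (e.g. a legitimate wd:P31 next to wd:Q31) will be flagged too.

```python
import re

# Matches wd:P<n> / wd:Q<n> entity URIs and captures the letter and the number.
ENTITY_RE = re.compile(r"^http://www\.wikidata\.org/entity/([PQ])(\d+)$")


def swapped_twins(uris: list[str]) -> list[str]:
    """Return numeric IDs that occur both as wd:Q<n> and wd:P<n>."""
    letters_by_number: dict[str, set[str]] = {}
    for uri in uris:
        match = ENTITY_RE.match(uri)
        if match:
            letter, number = match.groups()
            letters_by_number.setdefault(number, set()).add(letter)
    twins = [n for n, letters in letters_by_number.items() if letters == {"P", "Q"}]
    return sorted(twins, key=int)


ns = "http://www.wikidata.org/entity/"
rows = [ns + "Q334443", ns + "P334443", ns + "Q5", ns + "P39"]
print(swapped_twins(rows))  # ['334443']
```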

Lucas_Werkmeister_WMDE removed Smalyshev as the assignee of this task. Sep 9 2019, 10:03 AM
Lucas_Werkmeister_WMDE raised the priority of this task from Medium to High.
Lucas_Werkmeister_WMDE added a subscriber: Esc3300.

Also affecting some calendar models (reported by @Esc3300).

See T232212#5472285 / Wikitech for update instructions (needs to be done by someone with WDQS production access, which I don’t have as far as I’m aware).


@Mathew.onipe: could you have a look into this? The easiest option is probably a full data reload on wdqs1010, then copying the data to each server.

If we’re going to do a full data reload, can we perhaps wait for T174504 to settle first? That would also be solved by a full reload, I believe, and might be very difficult to fully clean up without one.

(TL;DR: we’re now exporting coordinate values with reduced precision, but that doesn’t affect the hash of value nodes, so now we have some full-precision and some reduced-precision coordinate value nodes. Personally I think that change should be reverted, but even if we decide against that, the inconsistency should still be resolved.)
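A minimal sketch of how this can leave mixed-precision data behind, under two stated assumptions: the value node's name is derived from the full-precision value (value_node_id is a hypothetical stand-in, not the real Wikibase hashing), and a node that already exists is treated as immutable and not rewritten (load is a hypothetical stand-in for the updater). The coordinates are illustrative.

```python
import hashlib
from typing import Optional

Coord = tuple[float, float]


def value_node_id(lat: float, lon: float) -> str:
    # Hypothetical: the name hashes the full-precision value, so changing
    # the export precision does not change the name.
    return hashlib.sha256(f"{lat!r},{lon!r}".encode()).hexdigest()[:12]


def export(lat: float, lon: float, digits: Optional[int]) -> tuple[str, Coord]:
    # Return (node name, exported coordinates); digits=None means full precision.
    coords = (lat, lon) if digits is None else (round(lat, digits), round(lon, digits))
    return value_node_id(lat, lon), coords


def load(store: dict[str, Coord], node: str, coords: Coord) -> None:
    # Assumption: an existing node (named by a hash) is never rewritten.
    store.setdefault(node, coords)


store: dict[str, Coord] = {}
# Before the change: point A is exported at full precision.
load(store, *export(50.8503396, 4.3517103, None))
# After the change: A is exported again (same name, so skipped), and a
# new point B is exported at reduced precision.
load(store, *export(50.8503396, 4.3517103, 4))
load(store, *export(48.8566140, 2.3522219, 4))
print(store)  # A keeps full precision, B is rounded
```

A fresh reload starts from an empty store, so every node would be written with the same precision, which matches the point that a full reload clears up the inconsistency.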

Gehel added a comment. Oct 1 2019, 8:37 AM

It seems unlikely that T174504 will be resolved soon-ish. Doing a full data reload isn't all that expensive; we can do another one once T174504 is fixed.

Ghuron removed a subscriber: Ghuron. Oct 1 2019, 11:48 AM

Change 540153 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/cookbooks@master] wdqs: add data-reload cookbook

https://gerrit.wikimedia.org/r/540153

It seems unlikely that T174504 will be resolved soon-ish.

We’ve reverted the problematic change now, so if you do the reload from this week’s RDF dumps (or later), that should resolve that issue for now. (Though that doesn’t mean the issue is fully resolved yet – it just solves the inconsistent data we currently have.)

Change 540153 merged by Gehel:
[operations/cookbooks@master] wdqs: add data-reload cookbook

https://gerrit.wikimedia.org/r/540153

Change 546194 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/cookbooks@master] wdqs: do not use external mirror

https://gerrit.wikimedia.org/r/546194

Change 546194 merged by Gehel:
[operations/cookbooks@master] wdqs: do not use external mirror

https://gerrit.wikimedia.org/r/546194

Change 547158 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[wikidata/query/deploy@master] Provide classpath via absolute path

https://gerrit.wikimedia.org/r/547158

Change 547510 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[wikidata/query/rdf@master] Provide classpath via absolute path to munge and runUpdate scripts

https://gerrit.wikimedia.org/r/547510

Change 547158 merged by Gehel:
[wikidata/query/deploy@master] Provide classpath via absolute path to munge and runUpdate scripts

https://gerrit.wikimedia.org/r/547158

Change 547510 merged by jenkins-bot:
[wikidata/query/rdf@master] Provide classpath via absolute path to munge and runUpdate scripts

https://gerrit.wikimedia.org/r/547510

Larske added a subscriber: Larske. Dec 23 2019, 1:04 AM
Zbyszko claimed this task. Feb 13 2020, 10:04 AM

wikibase:quantityUnit is also affected by this, unless the strange results of this query are caused by something else:

SELECT ?item ?itemLabel ?amount ?unit ?unitLabel
WHERE
{
  VALUES ?item { wd:Q20414458 }
  ?item p:P2048/psv:P2048 _:height .
  _:height wikibase:quantityAmount ?amount .
  _:height wikibase:quantityUnit ?unit .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Results:

item           itemLabel       amount  unit         unitLabel
wd:Q20414458   Blomsterstykke  46      wd:P174728   P174728
wd:Q20414458   Blomsterstykke  46      wd:Q174728   centimetre

Gehel added a comment. Apr 2 2020, 12:10 PM

A full reimport is in progress. By the look of it, wdqs1010 will still need around a week to catch up on lag before we can copy the data over to the other servers.

Judging by the lag, it seems the import has finished.
I ran the query provided by @Dipsacus_fullonum a couple of times (with slight variations to "trick" the cache), and the unit wd:P174728 is still present in the results of every run.
Does that mean that wdqs1010 is not part of the pool for public queries, or that the invalid values made it into the dumps?

wdqs1010 is one of our test / admin / maintenance servers, but not part of any public server pool. Now that the data is up to date on wdqs1010, we need to replicate it to all servers. We have a new team member starting soon, and that's going to be his first task. So this is not completed yet, but it should be reasonably soon.

Thanks for the explanation and welcome to the new team member.

Nikki added a subscriber: Nikki. May 4 2020, 9:58 PM
Ghuron added a comment. May 7 2020, 6:12 PM

Just in case you need more queries to test this:

select * {
  ?s prov:wasDerivedFrom/pr:P248 wd:P51905050
} LIMIT 10

Ghuron added a subscriber: Ghuron. May 7 2020, 6:13 PM
Ghuron removed a subscriber: Ghuron. May 19 2020, 5:11 PM
agray added a comment. May 19 2020, 5:57 PM

Looks like the test queries above are all returning the "normal" expected results now - hopefully this means the reload has fixed things!

(@Mmarx's one with P495 returns some P-values, but they all seem legitimate - all six of those properties do indeed have a P495 value.)

dcausse closed this task as Resolved. Jun 17 2020, 9:43 AM
dcausse reassigned this task from Zbyszko to RKemper.
dcausse moved this task from Waiting to Needs Reporting on the Discovery-Search (Current work) board.
dcausse added subscribers: RKemper, Zbyszko, dcausse.

@agray indeed, @RKemper finished the data reload a couple of weeks ago, but we forgot to get back to this ticket.