
query.wikidata.org returning different results when re-running a query, though the data is unchanged
Closed, Resolved · Public

Description

On repeatedly re-running this SPARQL query on query.wikidata.org, I'm seeing inconsistent results between runs which aren't due to changes to the items being returned:

http://tinyurl.com/ydg868kc

For example, over the last hour the same query has alternated between two outcomes:

  • It returns 1445 results, including the person wd:Q21747014
  • It returns 1441 results, not including the person wd:Q21747014

This is rather worrying, since it seems we can't trust the results from the query service at the moment. It also creates the risk that a tool we're developing won't realise that certain statements are already in Wikidata and will add duplicates.

Is this a known problem, and is there any workaround?

Many thanks for any advice...

Event Timeline

Restricted Application added a subscriber: Aklapper.
Smalyshev subscribed.

Probably a de-sync between the servers; I'll check it.

I re-ran the query: the first time it showed 1531 results, the second time 1534. I ran it a couple of times after that and the result didn't change, even when I changed browsers. So I guess, yes, it's a de-sync problem and it's not fixed yet.

I've updated the relevant entities and it looks like the query is consistent now. Please tell me if it happens again.

Alexsdutton subscribed.

Hi there. We're seeing similar behaviour again. If you run http://tinyurl.com/yboegzh7 against wdqs1005 (going by the X-Served-By header) you get 42 results, whereas if you run it against wdqs1003 you only get 40.

One of the people missing is https://www.wikidata.org/wiki/Q4773275, who should be returned by that query, owing to his P39 to "Member of Toronto City Council".

I think there are probably more such discrepancies, but this is the one that's leapt out at me so far, and I don't know of a way to reliably target a particular node to check more systematically.
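Short of targeting a node directly, one rough way to check more systematically is to repeat the query and record which host served each response. Below is a minimal sketch, assuming the public endpoint at https://query.wikidata.org/sparql and that every response carries the X-Served-By header; the function names and the User-Agent string are made up for illustration. It does not choose the node, it only groups whatever the load balancer happens to return.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

ENDPOINT = "https://query.wikidata.org/sparql"  # assumed public endpoint


def run_query(query, endpoint=ENDPOINT):
    """Run a SPARQL query once; return (serving host, number of result rows)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "wdqs-consistency-check/0.1 (example script)",
    })
    with urllib.request.urlopen(req) as resp:
        # Header name taken from the comment above; "unknown" if it is absent.
        server = resp.headers.get("X-Served-By", "unknown")
        data = json.load(resp)
    return server, len(data["results"]["bindings"])


def counts_by_server(observations):
    """Group (server, count) pairs into {server: set of counts seen}."""
    grouped = defaultdict(set)
    for server, count in observations:
        grouped[server].add(count)
    return dict(grouped)


def is_consistent(grouped):
    """True if every server reported the same single result count."""
    all_counts = set()
    for counts in grouped.values():
        all_counts |= counts
    return len(all_counts) <= 1


# Offline example using the counts reported in this comment.
observed = [("wdqs1005", 42), ("wdqs1003", 40), ("wdqs1005", 42)]
grouped = counts_by_server(observed)
print(grouped, is_consistent(grouped))
```

In real use, `observed` would be built by calling `run_query` several times with the same query string, ideally with a pause between calls so that the service is not hammered.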

@Alexsdutton I think this is fixed now. I'd suggest opening a new task next time some discrepancy is found (maybe as a child of this one), otherwise it's very hard to track it with so many opens and closes for the same task.