WDQS returns current AND old data
Closed, Resolved, Public

Assigned To

Authored By

	Sebotic
	Sep 12 2015, 7:15 PM

Description

Hello everyone!
I created a SPARQL query which should return all CHEMBL IDs (P592) from all values in significant drug interactions (P769) of item Q179996. It works and returns all the expected values. Unfortunately, it also returns values that my drug bot (https://www.wikidata.org/wiki/User:ProteinBoxBot/Drug_items) replaced about a month ago. From the results alone, it is impossible to tell which values are current and which are old; worse, it gives the user the impression that both values are valid. Another user saw the same behaviour when running different queries.

If this is a feature rather than a bug (to allow queries on the revision history of items), I think the documentation should state it clearly, along with how to filter for only the current values. I could not find anything on that. Thank you!

Executed on:
https://query.wikidata.org/bigdata/namespace/wdq/sparql

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>

SELECT ?compound ?label ?chembl WHERE {
    ?compound wdt:P769 wd:Q179996 .
    ?compound wdt:P592 ?chembl .
    OPTIONAL {
        ?compound rdfs:label ?label .
        FILTER (lang(?label) = "en")
    }
}

Related Objects

Mentioned In: T207675: Some items are in an inconsistent state
T196399: WDQS returns incorrect data compared to what is in Wikidata

Event Timeline

Sebotic created this task. Sep 12 2015, 7:15 PM

Sebotic raised the priority of this task to High.

Sebotic updated the task description.

Sebotic added a project: Wikidata-Query-Service.

Sebotic subscribed.

Restricted Application added projects: Wikidata, Discovery-ARCHIVED. Sep 12 2015, 7:15 PM

Restricted Application added a subscriber: Aklapper.

Lydia_Pintscher moved this task from incoming to needs discussion or investigation on the Wikidata board. Sep 14 2015, 10:57 AM

Lydia_Pintscher added subscribers: Lydia_Pintscher, Smalyshev.

Smalyshev added a project: Discovery-Wikidata-Query-Service-Sprint. Sep 14 2015, 6:04 PM

Smalyshev set Security to None.

@Sebotic does it look fine now? If yes, it may just be a glitch with a skipped update; we recently had some networking glitches that may be related. Do you have any more examples of non-updated data?

Smalyshev moved this task from Backlog to Needs review on the Discovery-Wikidata-Query-Service-Sprint board. Sep 15 2015, 5:34 PM

Bene and I ran into an article today that was missing badges. An edit to the article made them show up in sparql.

@Lydia_Pintscher if an edit fixes it, that is definitely a missed update. Which article is that? These should be going away soon, but as I mentioned, there were a couple of glitches during the initial setup and we are still running on the same dataset. Once we've figured out the remaining format issues, we'll reload the dataset, which should eliminate those.

If an edit does not fix it (within a reasonable timeframe, checked against the timestamp on the query.wikidata.org homepage), then it may be a serious updater issue that needs deeper digging.

Should have been this one: https://www.wikidata.org/wiki/Q2260277

@Smalyshev I just tested the query once again. Some of the old data is gone now, but one still comes up. It is this item (http://www.wikidata.org/entity/Q402633). I currently do not have other queries to execute, but I will think of some.

Smalyshev moved this task from Needs triage to WDQS on the Discovery-ARCHIVED board. Sep 24 2015, 5:05 PM

ksmith moved this task from WDQS to On Sprint Board on the Discovery-ARCHIVED board. Sep 24 2015, 5:12 PM

We believe this is fixed now. Please reopen if the problem persists.

Deskana moved this task from Needs review to Done on the Discovery-Wikidata-Query-Service-Sprint board. Oct 8 2015, 5:04 PM

I believe this issue is still present.

As an example:
https://www.wikidata.org/wiki/Q24788592

Which doesn't show up using the sparql endpoint: http://tinyurl.com/hcye8s3

There are 3 with the same issue:
IPR005128 https://www.wikidata.org/wiki/Q24769972
IPR015233 https://www.wikidata.org/wiki/Q24788592
IPR029830 https://www.wikidata.org/wiki/Q24770987

The IPR005128 one seems to work occasionally, however, which may mean it loaded correctly on one server but not on another?
Please take a look. Thank you

Gstupp reopened this task as Open. Jul 14 2016, 6:52 PM

I have a quick follow-up on this. I made two slightly different SPARQL queries, one accessing values directly and one indirectly. They should return the same values, but it seems that if the two queries are executed on different servers, the result sets differ: one returns 54320 values, the other 54315. Irrespective of the counts, some values differ. See my code here: https://gist.github.com/sebotic/a92f9291175f4968ce265ffe31e0e9c2

Output:

r1 54320
r2 54315
r1 to r2 diff {'http://www.wikidata.org/entity/Q1649375', 'http://www.wikidata.org/entity/Q21107532', 'http://www.wikidata.org/entity/Q21126253', 'http://www.wikidata.org/entity/Q23427423', 'http://www.wikidata.org/entity/Q23502720', 'http://www.wikidata.org/entity/Q22291919', 'http://www.wikidata.org/entity/Q21149708', 'http://www.wikidata.org/entity/Q5401857', 'http://www.wikidata.org/entity/Q21109520', 'http://www.wikidata.org/entity/Q21154422', 'http://www.wikidata.org/entity/Q21988981', 'http://www.wikidata.org/entity/Q21170314', 'http://www.wikidata.org/entity/Q21105176', 'http://www.wikidata.org/entity/Q21101781', 'http://www.wikidata.org/entity/Q21758789', 'http://www.wikidata.org/entity/Q21110288', 'http://www.wikidata.org/entity/Q2041084', 'http://www.wikidata.org/entity/Q23597325', 'http://www.wikidata.org/entity/Q21111549', 'http://www.wikidata.org/entity/Q23607725', 'http://www.wikidata.org/entity/Q21987727', 'http://www.wikidata.org/entity/Q23565314', 'http://www.wikidata.org/entity/Q21106974', 'http://www.wikidata.org/entity/Q21109066', 'http://www.wikidata.org/entity/Q23547921', 'http://www.wikidata.org/entity/Q21112289', 'http://www.wikidata.org/entity/Q4044986', 'http://www.wikidata.org/entity/Q23625489', 'http://www.wikidata.org/entity/Q21097467', 'http://www.wikidata.org/entity/Q419999', 'http://www.wikidata.org/entity/Q21151165', 'http://www.wikidata.org/entity/Q21765867', 'http://www.wikidata.org/entity/Q21132784', 'http://www.wikidata.org/entity/Q7119385', 'http://www.wikidata.org/entity/Q14914349', 'http://www.wikidata.org/entity/Q24136789', 'http://www.wikidata.org/entity/Q21106149', 'http://www.wikidata.org/entity/Q287896', 'http://www.wikidata.org/entity/Q21149578', 'http://www.wikidata.org/entity/Q21112096', 'http://www.wikidata.org/entity/Q737488', 'http://www.wikidata.org/entity/Q22291581'}
r2 to r1 diff {'http://www.wikidata.org/entity/Q21109932', 'http://www.wikidata.org/entity/Q23532011', 'http://www.wikidata.org/entity/Q24092707', 'http://www.wikidata.org/entity/Q21113917', 'http://www.wikidata.org/entity/Q23495416', 'http://www.wikidata.org/entity/Q21114054', 'http://www.wikidata.org/entity/Q21136466', 'http://www.wikidata.org/entity/Q21112164', 'http://www.wikidata.org/entity/Q21125854', 'http://www.wikidata.org/entity/Q909409', 'http://www.wikidata.org/entity/Q21118862', 'http://www.wikidata.org/entity/Q21133039', 'http://www.wikidata.org/entity/Q21121539', 'http://www.wikidata.org/entity/Q23571649', 'http://www.wikidata.org/entity/Q21172303', 'http://www.wikidata.org/entity/Q23456743', 'http://www.wikidata.org/entity/Q22293399', 'http://www.wikidata.org/entity/Q418404', 'http://www.wikidata.org/entity/Q21108415', 'http://www.wikidata.org/entity/Q21105173', 'http://www.wikidata.org/entity/Q21122885', 'http://www.wikidata.org/entity/Q21141185', 'http://www.wikidata.org/entity/Q4897285', 'http://www.wikidata.org/entity/Q23633201', 'http://www.wikidata.org/entity/Q23598795', 'http://www.wikidata.org/entity/Q21130454', 'http://www.wikidata.org/entity/Q22291022', 'http://www.wikidata.org/entity/Q21117998', 'http://www.wikidata.org/entity/Q21121760', 'http://www.wikidata.org/entity/Q23597984', 'http://www.wikidata.org/entity/Q21139471', 'http://www.wikidata.org/entity/Q21109478', 'http://www.wikidata.org/entity/Q24268653', 'http://www.wikidata.org/entity/Q22678298', 'http://www.wikidata.org/entity/Q23460137', 'http://www.wikidata.org/entity/Q4046258', 'http://www.wikidata.org/entity/Q21105140'}

As far as I could see, the reason these result sets differ is not the rank or some other difference between the queries. If both queries are run on the same server, there is no difference between the result sets. (This is an assumption, as I do not know which server is really executing a query.)

Thanks for looking at this issue!

Best,
Sebastian
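The comparison described in the comment above can be sketched roughly as follows. This is not the code from the linked gist; it is a minimal illustration that assumes both responses use the standard SPARQL JSON results layout (`results.bindings`) and that the entity is bound to a variable named `item` (a hypothetical name). The two tiny responses are hand-written stand-ins that reuse a few URIs from the output above.

```python
def entity_uris(sparql_json, var):
    """Collect the values bound to `var` in a SPARQL JSON results document."""
    bindings = sparql_json["results"]["bindings"]
    return {row[var]["value"] for row in bindings if var in row}


def compare(r1, r2):
    """Return (only in r1, only in r2) as sorted lists for stable output."""
    return sorted(r1 - r2), sorted(r2 - r1)


def binding(uri):
    """Build one result row in SPARQL JSON form (helper for the sample data)."""
    return {"item": {"type": "uri", "value": uri}}


BASE = "http://www.wikidata.org/entity/"

# Hand-written stand-ins for the two real responses (assumption: variable "item").
response_1 = {"results": {"bindings": [binding(BASE + "Q179996"),
                                       binding(BASE + "Q1649375")]}}
response_2 = {"results": {"bindings": [binding(BASE + "Q179996"),
                                       binding(BASE + "Q21109932")]}}

r1 = entity_uris(response_1, "item")
r2 = entity_uris(response_2, "item")
only_r1, only_r2 = compare(r1, r2)

print("r1", len(r1))
print("r2", len(r2))
print("r1 to r2 diff", only_r1)
print("r2 to r1 diff", only_r2)
```

Comparing sets rather than counts matters here: two result sets can have nearly equal sizes and still disagree on dozens of members, as in the output above.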

You can tell which server ran the query by looking at the network trace (e.g. in Chrome DevTools) and checking the x-served-by header, e.g. x-served-by:wdqs1002.
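The same header can also be read from a script. The sketch below uses only the Python standard library; the endpoint URL is the one given in the task description, while the function names and the User-Agent string are made up for illustration. `run_query` performs a real HTTP request and is only defined here, not called.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the task description; it may have changed since.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"


def served_by(headers):
    """Return the X-Served-By value from a header mapping, ignoring case."""
    for name, value in headers.items():
        if name.lower() == "x-served-by":
            return value
    return None  # header absent


def run_query(query, endpoint=ENDPOINT):
    """Send a SPARQL query; return (backend host name or None, raw body)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "wdqs-sync-check-example/0.1",  # hypothetical client name
    })
    with urllib.request.urlopen(request) as response:
        return served_by(dict(response.headers.items())), response.read()


# Offline check against a header mapping shaped like the ones in this thread.
sample_headers = {"Content-Type": "application/sparql-results+json",
                  "X-Served-By": "wdqs1001"}
print(served_by(sample_headers))
```

Recording the returned host next to each result set makes it possible to say which backend produced which answer, instead of assuming it.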

Smalyshev moved this task from Done to In progress on the Discovery-Wikidata-Query-Service-Sprint board. Jul 14 2016, 10:25 PM

Smalyshev claimed this task. Jul 14 2016, 10:27 PM

Thanks, here are the headers for r1 and r2, respectively:

{'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Content-Type': 'application/sparql-results+json', 'Via': '1.1 varnish-v4, 1.1 varnish-v4, 1.1 varnish-v4, 1.1 varnish-v4', 'X-Cache': 'cp1058 miss, cp2006 miss, cp4002 miss, cp4001 miss', 'Accept-Ranges': 'bytes', 'X-Served-By': 'wdqs1001', 'X-Client-IP': '137.131.58.191', 'Date': 'Thu, 14 Jul 2016 22:39:31 GMT', 'Vary': 'Accept, Accept-Encoding', 'Server': 'nginx/1.11.1', 'Set-Cookie': 'WMF-Last-Access=14-Jul-2016;Path=/;HttpOnly;secure;Expires=Mon, 15 Aug 2016 12:00:00 GMT', 'Cache-Control': 'public, max-age=300', 'X-Varnish': '63828730, 50083124, 32483470, 1448128', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Age': '2', 'Content-Encoding': 'gzip', 'X-Analytics': 'https=1;nocookies=1', 'Content-Length': '232492'}
{'Connection': 'keep-alive', 'Access-Control-Allow-Origin': '*', 'Content-Type': 'application/sparql-results+json', 'Via': '1.1 varnish-v4, 1.1 varnish-v4, 1.1 varnish-v4, 1.1 varnish-v4', 'X-Cache': 'cp1045 miss, cp2025 miss, cp4004 miss, cp4001 miss', 'Accept-Ranges': 'bytes', 'X-Served-By': 'wdqs1002', 'X-Client-IP': '137.131.58.191', 'Date': 'Thu, 14 Jul 2016 22:39:35 GMT', 'Vary': 'Accept, Accept-Encoding', 'Server': 'nginx/1.11.1', 'Set-Cookie': 'WMF-Last-Access=14-Jul-2016;Path=/;HttpOnly;secure;Expires=Mon, 15 Aug 2016 12:00:00 GMT', 'Cache-Control': 'public, max-age=300', 'X-Varnish': '64294654, 50298994, 32690474, 1448131', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Age': '3', 'Content-Encoding': 'gzip', 'X-Analytics': 'https=1;nocookies=1', 'Content-Length': '233261'}

So my assumption that the two servers return different results for these queries seems correct.

The three items above are missing from wdq2, but I don't see anything anomalous in the logs around the time they were supposed to be created. All three were created by a bot on 24 June 2016, but many others created by the same bot in the same timeframe do not have problems. Also, the query does provide evidence for lost updates, but I don't see anything anomalous in the logs. Looks like more logging is needed.

I think I found the culprit. If you look at the first entries at https://www.wikidata.org/w/api.php?format=json&action=query&list=recentchanges&rcdir=newer&rcprop=title|ids|timestamp&rcnamespace=0|120&rclimit=100&rccontinue=20160720152523|372669870, the first one has a timestamp after the second one. So while the rcid order is right, the timestamp order is not. This means that if changes are retrieved in timestamp order, it will retrieve the second one, then the first one gets added, but the marker has already moved past it...
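The failure mode described above can be reproduced with a toy model. This is not the WDQS updater code, and the thread does not say how the deployed fix works; the sketch only assumes a poller that remembers a "marker" and fetches everything after it. The rcids, timestamps, titles, and function names are all invented for illustration. A change whose timestamp is earlier than one already consumed is lost when the marker is a timestamp, but not when the marker is the rcid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    rcid: int       # increases in the order changes appear in the feed
    timestamp: int  # seconds; may disagree with rcid order
    title: str


def poll_by_timestamp(visible, marker):
    """Fetch changes with timestamp strictly after the marker."""
    batch = sorted((c for c in visible if c.timestamp > marker),
                   key=lambda c: c.timestamp)
    return batch, (batch[-1].timestamp if batch else marker)


def poll_by_rcid(visible, marker):
    """Fetch changes with rcid strictly after the marker."""
    batch = sorted((c for c in visible if c.rcid > marker),
                   key=lambda c: c.rcid)
    return batch, (batch[-1].rcid if batch else marker)


# Invented data: "late" shows up in the feed second but carries an earlier timestamp.
early = Change(rcid=1, timestamp=105, title="Q1")
late = Change(rcid=2, timestamp=100, title="Q2")

# Timestamp marker: the first poll sees only `early`; the second poll misses `late`.
ts_first, ts_marker = poll_by_timestamp([early], marker=0)
ts_second, ts_marker = poll_by_timestamp([early, late], marker=ts_marker)

# rcid marker: the same two polls pick up both changes.
id_first, id_marker = poll_by_rcid([early], marker=0)
id_second, id_marker = poll_by_rcid([early, late], marker=id_marker)

print("timestamp marker, second poll:", [c.title for c in ts_second])
print("rcid marker, second poll:", [c.title for c in id_second])
```

In this model the item whose change was skipped stays stale until some later edit touches it, which is consistent with the earlier observation that a fresh edit makes the missing data appear.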

Smalyshev moved this task from In progress to Done on the Discovery-Wikidata-Query-Service-Sprint board. Aug 1 2016, 9:02 PM

I've deployed the fix so it should not happen anymore. If you see any skipped updates dated August 2 or later, please reopen with specific examples.

Smalyshev removed a project: Discovery-Wikidata-Query-Service-Sprint. Jul 14 2017, 10:32 PM

I think there is still some issue present.
I run the following query:

select * where {
  wd:Q52839992 p:P5114 ?s . 
  ?s ?a ?b .
}

From wdqs2003 I get no results, but from wdqs2002 I get the expected results. Q52839992 was last updated ~48 hours ago.
See also: https://github.com/SuLab/WikidataIntegrator/issues/65

Gstupp reopened this task as Open. May 10 2018, 5:25 PM

Smalyshev added a project: User-Smalyshev. May 10 2018, 6:14 PM

Floatingpurr subscribed. May 15 2018, 10:14 PM

Smalyshev moved this task from Backlog to Next on the User-Smalyshev board. May 18 2018, 4:24 PM

Looks like the servers are in sync now. Please open new tasks if it happens again; otherwise it's a bit hard to keep track of what exactly is wrong. If it turns out to be the same cause, I'll merge them.

Gstupp mentioned this in T196399: WDQS returns incorrect data compared to what is in Wikidata. Jun 4 2018, 7:01 PM

Floatingpurr mentioned this in T207675: Some items are in an inconsistent state. Oct 22 2018, 5:40 PM
