Some items on WDQS out of sync
Closed, Resolved (Public)

Description

As reported by Jasper Koehorst:

I was updating the biological databases in Wikidata when I encountered cases where the official website had already been added but did not show up on the SPARQL endpoint.

For example the following query:

http://alturl.com/vrckp

It reports that:

wd:Q24174701 MetaboLights

has no official website entry. However, this entry was added by someone at 12:27, 20 May 2016.

(1,196 bytes) (+374) . . (Created claim: official website (P856): http://www.ebi.ac.uk/metabolights/)

As far as I can tell from previous experience, the sync normally happens more or less immediately. However, this is currently not the case. Is anyone aware of this issue?

Replication lag seems under control (around 10 seconds), so I expect that a few items missed replication and that this is not a systematic issue.


Sample from Sept 2016: list (look for redlinks without any labels/descriptions/P31 values)

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.
thiemowmde added subscribers: Jonas, thiemowmde.

There is not enough information given to work on this. For example: How often does it happen? Only for specific queries? Does the issue go away after a while? How long does it take?

"There is not enough information given to work on this. For example: How often does it happen? Only for specific queries? Does the issue go away after a while? How long does it take?"

The most prominent issue is that items that were deleted a few months ago are still showing up in queries. I'll link them here from now on; @hoo and @Smalyshev know about this.

"There is not enough information given to work on this." Conclusion: "triaged this task as "Low" priority."

Makes one wonder about triage. How do you assign priorities?

[off-topic / meta]
@Esc3300: If impossible to reproduce due to missing info (see T136393#2609312), that would be a valid use of low priority IMO.
However the task assignee is also free to reset priority... :)

Smalyshev raised the priority of this task from Low to High. Sep 12 2016, 10:46 PM

Other than an incident in spring, isn't the explanation for this that replication of changes doesn't always follow the revision ID?

Here is a list of some 600 from yesterday (look for redlinks without any labels/P31 values)

@Esc3300 Hmm looks like something with deletions is still not right. I will investigate, thanks.

Looks like this one:

https://www.wikidata.org/wiki/Special:EntityData/Q19369930.ttl?nocache=1475267192776&flavor=dump

produces a fatal error. That may be the reason why deletes are broken. Filed T147098.
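To check whether other entities hit the same error, a URL of the same shape can be built per entity. This is only a sketch: the `nocache` and `flavor=dump` parameters are copied from the URL above, and the helper name `entity_data_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Base path taken from the URL quoted in this task.
ENTITY_DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData"


def entity_data_url(entity_id: str, nocache_ms: int) -> str:
    """Build a Turtle dump-flavor URL for one entity (hypothetical helper).

    nocache_ms is a millisecond timestamp, as in the URL above; it is
    assumed to serve only as a cache-busting value.
    """
    query = urlencode({"nocache": nocache_ms, "flavor": "dump"})
    return f"{ENTITY_DATA_BASE}/{entity_id}.ttl?{query}"


if __name__ == "__main__":
    import time

    # Print a fresh URL for the entity mentioned in this task.
    print(entity_data_url("Q19369930", int(time.time() * 1000)))
```

Fetching each generated URL and looking for non-200 responses would be one way to find further entities that break the updater, under the assumption that the fatal error surfaces as an HTTP error.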

The problem isn't only with deleted items but also with normal items. For example, https://www.wikidata.org/wiki/Q5011561 is a disambiguation item with a sitelink, but the query
SELECT ?item
{
    # disambiguation items ...
    ?item wdt:P31 wd:Q4167410 .
    # ... that have no sitelink
    MINUS { [] schema:about ?item } .
}
returns the item. Another example: https://www.wikidata.org/w/index.php?title=Q22146053&redirect=no is a redirect, but the same query returns it as well.

Here is a sample for the deleted properties https://www.wikidata.org/wiki/Property:P2890 and https://www.wikidata.org/wiki/Property:P2885:

SELECT *
{
    VALUES ?p { wd:P2890 wd:P2885 }
    # any triple that still has a deleted property as its subject
    ?p ?s ?v
}

query: 31 results
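The same check can be generated for an arbitrary list of entities that are supposed to be deleted. The sketch below only builds the query text; the function name `leftover_triples_query` and the per-entity count are illustrative additions, and it relies on the `wd:` prefix being predefined on the query service.

```python
import re

# Wikidata item (Q...) and property (P...) identifiers.
ENTITY_ID = re.compile(r"[QP][1-9][0-9]*")


def leftover_triples_query(entity_ids):
    """Return a SPARQL query counting triples still stored per entity.

    Hypothetical helper: entities that appear in the result still have
    data in the triple store and were presumably not cleaned up.
    """
    ids = list(entity_ids)
    if not ids:
        raise ValueError("need at least one entity ID")
    bad = [e for e in ids if not ENTITY_ID.fullmatch(e)]
    if bad:
        raise ValueError(f"not valid entity IDs: {bad}")
    values = " ".join(f"wd:{e}" for e in ids)
    return (
        "SELECT ?entity (COUNT(*) AS ?triples)\n"
        "{\n"
        f"    VALUES ?entity {{ {values} }}\n"
        "    ?entity ?p ?v\n"
        "}\n"
        "GROUP BY ?entity"
    )


if __name__ == "__main__":
    print(leftover_triples_query(["P2890", "P2885"]))
```

Validating the IDs before interpolating them keeps arbitrary text out of the query string.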

Thanks. I noticed. The report became much shorter.

Would you remove all items that appear in the deletion log from WDQS?

@Esc3300 all gone, looks like maybe just replication lag or cached query result.

@ValterVB those seem to be old deletes, so they need to be handled manually.

In general, if a delete happened over a month ago and wasn't handled at the time due to some bug, it's impossible to detect it now, since the recent changes stream only goes back a month. So these need to be handled manually, or on the next DB reload.
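As a rough illustration of that cutoff, a list of deletions can be split into those still inside the recent changes window and those that need manual handling or a reload. The 30-day figure is an assumption standing in for "a month", and the function name and entity IDs are placeholders, not real data.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "a month" of recent changes history is approximated as 30 days.
RC_RETENTION = timedelta(days=30)


def split_by_recoverability(deletions, now, retention=RC_RETENTION):
    """Partition (entity_id, deleted_at) pairs by the recent changes cutoff.

    Returns (replayable, manual): IDs whose deletion is still within the
    retention window, and IDs whose deletion is older than the window.
    """
    cutoff = now - retention
    replayable, manual = [], []
    for entity_id, deleted_at in deletions:
        if deleted_at >= cutoff:
            replayable.append(entity_id)
        else:
            manual.append(entity_id)
    return replayable, manual


if __name__ == "__main__":
    now = datetime(2016, 10, 10, tzinfo=timezone.utc)
    sample = [
        ("Q_recent", datetime(2016, 10, 1, tzinfo=timezone.utc)),  # placeholder ID
        ("Q_old", datetime(2016, 6, 1, tzinfo=timezone.utc)),      # placeholder ID
    ]
    print(split_by_recoverability(sample, now))
```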

There are a bunch of deleted properties which are still showing up: http://tinyurl.com/jchjd56

Something weird is happening with properties... Need to investigate.

@Esc3300, @Edgars2007 I see those descriptions are there, they are just filtered out by the query filter. If you remove the regexp filter, you see all of them.

Do you have any idea when the next DB reload is likely to be?

Q2651609 is another item which is still out of sync (this query still shows an old label)

The DB reload has been completed, so old issues should be gone. If there are new ones, please submit a new task.