
Some items on WDQS out of sync
Closed, Resolved, Public

Description

As reported by Jasper Koehorst:

I was updating the biological databases in wikidata when I encountered cases where the official website was already added but did not show up on the SPARQL endpoint.
For example, the following query:
http://alturl.com/vrckp
shows that
wd:Q24174701 MetaboLights
has no official website entry, even though someone added one at 12:27, 20 May 2016:
(1,196 bytes) (+374) Created claim: official website (P856): http://www.ebi.ac.uk/metabolights/
As far as I can tell from previous experience, the sync normally happens more or less instantly. However, this is currently not the case. Is anyone aware of this issue?

Replication lag seems under control (around 10 seconds), so I suspect that a few items missed replication rather than that there is a systematic issue.


Sample from September 2016: list (look for red links without any labels, descriptions, or P31 values)

Event Timeline

Gehel created this task. May 27 2016, 8:36 AM
Restricted Application added projects: Wikidata, Discovery. May 27 2016, 8:36 AM
Restricted Application added subscribers: Zppix, Aklapper.

Maybe this one as well: https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team#revert_not_on_WQS

On WQS, India's population now matches that of the world.

thiemowmde triaged this task as Low priority. Sep 5 2016, 3:31 PM
thiemowmde added subscribers: Jonas, thiemowmde.

There is not enough information given to work on this. For example: How often does it happen? Only for specific queries? Does the issue go away after a while? How long does it take?


The most prominent issue is that items that were deleted a few months ago are still showing up in queries. I'll link them here from now on; @hoo and @Smalyshev know about this.

Smalyshev claimed this task. Sep 5 2016, 9:16 PM

"There is not enough information given to work on this." Conclusion: "triaged this task as Low priority."

Makes one wonder about triage. How do you assign priorities?

[off-topic / meta]
@Esc3300: If impossible to reproduce due to missing info (see T136393#2609312), that would be a valid use of low priority IMO.
However the task assignee is also free to reset priority... :)

Smalyshev raised the priority of this task from Low to High. Sep 12 2016, 10:46 PM
Esc3300 added a comment. (Edited) Sep 13 2016, 3:10 PM

Apart from an incident in the spring, isn't the explanation for this that replication of changes doesn't always follow revision ID order?

Here is a list of some 600 from yesterday (look for red links without any labels or P31 values)

Esc3300 updated the task description. Sep 30 2016, 9:29 AM

@Esc3300 Hmm looks like something with deletions is still not right. I will investigate, thanks.

Smalyshev added a comment. (Edited) Sep 30 2016, 8:27 PM

Looks like this one:

https://www.wikidata.org/wiki/Special:EntityData/Q19369930.ttl?nocache=1475267192776&flavor=dump

produces a fatal error. That may be the reason why deletes are broken. Filed T147098.

The problem isn't limited to deleted items; it also affects normal items. For example, https://www.wikidata.org/wiki/Q5011561 is a disambiguation item with a sitelink, but this query still returns it:
SELECT ?item
{
  ?item wdt:P31 wd:Q4167410 .
  MINUS { [] schema:about ?item } .
}
Another example: https://www.wikidata.org/w/index.php?title=Q22146053&redirect=no is a redirect, but the same query returns it as well.
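To see why stale data makes this query return an item, here is a minimal Python sketch of what the MINUS pattern computes in this single-variable case: the set of items typed as disambiguation pages (P31 = Q4167410), minus the set of items that some sitelink declares itself `schema:about`. The in-memory triple set, the function name `disambiguation_without_sitelinks`, and the placeholder item `wd:Q_EXAMPLE` with its example.org sitelink are all hypothetical; a real check would run the SPARQL query against WDQS. Plain set difference matches SPARQL MINUS here only because the two sides share exactly one variable, `?item`.

```python
# Hypothetical in-memory store: each triple is (subject, predicate, object).
Triple = tuple[str, str, str]

DISAMBIGUATION = "wd:Q4167410"


def disambiguation_without_sitelinks(triples: set[Triple]) -> set[str]:
    """Mimic: ?item wdt:P31 wd:Q4167410 MINUS { [] schema:about ?item }."""
    # Left side of MINUS: every item whose instance of (P31) is disambiguation page.
    typed = {s for (s, p, o) in triples if p == "wdt:P31" and o == DISAMBIGUATION}
    # Right side of MINUS: every item that some sitelink is schema:about.
    linked = {o for (s, p, o) in triples if p == "schema:about"}
    # Only ?item is shared, so MINUS reduces to a plain set difference.
    return typed - linked


if __name__ == "__main__":
    store = {
        # Q5011561 has a sitelink on Wikidata; suppose the store missed it.
        ("wd:Q5011561", "wdt:P31", DISAMBIGUATION),
        # Hypothetical item that is fully in sync, sitelink included.
        ("wd:Q_EXAMPLE", "wdt:P31", DISAMBIGUATION),
        ("<https://example.org/wiki/Example>", "schema:about", "wd:Q_EXAMPLE"),
    }
    print(disambiguation_without_sitelinks(store))
```

The same reasoning covers the redirect Q22146053: if the store still holds its old P31 triple and no sitelink points at it, the item matches even though it no longer exists as a normal item on Wikidata.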

Esc3300 added a comment. (Edited) Oct 1 2016, 11:18 AM

Here is a sample from the deleted properties https://www.wikidata.org/wiki/Property:P2890 and https://www.wikidata.org/wiki/Property:P2885:

SELECT *
{
  VALUES ?p { wd:P2890 wd:P2885 }
  ?p ?s ?v
}

query: 31 results

Deletes should be fine now.

Thanks. I noticed. The report became much shorter.

Would you remove all items that appear in the deletion log from WQS?

There are some from yesterday here

@Esc3300 All gone; it looks like it was probably just replication lag or a cached query result.

@ValterVB those seem to be old deletes, so they need to be handled manually.

In general, if a deletion happened over a month ago and wasn't handled at the time due to some bug, it's impossible to detect it now, since the recent changes stream only goes back a month. So these need to be handled manually, or on the next DB reload.
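The one-month limit amounts to a simple age check: a deletion older than the recent-changes window can no longer be replayed by the updater, so it has to be cleaned up by hand or by a reload. A minimal Python sketch, assuming a 30-day window as a stand-in for "about a month"; the function name `needs_manual_cleanup` is hypothetical, and the sketch does not query the real recent changes API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the recent changes stream keeps roughly one month of events.
# 30 days is a hypothetical value chosen to match "about a month".
RC_WINDOW = timedelta(days=30)


def needs_manual_cleanup(deleted_at: datetime, now: datetime,
                         window: timedelta = RC_WINDOW) -> bool:
    """Return True if a deletion is too old to be rediscovered from recent changes.

    Once a deletion falls outside the window it no longer appears in the
    stream, so an updater that missed it cannot find it again.
    """
    return now - deleted_at > window


if __name__ == "__main__":
    now = datetime(2016, 10, 10, tzinfo=timezone.utc)
    # Deleted a week earlier: still inside the window, can be replayed.
    print(needs_manual_cleanup(datetime(2016, 10, 3, tzinfo=timezone.utc), now))
    # Deleted in May: outside the window, must be handled manually.
    print(needs_manual_cleanup(datetime(2016, 5, 20, tzinfo=timezone.utc), now))
```

Comparing timezone-aware datetimes keeps the check independent of the server's local time zone.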

Nikki added a subscriber: Nikki. Oct 10 2016, 5:20 PM

There are a bunch of deleted properties which are still showing up: http://tinyurl.com/jchjd56

Something weird is happening with properties... Need to investigate.

@Esc3300, @Edgars2007 I see those descriptions are there; they are just filtered out by the query's regex filter. If you remove the regex filter, you see all of them.

Oh... Thanks guys!

Nikki added a comment. Oct 23 2016, 4:30 PM

Do you have any idea when the next DB reload is likely to be?

Q2651609 is another item which is still out of sync (this query still shows an old label)

Smalyshev closed this task as Resolved. Feb 15 2017, 9:49 PM

The DB reload has been completed, so old issues should be gone. If there are new ones, please submit a new task.