
Suspicious data loss on the Query Service
Closed, Resolved (Public)

Description

Users have noticed several issues over the last few days:

  • Number of Lexemes displayed by the Query Service regularly changes (source? link?)
  • Some Lexemes don't appear in the Query Service anymore (source? link?)
  • Number of triples suddenly dropped (link)

D_6upTmWkAAR-SO.jpeg (540×1 px, 41 KB)

Event Timeline

Lea_Lacroix_WMDE renamed this task from Suspicious data lost on the Query Service to Suspicious data loss on the Query Service. Jul 20 2019, 4:18 PM

For example, if I run this query

It gives 34 rows (there should be more than 300), with Swedish at 2018 (should be over 7000) and English at 89 (should be over 10 000).

Yesterday I was able to get correct results by adding some random comments to the query and/or re-running it several times, but that no longer works today.

It started happening on July 18.
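The comparison above, observed counts against the figures expected, can be sketched as a small check. This is only an illustration: the query itself is not reproduced here, and `find_shortfalls`, `EXPECTED_MINIMUMS`, and `EXPECTED_MIN_ROWS` are hypothetical names. It assumes the results have already been reduced to a mapping from language label to count; only the thresholds come from the numbers quoted above.

```python
# Thresholds taken from the figures quoted in the report; everything else
# (function and variable names, the dict-shaped input) is an assumption.
EXPECTED_MIN_ROWS = 300
EXPECTED_MINIMUMS = {"Swedish": 7000, "English": 10000}


def find_shortfalls(counts, expected_minimums=None, expected_min_rows=EXPECTED_MIN_ROWS):
    """Return a list of messages describing where counts fall short.

    counts maps a language label to the count reported for it.
    """
    if expected_minimums is None:
        expected_minimums = EXPECTED_MINIMUMS

    problems = []

    # The result set as a whole should contain more rows than the threshold.
    if len(counts) <= expected_min_rows:
        problems.append(
            f"only {len(counts)} rows, expected more than {expected_min_rows}"
        )

    # Each tracked language should exceed its expected minimum.
    for language, minimum in expected_minimums.items():
        observed = counts.get(language, 0)
        if observed <= minimum:
            problems.append(
                f"{language}: {observed}, expected more than {minimum}"
            )

    return problems


if __name__ == "__main__":
    # Only the two languages mentioned in the report are filled in here.
    for problem in find_shortfalls({"Swedish": 2018, "English": 89}):
        print(problem)
```

Running a check like this on a schedule would make a drop of this kind visible without relying on users to spot it.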

Worth noting that user:Twofivesixbot

https://www.wikidata.org/w/index.php?title=Special:Contributions/Twofivesixbot&offset=&limit=500&target=Twofivesixbot

has been doing a lot of merges of gene items of late, which may be associated with triple decreases over the past fortnight. Examples from around the time of the triple decrease illustrated above, sometimes involving items with ~800k of properties:

https://www.wikidata.org/w/index.php?title=Special:Contributions/Twofivesixbot&offset=20190718050705&limit=500&target=Twofivesixbot

to

https://www.wikidata.org/w/index.php?title=Special:Contributions/Twofivesixbot&dir=prev&offset=20190718061657&limit=500&target=Twofivesixbot

My fault: I did not load the full Lexeme dump, which is why Lexemes are missing. I'll fix it on Monday. Also, please use Wikidata-Query-Service bug, otherwise I might miss the bug.

Smalyshev triaged this task as High priority.

We also seem to be missing lexeme dumps for the last 2 weeks; the latest one is from Jul 5th. @ArielGlenn, is this related to the dump problems we had with JSON, or is it a different issue?

The truthy nt ones for last week are still finishing up, and then the lexemes for last week will go. That was due to T228104, which was caused by https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/519304/ in wmf.13. That was deployed to group1 on July 11 (see https://tools.wmflabs.org/sal/log/AWvjLDb1OwpQ-3Pk9nl7 ), which means the previous week's lexemes did not run either, as they go near the end of the week.

This could just be merged in as a duplicate of T228104, imo.

Thanks @ArielGlenn, the dump is now up; I'll load it shortly.

Should be fine now, please verify.

Yes, the queries are behaving as expected. Thank you!