
Number of Senses is decreasing on ListeriaBot's report
Closed, Resolved (Public)


user:KaMan noticed that the number of Senses in English, reported by ListeriaBot on this page, sometimes decreases, even though users create new ones every day. See tracking here. The drop doesn't match the number of deleted Lexemes.

This is caused by de-synchronization of data between servers.

Underlying query:

SELECT ?item (count(?sense) as ?count) WHERE {
   ?l a ontolex:LexicalEntry ; dct:language ?item ; ontolex:sense ?sense .
} group by ?item order by desc(?count) ?item
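To reproduce the report's numbers outside ListeriaBot, the query can be run directly against the public WDQS endpoint. The following is a minimal sketch, not the bot's actual code: the helper names `parse_counts` and `fetch_counts` and the User-Agent string are assumptions made for illustration, and it relies on WDQS predefining the `ontolex:` and `dct:` prefixes.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
ENGLISH = "http://www.wikidata.org/entity/Q1860"  # Wikidata item for English

# The query from the task description, unchanged.
QUERY = """
SELECT ?item (count(?sense) as ?count) WHERE {
   ?l a ontolex:LexicalEntry ; dct:language ?item ; ontolex:sense ?sense .
} group by ?item order by desc(?count) ?item
"""


def parse_counts(payload):
    """Map each language item URI to its sense count (SPARQL JSON results format)."""
    return {
        row["item"]["value"]: int(row["count"]["value"])
        for row in payload["results"]["bindings"]
    }


def fetch_counts(endpoint=WDQS_ENDPOINT):
    """Run QUERY against the endpoint and return parsed counts (needs network access)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": QUERY, "format": "json"})
    request = urllib.request.Request(
        url,
        # Hypothetical User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "sense-count-check/0.1 (example)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_counts(json.load(response))
```

Calling `fetch_counts()[ENGLISH]` would return the current English sense count as seen by whichever query server answers the request.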

Event Timeline

Restricted Application added subscribers: Cyberpower678, Aklapper. Nov 27 2018, 1:52 PM
Lea_Lacroix_WMDE renamed this task from Number of Lexemes is decreasing on ListeriaBot's report to Number of Senses is decreasing on ListeriaBot's report. Nov 27 2018, 2:49 PM
Lea_Lacroix_WMDE updated the task description.
KaMan added a subscriber: KaMan. Nov 27 2018, 4:29 PM

I wouldn't be surprised if it's a WDQS problem; this is definitely generated from an RDF query.

I ran a manual update and the total for English bumped up to 2819 - so it doesn't look as if we've actually lost lexeme senses, just that some of the query servers don't know about all of them?

@Smalyshev I'd forgotten there was a phabricator ticket for this - anyway, this is what I was referring to... Last night's update bumped the number down again to 2718; however when I run the query directly on WDQS I get 3004 right now. Something's not right!

We've had issues with deletes not being reported (T210451) but this should be fixed now. But the new ones should not be missing. I'll take a look into what's going on (probably will take a couple of days as I'm on WikiCite now).

Just a note - WDQS query gives different results hopping up and down - sometimes 3004 (for English lexeme senses) and sometimes 2872, over about the last 10 minutes.
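The "hopping up and down" behaviour can be checked by sampling the same count several times and looking at how many distinct values come back. This is only a sketch under assumptions: `get_count` is a hypothetical callable that returns the current English sense count (for example, by running the query above against WDQS), and the helper names are invented for illustration.

```python
import time
from collections import Counter


def sample_counts(get_count, attempts=10, delay_seconds=0.0):
    """Call get_count() repeatedly and collect the returned values in order."""
    samples = []
    for _ in range(attempts):
        samples.append(get_count())
        if delay_seconds:
            time.sleep(delay_seconds)
    return samples


def summarize(samples):
    """Count how often each distinct value was observed."""
    return Counter(samples)


def is_consistent(samples):
    """True if every sample returned the same value (or there are no samples)."""
    return len(set(samples)) <= 1


# Stand-in for real queries: replay the two values reported in this comment.
replies = iter([3004, 2872, 3004, 2872])
samples = sample_counts(lambda: next(replies), attempts=4)
print(summarize(samples), "consistent:", is_consistent(samples))
```

More than one distinct value over a short window suggests that the servers behind the endpoint hold different data, since new senses alone would only make the count grow.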

Looks like some data affected by T207673 wasn't properly updated. I'll try to re-update them.

Hmm looks like newer items are affected too, so probably it's an instance of T210044: Data corruption when loading RDF data into WDQS.

Smalyshev moved this task from Backlog to Doing on the User-Smalyshev board. Dec 4 2018, 6:19 PM

Is this still an issue?

Smalyshev added a comment. Edited Dec 25 2018, 9:16 PM

I think something weird is going on with Unicode encoding... The database reports two triples for IPA, "/su\u00cb\u0090r/" and "/su\u02d0r/". I am not sure how that happened, but this is probably the source of the issue. Not sure why the update process does not remove the extra one.
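The two strings are related in a specific way: the UTF-8 encoding of U+02D0 (the IPA length mark) is the byte pair CB 90, and reading those bytes as Latin-1 yields exactly U+00CB U+0090. The thread does not confirm where the bad value came from, so the following is only a sketch of one plausible mechanism; the function names are made up for illustration.

```python
correct = "/su\u02d0r/"        # IPA string with U+02D0 (length mark)
garbled = "/su\u00cb\u0090r/"  # the second value reported by the database


def misdecode_utf8_as_latin1(text):
    """Simulate reading UTF-8 bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_latin1_mojibake(text):
    """Reverse the mistake: recover the original text from the garbled form."""
    return text.encode("latin-1").decode("utf-8")


print("\u02d0".encode("utf-8").hex())                # cb90
print(misdecode_utf8_as_latin1(correct) == garbled)  # True
print(repair_latin1_mojibake(garbled) == correct)    # True
```

Because the two literals are different strings, a triple store treats them as two separate values, which would be consistent with the extra triple described above.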

Though, this may not be related, since 22 seems to be a correct result for the query. I am not sure why it was 21 before...

OK the unicode thing is fixed now, so I wonder if there are any other issues there.

@KaMan Can you try again and let us know if you still encounter issues? :)

@Lea_Lacroix_WMDE @Smalyshev I don't see any problems now. Thanks.

Lea_Lacroix_WMDE closed this task as Resolved. Jan 21 2019, 10:54 AM
Lea_Lacroix_WMDE claimed this task.

Great! I'm closing the task for now, but feel free to reopen it if the same issue happens again.