
Provide public "reload entity to WDQS" API
Open, Low, Public

Description

Wikibase edits are usually dispatched to WDQS automatically, but the system occasionally misses a few on-wiki changes. As a Wikidata user it is difficult to work around this, since we cannot make "null edits" to Wikidata items.

I suggest providing a public (MediaWiki?) API that takes entity IDs as input ("Qxxx", "Pyyy", or "Lzzz") and pipes them to the internal "load entity to WDQS" script or function that is already used to dispatch edits to WDQS anyway.
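To make the proposal concrete, here is a minimal sketch of the input validation such an API would need. The ID pattern and the example `action` name (`wbreloadentity`) are assumptions for illustration only; no such API action exists, and lexeme sub-entities (forms, senses) are ignored here for simplicity.

```python
import re

# Entity IDs accepted by the proposed API: items (Qxxx), properties (Pyyy),
# and lexemes (Lzzz). The pattern is an assumption based on the ID shapes
# named in the task description.
ENTITY_ID = re.compile(r"^[QPL][1-9]\d*$")

def validate_entity_ids(ids: str) -> list[str]:
    """Split a pipe-separated ID list (the usual MediaWiki API list
    convention) and reject malformed entries."""
    parsed = ids.split("|")
    bad = [e for e in parsed if not ENTITY_ID.match(e)]
    if bad:
        raise ValueError(f"invalid entity IDs: {bad}")
    return parsed

# A hypothetical request shape for such an API (the action name is invented):
#   POST /w/api.php   action=wbreloadentity   ids=Q42|P31|L99
print(validate_entity_ids("Q42|P31|L99"))
```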

This should also work for deleted entities, which are sometimes not deleted from WDQS.

Event Timeline

There's already a shell script to do this: https://www.mediawiki.org/wiki/Wikidata_query_service/Implementation#Updating_specific_ID but it requires shell access and must be run on each of the 14 query servers.

Note that reloading a deleted/nonexistent entity should also be allowed, which deletes the entity's data from WDQS: see T105427: Need a way for WDQS updater to become aware of suppressed deletes and https://www.wikidata.org/wiki/Wikidata:Oversight#Notes.

There is also https://wikitech.wikimedia.org/wiki/Wikidata_query_service#Manually_updating_entities with a description of that shell script.

I used to ask Stas every couple of months in the past, in order to use that shell script for a couple of hundred items each time. IMO it does not make sense to waste technician time for such requests, and I would be willing to trigger the reloading by myself if I could.

> I used to ask Stas every couple of months in the past, in order to use that shell script for a couple of hundred items each time. IMO it does not make sense to waste technician time for such requests, and I would be willing to trigger the reloading by myself if I could.

+1, an API would be better if this is a common requirement.
But as said above, right now this is something that needs to happen on every single query service server.

Personally I think we should rethink our update methods and consolidate updating in a single service, either pushing out to the many servers or with a pipeline at the other end.

Also having said that, if this ticket is a direct need of T105427 then perhaps T105427 just needs fixing in another way?

> But as said above right now this is something that needs to happen on every single query service server.

If I understand correctly, the wikitech page mentioned above describes how this can be automated for all servers with one command. No idea how robust this is.

On the other hand, as "reload entity to WDQS" is basically done after each Wikidata edit, it might be worth having a look at how it is done for regular edits. Those need to be distributed to all servers as well.

Actually these WDQS servers are merely reading RCStream. If this should be fixed in the same way we should add blank entries to RCStream (like https://gerrit.wikimedia.org/r/#/c/226931/).

By the way, reloading an entity is expensive for large entities (several MB), which may cause query service lag, so if we add this feature it must be rate limited in some way.
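One common way to enforce such a limit is a token bucket. The sketch below is illustrative only, not part of any existing WDQS code; a production limiter would need to be shared across API servers (e.g. via something like PoolCounter), and the `cost` could be scaled by entity size, since multi-MB entities are the expensive case.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allow at most `rate` reloads per second,
    with bursts up to `capacity`. The injectable `clock` makes it testable."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; otherwise reject the request."""
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller could pass a larger `cost` for entities known to be large, so one huge reload counts like many small ones.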

> Actually these WDQS servers are merely reading RCStream

Or rather, Kafka topics that feed into RCStream. Adding a message there would probably be the easiest way to trigger the update. Also, note that the Updater reads multiple topics, so you can make a separate Kafka topic just for "manual" updates and have the Updater read it, instead of polluting the main Kafka topic.

If there are multiple topics, can this dedicated topic be prioritized? This type of forced update should have higher priority than ordinary messages.
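One simple way a consumer could implement this is strict priority: always drain the dedicated topic before taking anything from the ordinary one. In this sketch, in-memory queues stand in for the two Kafka topics; it is not how the actual Updater works.

```python
from queue import Queue, Empty

def next_update(priority_q: Queue, normal_q: Queue):
    """Return the next update to process, always preferring the
    priority (manual-reload) queue over the ordinary change queue."""
    try:
        return priority_q.get_nowait()
    except Empty:
        pass
    try:
        return normal_q.get_nowait()
    except Empty:
        return None  # nothing pending on either queue
```

Note that strict priority can starve the ordinary stream if manual reloads arrive continuously, which is another reason the submission side would need rate limiting.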

Gehel triaged this task as Low priority. Sep 15 2020, 7:52 AM