As a user, when Wikidata updates fail (even after multiple retries), I want the data to eventually be updated.
As a maintainer of WDQS, I want the streaming updater to be able to reconcile a wikibase item so that I can fix inconsistencies without reloading the full database.
This can be achieved by introducing a new topic, consumed by the streaming updater, whose messages indicate whether an item needs to be reconciled or deleted at a specific revision.
This can be used to reconcile missed events (MW bugs, missing events, late events) or failures when fetching the item data.
When a deletion is required, the existing code will be used.
When the item to reconcile exists, the mutation message will contain all the entity data and the consumer will perform a full reconciliation.
Automatic reconciliation (probably via a batch job running from the analytics cluster) should be possible by reading the side-outputs.
Ad-hoc reconciliation should be possible via a script (or possibly from wikibase itself if this is deemed necessary).
The schema of this new topic should be as follows:
- meta: typical event metadata
- item (string): the wikibase item to update
- revision (long): the revision to work with
- type (enum): create or delete
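For illustration, a message following the fields above could be built and serialized as shown here. This is only a sketch: the `ReconcileEvent` class, the `to_json` helper, the contents of `meta`, and the sample item and revision are assumptions made for the example, not the final schema.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class ReconcileType(str, Enum):
    # The two values listed for the "type" field.
    CREATE = "create"
    DELETE = "delete"


@dataclass(frozen=True)
class ReconcileEvent:
    # Hypothetical in-memory form of one message on the new topic.
    meta: dict           # typical event metadata (contents assumed here)
    item: str            # the wikibase item to update
    revision: int        # the revision to work with
    type: ReconcileType  # create or delete

    def to_json(self) -> str:
        payload = asdict(self)
        # Emit the enum as its plain string value.
        payload["type"] = self.type.value
        return json.dumps(payload, sort_keys=True)


event = ReconcileEvent(
    meta={"domain": "www.wikidata.org"},  # hypothetical metadata
    item="Q42",
    revision=123456,
    type=ReconcileType.CREATE,
)
print(event.to_json())
```

An ad-hoc reconciliation script could produce messages of this shape, one per item to fix.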
The Flink operator that determines the mutation to apply should be changed to support new conditions:
- if the revision in the message is older than the one seen in the state, then an operation corresponding to the state is emitted:
  - reconcile if the state is CREATED, using the revision seen in the state and fetching the data for that revision
  - delete if the state is DELETED
- if the revision in the message is newer than the one seen in the state (or the item was never seen), then an operation corresponding to the message is emitted:
  - reconcile if the message has type create, using the revision from the message
  - delete if the message has type delete
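The conditions above can be sketched as a small decision function. This is not the actual Flink operator: `State`, `EntityStatus`, `Operation`, and `decide` are hypothetical names. Two behaviours are assumptions, because the description does not cover them: a delete derived from the state carries the state's revision, and a message whose revision equals the one in the state emits nothing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EntityStatus(Enum):
    CREATED = "CREATED"
    DELETED = "DELETED"


@dataclass(frozen=True)
class State:
    # What the operator last saw for an item (hypothetical shape).
    revision: int
    status: EntityStatus


@dataclass(frozen=True)
class Operation:
    kind: str      # "reconcile" or "delete"
    item: str
    revision: int


def decide(item: str, msg_revision: int, msg_type: str,
           state: Optional[State]) -> Optional[Operation]:
    """Pick the operation to emit for one reconciliation message."""
    if state is None or msg_revision > state.revision:
        # Never seen, or the message is newer: follow the message.
        kind = "reconcile" if msg_type == "create" else "delete"
        return Operation(kind, item, msg_revision)
    if msg_revision < state.revision:
        # The message is older: follow the state and its revision.
        # Assumption: a state-derived delete uses the state's revision.
        kind = "reconcile" if state.status is EntityStatus.CREATED else "delete"
        return Operation(kind, item, state.revision)
    # Assumption: equal revisions are not covered by the description,
    # so this sketch emits nothing.
    return None


# An old "create" message for an item the state already saw at revision 10:
print(decide("Q1", 5, "create", State(10, EntityStatus.CREATED)))
```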
AC:
- a new operation type, reconcile, is added to MutationEventData
- streaming-updater-producer operators are adapted to support this new message
- a new schema is added to https://schema.wikimedia.org/repositories/secondary/jsonschema/rdf_streaming_updater
- the streaming-updater-consumer supports the reconcile operation