As a user, when Wikidata updates fail (after multiple retries, etc.), I want the data to eventually be updated.
As a maintainer of WDQS I want the streaming updater to be able to reconcile a wikibase item so that I can fix some inconsistencies without reloading the full database.
This can be achieved by introducing a new topic, consumed by the streaming updater, whose messages indicate whether an item needs to be reconciled or deleted at a specific revision.
This can be used to reconcile missed events (MW bugs, missing events, late events) or failures when fetching the item data.
When a deletion is required, the existing deletion code will be used.
When the item to reconcile exists, the mutation message will contain all the entity data and the consumer will perform a full reconciliation.
Automatic reconciliation (probably via a batch job running from the analytics cluster) should be possible by reading the side-outputs:
- [[https://schema.wikimedia.org/repositories/secondary/jsonschema/rdf_streaming_updater/lapsed_action/latest.yaml|late events]]
- [[https://schema.wikimedia.org/repositories/secondary/jsonschema/rdf_streaming_updater/fetch_failure/latest.yaml|failed events]]
Ad-hoc reconciliation should be possible via a script (or possibly from wikibase itself if this is deemed necessary).
The schema of this new topic should be as follows:
* meta: typical event metadata
* item: string, the wikibase item to update
* revision: long, the revision to work with
* type: enum, `create` or `delete`
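To make the proposed schema concrete, here is a minimal Python sketch of parsing one such message. The names `ReconcileEvent`, `ReconcileType` and `parse_reconcile_event` are hypothetical, the contents of `meta` are left opaque, and the sample values are invented for illustration; the real schema would be defined in YAML in the schema repository.

```python
from dataclasses import dataclass
from enum import Enum


class ReconcileType(Enum):
    """The two values proposed for the `type` field."""
    CREATE = "create"
    DELETE = "delete"


@dataclass(frozen=True)
class ReconcileEvent:
    """Hypothetical in-memory form of a message on the new topic."""
    meta: dict          # typical event metadata, kept opaque here
    item: str           # the wikibase item to update
    revision: int       # the revision to work with
    type: ReconcileType


def parse_reconcile_event(raw: dict) -> ReconcileEvent:
    """Build a ReconcileEvent from a decoded JSON object, rejecting bad fields."""
    revision = raw["revision"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(revision, bool) or not isinstance(revision, int):
        raise ValueError("revision must be an integer")
    return ReconcileEvent(
        meta=dict(raw["meta"]),
        item=str(raw["item"]),
        revision=revision,
        type=ReconcileType(raw["type"]),  # raises ValueError on unknown values
    )


# Invented sample message; the field values are placeholders.
example = parse_reconcile_event(
    {"meta": {}, "item": "Q1", "revision": 123, "type": "create"}
)
```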
The flink operator determining the mutation to apply should be changed to support new conditions:
- if the revision in the message is older than the one seen in the state, then an operation corresponding to the state is emitted:
-- `reconcile` if the state is `CREATED`, using the revision seen in the state and fetching the data for that revision
-- `delete` if the state is `DELETED`
- if the revision in the message is newer than the one seen in the state (or never seen) then an operation corresponding to the message is emitted:
-- `reconcile` if the message has a type `create` using the revision from the message
-- `delete` if the message has a type `delete`
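The conditions above can be sketched as a small pure function. This is an illustration in Python, not the actual Flink operator: `State`, `EntityState` and `resolve` are hypothetical names, and two points are assumptions not covered by the rules above: a message whose revision equals the one in the state is treated like an older one, and a `delete` carries the revision of whichever side (state or message) decided the operation.

```python
from enum import Enum
from typing import NamedTuple, Optional, Tuple


class EntityState(Enum):
    CREATED = "CREATED"
    DELETED = "DELETED"


class State(NamedTuple):
    """Hypothetical view of what the operator remembers about an item."""
    revision: int
    status: EntityState


def resolve(message_revision: int, message_type: str,
            state: Optional[State]) -> Tuple[str, int]:
    """Return (operation, revision) for a reconciliation message."""
    if message_type not in ("create", "delete"):
        raise ValueError(f"unknown message type: {message_type!r}")

    # Never seen, or the message is newer than the state: follow the message.
    if state is None or message_revision > state.revision:
        operation = "reconcile" if message_type == "create" else "delete"
        return operation, message_revision

    # The message is older than the state: follow the state.
    # ASSUMPTION: an equal revision also lands here; the rules do not say.
    if state.status is EntityState.CREATED:
        return "reconcile", state.revision
    return "delete", state.revision
```

For example, `resolve(10, "delete", State(20, EntityState.CREATED))` follows the state and yields a `reconcile` at revision 20, since the delete request refers to a revision the updater has already moved past.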
AC:
- a new type of operation `reconcile` is added to MutationEventData
- streaming-updater-producer operators are adapted to support this new message
- a new schema is added to https://schema.wikimedia.org/repositories/secondary/jsonschema/rdf_streaming_updater
- the streaming-updater-consumer supports the `reconcile` operation