As a maintainer of WDQS, I want the streaming updater to be able to reconcile a Wikibase item so that I can fix inconsistencies without reloading the full database.
This can be achieved by introducing a new topic, consumed by the streaming updater, that would contain two types of messages:
- reconcile a specific item revision
- reconcile a deleted item
This can be used to reconcile missed events (MediaWiki bugs, missing events, late events) and to recover from fetch failures.
When a delete is required, the existing delete code path will be used.
When the item exists, the mutation message will contain all of the entity data, and the consumer will behave like the old updater by performing a full reconciliation.
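The following is a simplified picture of a full reconciliation. It assumes, purely for illustration, that an entity's RDF can be modelled as a set of triple strings (the class and method names are hypothetical, not the consumer's actual code). Triples that are in the store but absent from the message are deleted, and triples in the message that the store lacks are inserted. It deliberately ignores nodes shared between entities (such as values and references), which a real implementation would have to handle.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: diff the store content of one entity against the full data from the message. */
final class FullReconciliation {
    /** Triples to remove from and add to the store so that it matches the message. */
    record Patch(Set<String> toDelete, Set<String> toInsert) {}

    static Patch reconcile(Set<String> triplesInStore, Set<String> triplesFromMessage) {
        // Stale triples: present in the store, no longer part of the entity.
        Set<String> toDelete = new HashSet<>(triplesInStore);
        toDelete.removeAll(triplesFromMessage);
        // Missing triples: part of the entity, not yet in the store.
        Set<String> toInsert = new HashSet<>(triplesFromMessage);
        toInsert.removeAll(triplesInStore);
        return new Patch(toDelete, toInsert);
    }
}
```

Because the message carries the complete entity, the result does not depend on which earlier events were missed, which is what makes this mode suitable for repairing inconsistencies.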
Automatic reconciliation (probably via a batch job running on the analytics cluster) should be possible by reading the following side outputs:
- [[https://schema.wikimedia.org/repositories/secondary/jsonschema/rdf_streaming_updater/lapsed_action/latest.yaml|late events]]
- [[https://schema.wikimedia.org/repositories/secondary/jsonschema/rdf_streaming_updater/fetch_failure/latest.yaml|failed events]]
Ad-hoc reconciliation should be possible via a script (or possibly from Wikibase itself, if this is deemed necessary).
The schema of this new topic should be as follows:
* meta: the standard event metadata
* item (string): the Wikibase item to update
* revision (long): the item revision
* type (enum): `create` or `delete`
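A minimal Java view of one message on this topic might look like the sketch below. The record and field names mirror the schema above; the contents of `meta` (`id`, `dt`, `stream`), the validation rules, and the assumption that the enum is serialized as lowercase `create`/`delete` in JSON are illustrative choices, not part of the proposal.

```java
import java.util.Objects;

/** Hypothetical Java view of a message on the new reconcile topic. */
record ReconcileEvent(EventMeta meta, String item, long revision, Type type) {
    /** The two message kinds; assumed to map to "create" and "delete" in JSON. */
    enum Type { CREATE, DELETE }

    /** Stand-in for the standard event metadata; this field selection is an assumption. */
    record EventMeta(String id, String dt, String stream) {}

    ReconcileEvent {
        Objects.requireNonNull(meta, "meta");
        Objects.requireNonNull(item, "item");
        Objects.requireNonNull(type, "type");
        // Assumes every message, including deletes, carries a real MediaWiki revision id.
        if (revision <= 0) {
            throw new IllegalArgumentException("revision must be positive, got " + revision);
        }
    }
}
```

Rejecting malformed messages at construction time keeps the decision logic in the producer free of null and range checks.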
The decide mutation operation should be changed to support a new `reconcile` operation, chosen as follows:
- if the revision in the message is older than the one seen in the state, an operation corresponding to the state is emitted:
-- `reconcile` if the state is `CREATED`, using the revision seen in the state and fetching the data for that revision
-- `delete` if the state is `DELETED`
- if the revision in the message is newer than the one seen in the state (or the item has never been seen), an operation corresponding to the message is emitted:
-- `reconcile` if the message has type `create`, using the revision from the message
-- `delete` if the message has type `delete`
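The decision rules above can be sketched as a pure function. This is a hypothetical illustration, not the producer's actual operator: the names (`ReconcileDecision`, `State`, `Operation`, `decide`) are invented, the state is reduced to the last revision and status, and because the task does not say what happens when both revisions are equal, the sketch assumes the message wins in that case.

```java
import java.util.Optional;

/** Hypothetical sketch of the reconcile branch of the decide mutation step. */
final class ReconcileDecision {
    enum EntityStatus { CREATED, DELETED }
    enum MessageType { CREATE, DELETE }
    enum OperationType { RECONCILE, DELETE }

    /** Simplified view of what the operator remembers about an item. */
    record State(long lastRevision, EntityStatus status) {}

    /** Operation to emit: its kind, the item, and the revision to use. */
    record Operation(OperationType type, String item, long revision) {}

    static Operation decide(String item, long messageRevision, MessageType messageType,
                            Optional<State> state) {
        if (state.isPresent() && messageRevision < state.get().lastRevision()) {
            // The message is older than the state: follow the state.
            State s = state.get();
            return s.status() == EntityStatus.CREATED
                    ? new Operation(OperationType.RECONCILE, item, s.lastRevision()) // fetch data for this revision
                    : new Operation(OperationType.DELETE, item, s.lastRevision());
        }
        // Newer than the state, never seen, or (assumption) equal: follow the message.
        return messageType == MessageType.CREATE
                ? new Operation(OperationType.RECONCILE, item, messageRevision)
                : new Operation(OperationType.DELETE, item, messageRevision);
    }
}
```

Keeping the decision separate from state access makes each of the four branches easy to unit test in isolation.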
AC:
- a new type of operation `reconcile` is added to MutationEventData
- the streaming-updater-producer operators are adapted to support this new message type
- a new schema is added to https://schema.wikimedia.org/repositories/secondary/jsonschema/rdf_streaming_updater
- the streaming-updater-consumer supports the `reconcile` operation