Add a reconciliation strategy to the wdqs streaming updater
Open, High, Public

Description

As a maintainer of WDQS I want the streaming updater to be able to reconcile a wikibase item so that I can fix some inconsistencies without reloading the full database.

This can be achieved by introducing a new topic for the streaming updater to consume. Each message on this topic would indicate whether an item needs to be reconciled or deleted at a specific revision.

This can be used to reconcile missed events (MW bugs, missing events, late events) or failures when fetching the item data.

When a deletion is required, the existing code will be used.
When the item to reconcile exists, the mutation message will contain all the entity data and the consumer will perform a full reconciliation.

Automatic reconciliation (probably via a batch job running from the analytics cluster) should be possible by reading the side-outputs.

Ad-hoc reconciliation should be possible via a script (or possibly from wikibase itself if this is deemed necessary).

The schema of this new topic should be as follows:

  • meta: typical event metadata
  • item (string): the wikibase item to update
  • revision (long): the revision to work with
  • type (enum): create or delete
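
As an illustration only, a message following this schema could be modelled and serialized as in the sketch below. The class and function names (ReconcileEvent, ReconcileType, to_json, from_json) are hypothetical, JSON is assumed as the wire format, and the contents of meta are left open because the description does not specify them.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class ReconcileType(str, Enum):
    """The two values allowed for the `type` field."""
    CREATE = "create"
    DELETE = "delete"


@dataclass(frozen=True)
class ReconcileEvent:
    # Hypothetical model of one message on the new topic.
    meta: dict            # typical event metadata (fields not specified here)
    item: str             # the wikibase item to update, e.g. "Q42"
    revision: int         # the revision to work with
    type: ReconcileType   # create or delete


def to_json(event: ReconcileEvent) -> str:
    """Serialize an event, writing the enum as its plain string value."""
    payload = asdict(event)
    payload["type"] = event.type.value
    return json.dumps(payload, sort_keys=True)


def from_json(raw: str) -> ReconcileEvent:
    """Parse an event; raises ValueError if `type` is not create/delete."""
    data = json.loads(raw)
    return ReconcileEvent(
        meta=data["meta"],
        item=data["item"],
        revision=int(data["revision"]),
        type=ReconcileType(data["type"]),
    )
```

An ad-hoc script could build such an event for a single item and publish the result of to_json to the topic.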

The flink operator determining the mutation to apply should be changed to support new conditions:

  • if the revision in the message is older than the one seen in the state, then an operation corresponding to the state is emitted:
    • reconcile if the state is CREATED, using the revision seen in the state and fetching the data for that revision
    • delete if the state is DELETED
  • if the revision in the message is newer than the one seen in the state (or the item was never seen), then an operation corresponding to the message is emitted:
    • reconcile if the message has type create, using the revision from the message
    • delete if the message has type delete
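
The conditions above can be sketched as a small pure function. This is not the actual Flink operator: the names EntityState, EntityStatus, Operation and decide are hypothetical, and the real operator would read the state from Flink keyed state. The case where both revisions are equal is not covered by the description, so the sketch emits nothing for it; that choice is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EntityStatus(Enum):
    CREATED = "CREATED"
    DELETED = "DELETED"


@dataclass(frozen=True)
class EntityState:
    # Hypothetical view of what the operator remembers about an item.
    revision: int
    status: EntityStatus


@dataclass(frozen=True)
class Operation:
    kind: str       # "reconcile" or "delete"
    item: str
    revision: int   # revision the operation refers to


def decide(item: str, msg_revision: int, msg_type: str,
           state: Optional[EntityState]) -> Optional[Operation]:
    """Pick the operation to emit for a reconciliation message."""
    if msg_type not in ("create", "delete"):
        raise ValueError(f"unknown message type: {msg_type!r}")

    # Message is newer than the state, or the item was never seen:
    # the message decides.
    if state is None or msg_revision > state.revision:
        kind = "reconcile" if msg_type == "create" else "delete"
        return Operation(kind, item, msg_revision)

    # Message is older than the state: the state decides.
    if msg_revision < state.revision:
        if state.status is EntityStatus.CREATED:
            # Data is fetched for the revision seen in the state.
            return Operation("reconcile", item, state.revision)
        return Operation("delete", item, state.revision)

    # Equal revisions: unspecified in the description (assumption: no-op).
    return None
```

For example, a delete message at revision 2 for an item whose state is CREATED at revision 3 yields a reconcile at revision 3, because the state is more recent than the message.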

AC: