
Add support for schema evolution of serialized objects
Closed, Resolved | Public

Description

As a maintainer of the wdqs streaming updater I want all the objects serialized in the pipeline to support schema upgrades so that I don't have to use the StateExtractionJob to build a savepoint compatible with the next version of the pipeline.

Initially I thought that a combination of stop&drain + StateExtractionJob + UpdaterBootstrapJob would be sufficient to deal with incompatible serialization changes. This does not fully hold: the AsyncIO operator keeps its in-flight input and output events in a state that is not emptied when the pipeline is drained (it is not driven by timers).

StateExtractionJob still has value in extraordinary circumstances (debugging, an unrecoverable serialization bug, a complete refactor), but it is neither robust nor easy enough to be used for regular upgrades.

AC:

  • decide what serialization format to use (custom or Avro)
    • custom: we want to keep the case classes, and using a serialization engine would require an extra transformation step.
  • pipeline can be upgraded even with schema changes, without relying on StateExtractionJob
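
To illustrate the kind of custom, upgrade-friendly serialization the criteria above call for, here is a minimal sketch of a version-prefixed binary layout. It is not the code from the merged patch: the `Event` class, its fields, the version numbers and the default value are all hypothetical, and a plain Java class stands in for one of the pipeline's case classes. The idea is only that every serialized object starts with a layout version, so a newer reader can still decode bytes written by an older pipeline.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionedEventSerializer {

    /** Hypothetical stored object; stands in for one of the pipeline's case classes. */
    public static final class Event {
        public final String item;
        public final long revision;
        public final String domain; // assumed to have been added in layout version 2

        public Event(String item, long revision, String domain) {
            this.item = item;
            this.revision = revision;
            this.domain = domain;
        }
    }

    public static final int CURRENT_VERSION = 2;
    public static final String DEFAULT_DOMAIN = "unknown"; // assumed default for old data

    /** Writes the current layout: version, item, revision, domain. */
    public static byte[] serialize(Event event) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(CURRENT_VERSION);
            out.writeUTF(event.item);
            out.writeLong(event.revision);
            out.writeUTF(event.domain);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Writes the old version 1 layout (no domain), to simulate state from a previous release. */
    public static byte[] serializeV1(String item, long revision) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(1);
            out.writeUTF(item);
            out.writeLong(revision);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reads any known layout, filling fields missing from older versions with defaults. */
    public static Event deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            if (version < 1 || version > CURRENT_VERSION) {
                throw new IllegalStateException("Unsupported layout version: " + version);
            }
            String item = in.readUTF();
            long revision = in.readLong();
            String domain = version >= 2 ? in.readUTF() : DEFAULT_DOMAIN;
            return new Event(item, revision, domain);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // State written by the "old" pipeline is still readable after the upgrade.
        Event restored = deserialize(serializeV1("Q42", 7L));
        System.out.println(restored.item + " rev=" + restored.revision + " domain=" + restored.domain);
    }
}
```

With a layout like this, adding a field means bumping the version and adding one branch in the reader, so in-flight state held by an operator could in principle be restored from a savepoint without a separate extraction step.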

Event Timeline

Change 698567 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] Add custom serializers for stored objects

https://gerrit.wikimedia.org/r/698567

dcausse moved this task from Incoming to Needs review on the Discovery-Search (Current work) board.
dcausse updated the task description.
dcausse triaged this task as Medium priority.
Jun 7 2021, 4:23 PM

Change 698567 merged by jenkins-bot:

[wikidata/query/rdf@master] Add custom serializers for stored objects

https://gerrit.wikimedia.org/r/698567