
Add means to upgrade the flink code even when incompatible serialization changes are involved
Closed, ResolvedPublic

Description

As a WDQS maintainer I want a way to transform a (drained) savepoint into a set of CSV files so that I can reuse the existing bootstrap job to resume the pipeline even when incompatible serialization changes are made.

  • add a job that
    • dumps a CSV file similar to the one created by org.wikidata.query.rdf.spark.EntityRevisionMapGenerator
    • dumps another CSV file with the Kafka consumer offsets
  • adapt the UpdaterBootstrapJob to support setting consumer offsets

AC:

  • the pipeline can always be upgraded using this procedure:
    1. [old code]: stop & drain the pipeline, storing a savepoint
    2. [old code]: transform the savepoint into a set of CSV files
    3. [new code]: run the bootstrap job with the CSV files
    4. [new code]: resume the pipeline
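The procedure above could look roughly like the following CLI sketch. The job IDs, paths, jar names, and main class names are all placeholders/assumptions; only the `flink stop --drain`/`--savepointPath` and `flink run -c` options are standard Flink CLI.

```sh
# 1. [old code] stop & drain the pipeline, storing a savepoint
flink stop --drain --savepointPath hdfs:///flink/savepoints <job-id>

# 2. [old code] transform the savepoint into CSV files
#    (main class and options are assumptions)
flink run -c org.wikidata.query.rdf.updater.StateExtractionJob old-updater.jar \
  --savepoint hdfs:///flink/savepoints/savepoint-<id> \
  --output hdfs:///updater/bootstrap-state

# 3. [new code] run the bootstrap job with the CSV files
spark-submit --class org.wikidata.query.rdf.updater.UpdaterBootstrapJob ...

# 4. [new code] resume the pipeline from the bootstrapped state
flink run -c org.wikidata.query.rdf.updater.UpdaterJob new-updater.jar ...
```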

Event Timeline

TJones renamed this task from Add a mean to upgrade the flink code even when incompatible serialization changes are involved to Add means to upgrade the flink code even when incompatible serialization changes are involved. Mar 8 2021, 4:45 PM

Change 665082 had a related patch set uploaded (by DCausse; owner: DCausse):
[wikidata/query/rdf@master] Add state extraction job

https://gerrit.wikimedia.org/r/665082

Change 665082 merged by jenkins-bot:

[wikidata/query/rdf@master] Add a state extraction job

https://gerrit.wikimedia.org/r/665082