As a WDQS maintainer I want a way to transform a (drained) savepoint into a set of CSV files so that I can reuse the existing bootstrap job to resume a pipeline even when incompatible serialization changes are made.
- add a job that:
  - dumps a CSV file similar to the one created by org.wikidata.query.rdf.spark.EntityRevisionMapGenerator
  - dumps another CSV file with the Kafka consumer offsets
- adapt the UpdaterBootstrapJob to support setting consumer offsets
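A minimal sketch of the two CSV outputs the new job would produce. The column layouts, file names, and sample values here are assumptions for illustration only; they are not the actual schema written by org.wikidata.query.rdf.spark.EntityRevisionMapGenerator, and the real job would read the state out of the drained savepoint rather than take in-memory tuples.

```python
import csv

def write_entity_revision_csv(path, entity_revisions):
    """entity_revisions: iterable of (entity_id, revision_id) pairs,
    in practice extracted from the savepoint's entity-revision state."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for entity_id, revision in entity_revisions:
            writer.writerow([entity_id, revision])

def write_offsets_csv(path, offsets):
    """offsets: iterable of (topic, partition, offset) tuples, in practice
    taken from the Kafka source state stored in the savepoint."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for topic, partition, offset in offsets:
            writer.writerow([topic, partition, offset])

# Hypothetical sample data; topic name and numbers are made up.
write_entity_revision_csv("entity_revision_map.csv",
                          [("Q42", 1396116141), ("Q64", 1500000000)])
write_offsets_csv("consumer_offsets.csv",
                  [("eqiad.mediawiki.revision-create", 0, 123456)])
```

The bootstrap job would then consume the first file to seed the entity-revision map and the second to position the consumers before the pipeline is resumed.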
AC:
- the pipeline can always be upgraded using this procedure:
  - [old code]: stop & drain the pipeline, storing a savepoint
  - [old code]: transform the savepoint into a set of CSV files
  - [new code]: run the bootstrap job with the CSV files
  - [new code]: resume the pipeline
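The procedure above could look roughly like the following runbook. All job names, class names, paths, and flags besides the standard `flink stop --drain` options are assumptions, not the actual invocations:

```shell
# [old code] stop & drain the pipeline, storing a savepoint
# (--drain flushes in-flight data; the savepoint path is printed on success)
flink stop --drain --savepointPath hdfs:///wdqs/savepoints $JOB_ID

# [old code] transform the savepoint into CSV files
# (hypothetical transform job and output paths)
flink run savepoint-to-csv.jar \
  --savepoint hdfs:///wdqs/savepoints/savepoint-xxxx \
  --entity-revision-csv hdfs:///wdqs/bootstrap/entity_revision_map.csv \
  --offsets-csv hdfs:///wdqs/bootstrap/consumer_offsets.csv

# [new code] run the bootstrap job with the CSV files
# (hypothetical option names for the adapted UpdaterBootstrapJob)
flink run updater-bootstrap.jar \
  --revisions-csv hdfs:///wdqs/bootstrap/entity_revision_map.csv \
  --consumer-offsets-csv hdfs:///wdqs/bootstrap/consumer_offsets.csv

# [new code] resume the pipeline from the state written by the bootstrap job
flink run updater.jar --fromSavepoint hdfs:///wdqs/bootstrap/savepoint-new
```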