Generating the initial state of the WDQS streaming updater requires parsing the TTL dumps (all and lexemes). On first start, the Kafka consumer over mediawiki.revision-create needs to be positioned at the offsets corresponding to the time the dump was started, so that it captures every revision that may have been created after an entity was written to the dump.
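Below is a minimal sketch of that positioning step, assuming the plain Java Kafka client (the broker address, group id and dump timestamp are placeholders, and the actual updater may do the equivalent through Flink's Kafka connector instead): offsetsForTimes() resolves the dump start time to a per-partition offset to seek to.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class DumpStartOffsets {
    public static void main(String[] args) {
        String topic = "mediawiki.revision-create";
        // Hypothetical dump start time; in practice read from the dump metadata.
        Instant dumpStart = Instant.parse("2020-07-01T00:00:00Z");

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("group.id", "wdqs-streaming-updater-test"); // hypothetical group

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Build one timestamp query per partition of the topic.
            Map<TopicPartition, Long> query = new HashMap<>();
            for (PartitionInfo p : consumer.partitionsFor(topic)) {
                query.put(new TopicPartition(topic, p.partition()), dumpStart.toEpochMilli());
            }

            // Resolve, per partition, the earliest offset whose record
            // timestamp is >= the dump start time.
            Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);

            consumer.assign(query.keySet());
            offsets.forEach((tp, oat) -> {
                if (oat != null) {
                    consumer.seek(tp, oat.offset());
                } else {
                    // No record with a timestamp at or after dumpStart yet:
                    // start from the end of the partition.
                    consumer.seekToEnd(List.of(tp));
                }
            });
            // consumer.poll(...) will now replay every revision-create event
            // produced since the dump started.
        }
    }
}
```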
The way we generate the dump and the time required to make it available in HDFS make it difficult to work within the current 7-day retention period.
As a first test we plan to use jumbo; increasing the retention of this topic there to 30 days would make it easier to start testing the Flink pipeline.
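For reference, this is roughly what the retention bump would look like through Kafka's AdminClient (the broker address is a placeholder, and in practice the change on jumbo would presumably go through the usual operational tooling rather than ad hoc code):

```java
import java.time.Duration;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

public class RaiseRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder for a jumbo broker

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "mediawiki.revision-create");
            // retention.ms = 30 days, up from the current 7.
            AlterConfigOp raiseRetention = new AlterConfigOp(
                    new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG,
                            Long.toString(Duration.ofDays(30).toMillis())),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, Set.of(raiseRetention)))
                    .all().get(); // block until the change is applied
        }
    }
}
```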