As a WDQS maintainer I would like to be able to backfill the revision-create topics without human intervention when they contain data produced from multiple DCs.
During a DC switch, change-prop will switch to another DC and produce its data to another kafka topic. In normal situations this is not a problem: the backlog should be fairly small, and the ability to mark sources as "idle" together with the work done in T258687 will let the flink pipeline work flawlessly during DC switches.
The problem arises when the backlog is large, which happens when backfilling, e.g. when starting the pipeline for the first time or after it has been stopped for multiple days.
For example, after a switch from eqiad to codfw, the topic eqiad.mediawiki.revision-create may contain data needed for a backfill that is several days older than the first event available on codfw.mediawiki.revision-create. The events from codfw.mediawiki.revision-create will be held in their respective time windows while waiting for eqiad.mediawiki.revision-create to align with codfw.mediawiki.revision-create (either fully drained and marked idle, or once its first event is newer than the ones read from codfw). When this happens, all the windows created while buffering codfw.mediawiki.revision-create events will fire at the same time.
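The buffering behaviour can be illustrated independently of Flink: with event-time processing, an operator's watermark is the minimum over its non-idle sources, so no window can fire while one source lags days behind. The sketch below is illustrative only (the function name and representation are not from the actual pipeline):

```python
# Illustrative sketch (not the actual pipeline code): event-time windows
# only fire once the combined watermark, i.e. the minimum of the per-source
# watermarks over all non-idle sources, passes the end of the window.

def combined_watermark(source_watermarks):
    """source_watermarks: list of (watermark, is_idle) pairs, one per source.
    Returns the operator-level watermark, or None if every source is idle."""
    active = [wm for wm, idle in source_watermarks if not idle]
    return min(active) if active else None

# codfw has already advanced to t=1000, but eqiad is still days behind
# (t=10) and not idle: the watermark stays at 10 and every codfw window
# remains buffered in state.
assert combined_watermark([(10, False), (1000, False)]) == 10

# Once eqiad is drained and marked idle, the watermark jumps to codfw's
# position and all the buffered windows fire at the same time.
assert combined_watermark([(10, True), (1000, False)]) == 1000
```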
This can also create a relatively large state on the event-reordering operator (up to 200Mb for a backlog of 2.65M events from eqiad plus 22M from codfw). With our current settings, checkpointing fails after our 15min timeout. Progress is still made, so the backfill may succeed after sufficient retries.
The problem here is that flink is not aware of our DC switch strategy: there is no need to start reading data from codfw.mediawiki.revision-create before eqiad.mediawiki.revision-create is fully drained (and vice versa).
Note that this problem is only relevant when backfilling a large backlog created during a DC switch, and it might not be worth investigating/implementing possible solutions if we do not want to backfill in such scenarios.
One possible solution could be to write a custom source that, based on the event timestamps, consumes only one DC topic at a time.
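A minimal sketch of that selection logic, in plain Python rather than as a real Flink source (the function below is hypothetical and only models which topic to poll next): always consume from the topic whose next event has the oldest timestamp, which in practice fully drains the pre-switch DC topic before touching the post-switch one.

```python
from collections import deque

def drain_one_dc_at_a_time(eqiad, codfw):
    """Hypothetical topic-selection logic: repeatedly consume from the
    topic whose head event has the oldest timestamp. Because the
    pre-switch topic's events all predate the post-switch topic's first
    event, it gets drained before the other topic is read at all."""
    queues = {"eqiad": deque(eqiad), "codfw": deque(codfw)}
    consumed = []
    while any(queues.values()):
        # Pick the non-empty queue with the oldest head timestamp.
        dc = min((d for d in queues if queues[d]),
                 key=lambda d: queues[d][0])
        consumed.append((dc, queues[dc].popleft()))
    return consumed

# eqiad holds the old backlog; codfw starts days later. eqiad is fully
# drained before any codfw event is read, so no windows pile up in state.
events = drain_one_dc_at_a_time([1, 2, 3], [100, 101])
assert events == [("eqiad", 1), ("eqiad", 2), ("eqiad", 3),
                  ("codfw", 100), ("codfw", 101)]
```

In a real implementation the "head timestamp" of the not-yet-consumed topic would have to be probed from Kafka (e.g. the timestamp of its first available record), and the switch-over would also need to handle the drained topic being marked idle.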