Hi!
We would like to use an S3 backend for our Flink-based search update pipeline. This is needed for persisting checkpoints, i.e. the state of the stateful operators inside the application, which in turn allows the application to seamlessly pick up where a predecessor left off (when it is stopped for whatever reason).
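For context, here is a minimal sketch of how the pipeline would point its checkpoints at such a bucket, assuming the standard Flink 1.x DataStream checkpointing API; the bucket path and the checkpoint interval are placeholders, and the actual endpoint and credentials would come from the account requested here:

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointStorageSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint periodically; the 30s interval is illustrative, not the pipeline's real setting.
        env.enableCheckpointing(30_000);

        // Point checkpoint storage at the requested container (hypothetical path).
        env.getCheckpointConfig().setCheckpointStorage(
                "s3://cirrussearch-update-pipeline-eqiad/checkpoints");

        // Retain checkpoints on cancellation so a successor job can resume from them.
        env.getCheckpointConfig().setExternalizedCheckpointCleanup(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // ... pipeline definition and env.execute(...) would follow here.
    }
}
```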
Analogous to T330693 (S3 for the Flink-based enrichment application), we need an account to store Flink checkpoints and savepoints. This account would have access to three containers:
- cirrussearch-update-pipeline-eqiad
- cirrussearch-update-pipeline-codfw
- cirrussearch-update-pipeline-staging
The storage needs for each container (excluding staging) would be 21G.
The storage needs for the staging container should be minimal, as it will only be used for staging deploys, which will probably cover a single test wiki. In total the account therefore needs a storage quota of roughly twice the per-container value above (2 × 21G for eqiad and codfw, plus a small margin for staging).
This space is primarily used for operating Flink; losing this state would not be catastrophic and could be resolved by restarting the job from earlier Kafka offsets (~5 min).
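To illustrate that recovery path, a sketch of rewinding the consumer a few minutes, assuming the standard Flink KafkaSource API; the broker, topic, and group id are placeholders, not the pipeline's actual configuration:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RewindRecoverySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // If checkpointed state were lost, start reading ~5 minutes in the past
        // instead of restoring from a checkpoint.
        long fiveMinutesAgo = System.currentTimeMillis() - 5 * 60 * 1000;

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka.example.org:9092")   // placeholder broker
                .setTopics("cirrussearch.update-events")          // placeholder topic
                .setGroupId("cirrussearch-update-pipeline")       // placeholder group id
                .setStartingOffsets(OffsetsInitializer.timestamp(fiveMinutesAgo))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "update events")
                .print();  // stand-in for the real update pipeline

        env.execute("rewind-recovery-sketch");
    }
}
```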
For the detailed numbers, see the checkpoint storage estimation.
Thanks for looking and please let us know if you need more info.