Storage request: swift s3 bucket for flink search-update-pipeline checkpointing
Closed, Resolved, Public

Description

Hi!

We would like to use an S3 backend for our flink-based search update pipeline. It is needed for persisting checkpoints, i.e. the state of stateful operations inside the application, which in turn allows the application to seamlessly pick up where a predecessor left off (when it was stopped for whatever reason).

In analogy to T330693 (s3 for flink-based enrichment application) we would need an account to store flink checkpoints and savepoints. This account would have access to three containers:

  • cirrussearch-update-pipeline-eqiad
  • cirrussearch-update-pipeline-codfw
  • cirrussearch-update-pipeline-staging
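
For illustration, the per-container Flink state directories could be derived along these lines. This is only a minimal sketch: `state.checkpoints.dir` and `state.savepoints.dir` are standard Flink options, but the `s3://` scheme, the `checkpoints`/`savepoints` sub-paths and the helper name `flink_state_dirs` are assumptions, not the pipeline's actual configuration.

```python
# Container names as listed in this request.
CONTAINERS = [
    "cirrussearch-update-pipeline-eqiad",
    "cirrussearch-update-pipeline-codfw",
    "cirrussearch-update-pipeline-staging",
]


def flink_state_dirs(container: str) -> dict:
    """Return Flink checkpoint/savepoint directory settings for one container.

    Hypothetical helper: the sub-path layout below is an assumption.
    """
    if container not in CONTAINERS:
        raise ValueError(f"unknown container: {container}")
    base = f"s3://{container}"
    return {
        "state.checkpoints.dir": f"{base}/checkpoints",
        "state.savepoints.dir": f"{base}/savepoints",
    }


if __name__ == "__main__":
    # Print the settings for every container in "key: value" form.
    for name in CONTAINERS:
        for key, value in flink_state_dirs(name).items():
            print(f"{key}: {value}")
```

Keeping one container per deployment target means each job only ever reads and writes its own checkpoints.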

The storage needs for each container (excluding staging) would be 21G.

The storage needs for the staging container should be minimal, as it will be used for staging deploys that will probably cover only one test wiki. In total, the account therefore needs a storage quota of about twice the per-container figure above (roughly 42G).
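
The quota arithmetic can be sketched as follows. The 21G figure and the two non-staging containers come from this request; the `staging_g` parameter is a hypothetical allowance for the staging container, left at zero here because its needs are expected to be minimal.

```python
PER_CONTAINER_G = 21  # per-container need stated in this request (eqiad, codfw)
PRODUCTION_CONTAINERS = [
    "cirrussearch-update-pipeline-eqiad",
    "cirrussearch-update-pipeline-codfw",
]


def account_quota_g(staging_g: float = 0.0) -> float:
    """Total account quota: per-container need times production containers,
    plus an optional (hypothetical) allowance for staging."""
    return PER_CONTAINER_G * len(PRODUCTION_CONTAINERS) + staging_g


print(account_quota_g())  # 42.0
```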

This space is primarily used for operating flink. Losing this state should not be terrible: it can be recovered by restarting the job from earlier kafka offsets (~5min).
See checkpoint storage estimation for the detailed estimation.

Thanks for looking and please let us know if you need more info.

Event Timeline

Change 949943 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] hiera: add swift user search_update_pipeline

https://gerrit.wikimedia.org/r/949943

Change 949944 had a related patch set uploaded (by MVernon; author: MVernon):

[labs/private@master] hiera: add fake credential for swift user search_update_pipeline

https://gerrit.wikimedia.org/r/949944

Change 949944 merged by MVernon:

[labs/private@master] hiera: add fake credential for swift user search_update_pipeline

https://gerrit.wikimedia.org/r/949944

Change 949943 merged by MVernon:

[operations/puppet@production] hiera: add swift user search_update_pipeline

https://gerrit.wikimedia.org/r/949943

Mentioned in SAL (#wikimedia-operations) [2023-08-18T09:13:21Z] <Emperor> roll-restart thanos swift frontends to add user T342620

MatthewVernon claimed this task.
MatthewVernon subscribed.

This is done now.