Follow-up to T302396.
The thanos-swift cluster is S3-compatible, so we should use the S3 client instead of the native swift client. We customized the swift client to implement tmp auth, and it has since been removed from the official flink distribution: https://issues.apache.org/jira/browse/FLINK-21819.
Migration plan:
- Preflight checks: Test that s3 actually fixes T302396
-- deploy a new image with s3&swift enabled to codfw
-- save a savepoint to s3 from the updater running in yarn and stop it (requires restarting this session cluster with S3 enabled)
-- start the application from this s3 savepoint
- Migrate jobs from swift to s3
-- deploy a flink session cluster with s3+swift enabled (flink HA storage still pointing to swift)
-- restart all jobs with a savepoint pointing at s3 and a checkpoint path using s3 as well
- Migrate flink HA storage from swift to s3
-- switch wdqs traffic&wikidata maxlag check to the spare DC
-- stop all jobs from the session cluster
-- undeploy all the k8s deployments under the `rdf-streaming-updater` namespace (it might be necessary to drop all flink-generated configmaps, e.g. by recreating the k8s namespace)
-- delete the `flink_ha_storage` folder on the corresponding s3 bucket
-- deploy an updated version of the flink-session-cluster chart without the swift client
-- resume all jobs from their corresponding savepoints
-- do the same for the next DC
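
The stop/resume steps in the plan above can be sketched as follows. This is only an illustration, assuming the jobs are driven through the stock `flink` CLI (`flink stop --savepointPath` and `flink run --fromSavepoint`); the bucket name, job id and jar path are hypothetical placeholders, and the actual deployment tooling may go through the REST API instead.

```python
from typing import List


def stop_with_savepoint_cmd(job_id: str, savepoint_dir: str) -> List[str]:
    """Build a `flink stop` command that writes a savepoint before stopping the job."""
    # Guard (illustrative): after the migration, savepoints must land on s3, not swift.
    if not savepoint_dir.startswith("s3://"):
        raise ValueError(f"savepoint dir must use the s3 scheme: {savepoint_dir}")
    return ["flink", "stop", "--savepointPath", savepoint_dir, job_id]


def resume_from_savepoint_cmd(savepoint_path: str, jar: str) -> List[str]:
    """Build a `flink run` command that resumes a job from an existing savepoint."""
    if not savepoint_path.startswith("s3://"):
        raise ValueError(f"savepoint path must use the s3 scheme: {savepoint_path}")
    return ["flink", "run", "--fromSavepoint", savepoint_path, "--detached", jar]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(stop_with_savepoint_cmd("<job-id>", "s3://example-bucket/savepoints")))
    print(" ".join(resume_from_savepoint_cmd("s3://example-bucket/savepoints/<savepoint>", "updater.jar")))
```

Rejecting non-s3 paths up front is a design choice of this sketch: it keeps a job from being accidentally restarted against a swift path once the swift client is gone.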
AC:
- WDQS Streaming Updater accesses thanos-swift through the s3 protocol
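
One possible way to verify the AC is to check that no storage-related flink setting still points at a `swift://` URI. A minimal sketch, assuming the effective flink configuration is available as a flat key/value mapping; the option names are standard flink options, while the bucket names are hypothetical:

```python
from typing import Dict, List

# Flink options that hold storage locations relevant to this migration.
STORAGE_KEYS = (
    "state.checkpoints.dir",
    "state.savepoints.dir",
    "high-availability.storageDir",
)


def swift_leftovers(conf: Dict[str, str]) -> List[str]:
    """Return the storage options that still use the swift scheme, sorted by name."""
    return sorted(k for k in STORAGE_KEYS if conf.get(k, "").startswith("swift://"))


if __name__ == "__main__":
    # Hypothetical config: jobs already moved to s3, HA storage not yet migrated.
    conf = {
        "state.checkpoints.dir": "s3://example-bucket/checkpoints",
        "state.savepoints.dir": "s3://example-bucket/savepoints",
        "high-availability.storageDir": "swift://example-bucket/flink_ha_storage",
    }
    print(swift_leftovers(conf))
```

An empty list would indicate that all three locations have been moved off the swift client.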