Follow-up to T302396.
The thanos-swift cluster is S3-compatible, so we should use the S3 protocol instead of the native Swift client. We customized that client to implement tmp auth, and it has since been removed from the official Flink distribution: https://issues.apache.org/jira/browse/FLINK-21819.
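As a rough illustration of what moving to the S3 protocol could mean for the Flink configuration, the sketch below builds the relevant entries in Python. The keys `s3.endpoint`, `s3.path.style.access`, `state.checkpoints.dir` and `state.savepoints.dir` are standard Flink options. The endpoint URL, bucket name and job name are hypothetical placeholders, and path-style access is an assumption about how thanos-swift exposes buckets. Credentials are left out on purpose.

```python
from typing import Dict


def s3_flink_conf(endpoint: str, bucket: str, job: str) -> Dict[str, str]:
    """Return Flink config entries that point state storage at an S3 endpoint."""
    base = f"s3://{bucket}/{job}"
    return {
        "s3.endpoint": endpoint,
        # Assumption: the S3-compatible endpoint needs path-style bucket access.
        "s3.path.style.access": "true",
        "state.checkpoints.dir": f"{base}/checkpoints",
        "state.savepoints.dir": f"{base}/savepoints",
    }


def render(conf: Dict[str, str]) -> str:
    """Render the entries as 'key: value' lines, sorted by key."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    # All three arguments are hypothetical placeholders, not real values.
    print(render(s3_flink_conf(
        "https://thanos-swift.example.org",
        "updater-bucket",
        "wdqs_streaming_updater",
    )))
```

During the migration the Flink HA storage would keep its existing Swift path, so it is deliberately not part of this sketch.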
Migration plan:
- Preflight checks: test that S3 actually fixes T302396
  - deploy a new image with S3 and Swift enabled to codfw
  - save a savepoint to S3 from the updater running in YARN and stop it (requires restarting this session cluster with S3 enabled)
  - start the application from this S3 savepoint
- Migrate jobs from Swift to S3
  - codfw (wdqs traffic already pointing at eqiad)
    - deploy a Flink session cluster with S3 and Swift enabled (Flink HA storage still pointing to Swift)
    - restart all jobs from a savepoint stored on S3, with the checkpoint path on S3 as well
  - eqiad
    - [sre] update wikidata maxlag to only poll codfw machines: https://gerrit.wikimedia.org/r/c/operations/puppet/+/770508
    - [sre] switch wdqs traffic to codfw (see commands at https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook#Remediation)
    - deploy a Flink session cluster with S3 and Swift enabled (Flink HA storage still pointing to Swift)
    - restart all jobs from a savepoint stored on S3, with the checkpoint path on S3 as well
  - codfw (wdqs traffic already pointing at eqiad)
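The stop-and-restart steps in the plan can be sketched with the standard Flink CLI: `flink stop --savepointPath` takes a savepoint while stopping a job, and `flink run --fromSavepoint` resumes from it. The Python helpers below only assemble those command lines. The job ID, jar name and S3 paths are hypothetical placeholders, and any options needed to target a specific YARN or session cluster are not covered.

```python
import shlex
from typing import List


def stop_with_savepoint_cmd(job_id: str, savepoint_dir: str) -> List[str]:
    """Command that stops a running job and writes a savepoint under savepoint_dir."""
    return ["flink", "stop", "--savepointPath", savepoint_dir, job_id]


def run_from_savepoint_cmd(savepoint_path: str, jar: str) -> List[str]:
    """Command that starts a job, detached, from an existing savepoint."""
    return ["flink", "run", "--fromSavepoint", savepoint_path, "--detached", jar]


if __name__ == "__main__":
    # Hypothetical values; the real savepoint path is the one reported by `flink stop`.
    stop = stop_with_savepoint_cmd("<job-id>", "s3://updater-bucket/savepoints")
    start = run_from_savepoint_cmd("s3://updater-bucket/savepoints/<savepoint>", "updater.jar")
    print(shlex.join(stop))
    print(shlex.join(start))
```

The helpers return argument lists rather than strings so they can be passed to `subprocess.run` without shell quoting issues.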
AC:
- The W[DC]QS Streaming Updater uses thanos-swift through the S3 protocol