The Search Update Pipeline was architected about 8 years ago. It has served its purpose well, but it is now time to review its architecture and address a few of its long-standing limitations. Design document here.
High-level plan:
- Test the updater job on the dse-k8s cluster
- Create a namespace for the cirrus-streaming-updater on the dse-k8s cluster: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/951960
- T328675 create a helmfile service using the FlinkDeployment resource via the flink-app helm chart
- T341792 Provision Zookeeper Cluster for storing Flink HA data
- T344614 Add Zookeeper config to 'cirrus-streaming-updater' test service on DSE cluster
- (In progress) Test various maintenance operations for the Flink Operator: taking a savepoint, job upgrade, HA recovery (killing pods manually), k8s upgrade (wiping out the namespace, T293063), ... (see also T328561)
- Enable the k8s-operator on the staging wikikube cluster for the cirrus-streaming-updater namespace (might need a dedicated task)
- Enable the k8s-operator on the production wikikube cluster for the cirrus-streaming-updater namespace (might need a dedicated task)
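
For reference while testing, here is a rough sketch of what a FlinkDeployment resource with ZooKeeper-backed HA could look like, built as a Python dict and dumped as JSON. It is not the actual flink-app chart output: the function name, the ZooKeeper hosts, the storage paths, and the jar URI are all hypothetical placeholders, and the jobManager/taskManager resource sections are left out. The configuration keys are standard Flink options, but which ones the chart or the operator sets on its own is an assumption to verify.

```python
import json


def flink_deployment_manifest(name, namespace, zk_quorum, ha_storage_dir,
                              savepoint_dir, jar_uri):
    """Build a minimal FlinkDeployment manifest (as a dict) using ZooKeeper HA.

    Hypothetical helper for illustration only; all argument values are
    supplied by the caller and nothing here reflects the real deployment.
    """
    return {
        "apiVersion": "flink.apache.org/v1beta1",
        "kind": "FlinkDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "flinkConfiguration": {
                # Use ZooKeeper for leader election and HA metadata pointers.
                "high-availability": "zookeeper",
                "high-availability.zookeeper.quorum": zk_quorum,
                # Durable storage for the HA metadata itself.
                "high-availability.storageDir": ha_storage_dir,
                # Assumption: one cluster-id per deployment to isolate znodes.
                "high-availability.cluster-id": name,
                # Needed for savepoint-based upgrades.
                "state.savepoints.dir": savepoint_dir,
            },
            "job": {
                "jarURI": jar_uri,
                # Take a savepoint on upgrade and restore from it.
                "upgradeMode": "savepoint",
                "state": "running",
            },
        },
    }


if __name__ == "__main__":
    manifest = flink_deployment_manifest(
        name="cirrus-streaming-updater",
        namespace="cirrus-streaming-updater",
        zk_quorum="zk1.example.org:2181,zk2.example.org:2181",  # placeholder
        ha_storage_dir="s3://example-bucket/flink-ha",          # placeholder
        savepoint_dir="s3://example-bucket/savepoints",         # placeholder
        jar_uri="local:///opt/flink/usrlib/example.jar",        # placeholder
    )
    print(json.dumps(manifest, indent=2))
```

With `upgradeMode` set to `savepoint`, a spec change should make the operator take a savepoint before redeploying, which is one of the maintenance operations listed above; the HA recovery tests instead exercise the ZooKeeper settings.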