The Search Update Pipeline was architected ~8 years ago. It has served its purpose well, but it is now time to review its architecture and address a few of its long-standing limitations. Design document [[ https://docs.google.com/document/d/17tY05WoaT_BloTzaIncR939k3hvhcVQ-E-8DBjo284E | here ]].
The overall plan is available in [[ https://docs.google.com/document/d/17tY05WoaT_BloTzaIncR939k3hvhcVQ-E-8DBjo284E | this document ]], which is easier to maintain and comment on than this ticket. Once the plan is stable enough, it will be documented as phab tasks for execution.

High level plan:
- [] Test the updater job on the dse-k8s cluster
-- [X] create a namespace for the cirrus-streaming-updater on the dse-k8s cluster: https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/951960
-- [X] T328675 create a helmfile service using the FlinkDeployment resource via the flink-app helm chart
-- [] T341792 Provision Zookeeper Cluster for storing Flink HA data
-- [] T344614 Add Zookeeper config to 'cirrus-streaming-updater' test service on DSE cluster
-- [] test various maintenance operations: taking savepoint, job upgrade, H/A recoveries (kill pods manually), k8s upgrade (wipe out the namespace, T293063), ... (see also T328561)
- [] Enable the k8s-operator on the staging wikikube cluster for the `cirrus-streaming-updater` namespace (might need a dedicated task)
-- [] test various maintenance operations on staging wk: taking savepoint, job upgrade, H/A recoveries (kill pods manually), k8s upgrade (wipe out the namespace, T293063), ... (see also T328561)
- [] Enable the k8s-operator on the production wikikube cluster for the `cirrus-streaming-updater` namespace (might need a dedicated task)