We'd much prefer to do T331283: [Event Platform] Store Flink HA metadata in Zookeeper, but we can't until we have Zookeeper clusters running a newer version.
As a stopgap, we should enable HA using k8s ConfigMaps, as the Search team does for rdf-streaming-updater. We should verify this approach with Search and SRE ServiceOps first, but it should be better than running without HA.
- Enable Flink JobManager HA with state stored in ConfigMaps
- Documentation on how to redeploy jobs from previous savepoints
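A minimal sketch of the first item, assuming Flink's native Kubernetes HA services are used. The cluster ID and storage path below are hypothetical placeholders, not our real settings, and the key name depends on the Flink version (older versions use `high-availability` instead of `high-availability.type`).

```yaml
# Sketch only: values marked "hypothetical" are placeholders.
# Use Flink's Kubernetes HA services (leader election and HA pointers in ConfigMaps).
# Older Flink versions spell this key `high-availability`.
high-availability.type: kubernetes
# Durable storage for the actual HA metadata (job graphs, completed checkpoint
# metadata); the ConfigMaps only hold pointers into this directory.
high-availability.storageDir: s3://example-bucket/flink/ha   # hypothetical path
# HA ConfigMaps are named after the cluster id, so it must stay the same across
# restarts for a new JobManager to find the previous job's state.
kubernetes.cluster-id: example-flink-app                      # hypothetical id
```

Note that ConfigMap-based HA still requires a durable filesystem for `high-availability.storageDir`, and the JobManager's service account needs RBAC permission to create, edit and delete ConfigMaps in its namespace.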