Per T331283 and related tickets, Data Engineering (in consultation with Event Platform and Service Ops) has settled on ZooKeeper to implement Flink high availability (HA), as described on Flink's website.
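As a rough illustration of the settings the Puppet code would need to render, here is a minimal Python sketch that emits the ZooKeeper HA block of `flink-conf.yaml`. The configuration keys follow Flink's ZooKeeper HA documentation. The quorum hosts, znode root, cluster id, storage directory and the `render_flink_conf` helper are hypothetical placeholders, not values agreed in T331283.

```python
"""Sketch: render the ZooKeeper HA block of a flink-conf.yaml file."""

# Hypothetical values: the real quorum hosts, paths and cluster id would
# come from Puppet/Hiera, not from this script.
HA_SETTINGS = {
    # Recent Flink releases read this key; older ones use "high-availability".
    "high-availability.type": "zookeeper",
    # Comma-separated host:port list of the ZooKeeper ensemble (hypothetical hosts).
    "high-availability.zookeeper.quorum": (
        "zk1.example.org:2181,zk2.example.org:2181,zk3.example.org:2181"
    ),
    # Root znode under which Flink keeps its HA metadata.
    "high-availability.zookeeper.path.root": "/flink",
    # Separates this Flink cluster from others sharing the same root znode.
    "high-availability.cluster-id": "/example-cluster",
    # Durable storage for JobManager recovery metadata; ZooKeeper holds
    # only references to what is stored here.
    "high-availability.storageDir": "hdfs:///flink/ha",
}


def render_flink_conf(settings: dict[str, str]) -> str:
    """Return flink-conf.yaml lines ("key: value"), sorted for stable diffs."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(settings.items())) + "\n"


if __name__ == "__main__":
    print(render_flink_conf(HA_SETTINGS), end="")
```

Sorting the keys keeps the rendered file stable across runs, so Puppet only reports a change when a value actually changes.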
AC:
- Puppet code committed and working
- Alerting/dashboards created and working