Enable HA failover for flink-kubernetes-operator
Closed, Resolved · Public

Description

flink-kubernetes-operator supports running multiple replicas in HA standby mode.

https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/configuration/#leader-election-and-high-availability
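For orientation, the linked documentation describes leader election as operator configuration. The sketch below renders the two settings involved as flink-conf style lines. The key names are taken from the upstream docs; the helper name `ha_flink_conf` and the lease name `flink-operator-lease` are illustrative assumptions, not values from our chart.

```python
def ha_flink_conf(lease_name: str) -> str:
    """Render the operator settings that turn on leader election.

    The helper and its argument are hypothetical; only the two
    configuration keys come from the upstream operator docs.
    """
    if not lease_name:
        # Upstream documents the lease name as required once
        # leader election is enabled.
        raise ValueError("lease_name must be set when leader election is enabled")
    settings = {
        "kubernetes.operator.leader-election.enabled": "true",
        "kubernetes.operator.leader-election.lease-name": lease_name,
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


if __name__ == "__main__":
    # Illustrative lease name only.
    print(ha_flink_conf("flink-operator-lease"))
```

The replica count is set separately on the operator Deployment; standby replicas only take over if they share the same lease name.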

We should enable this in DSE and run with two or three replicas. Once this has run there for a little while, we should do the same when we deploy in wikikube (T333464).

Running without HA is fine in normal operation:
https://phabricator.wikimedia.org/T324576#8454404

But we should enable HA standby replicas for good measure anyway.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 917389 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] flink-operator - enable HA leader election and set replicas: 2

https://gerrit.wikimedia.org/r/917389

Change 917389 merged by jenkins-bot:

[operations/deployment-charts@master] flink-operator - enable HA leader election and set replicas: 2

https://gerrit.wikimedia.org/r/917389

Hm, looks like I'm going to need to add (or enable?) some perms to manage leases in the flink-operator namespace:

User "system:serviceaccount:flink-operator:flink-operator" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "flink-operator"
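As a rough sketch of what granting that access could look like, the snippet below builds a namespaced Role and RoleBinding for leases and prints them as JSON (which `kubectl apply -f` accepts). The API group, resource, namespace, and service account come from the error above; the object name `flink-operator-leases` and the verb list are assumptions, and this is not necessarily how the chart ends up doing it.

```python
import json


def lease_role(namespace: str, name: str = "flink-operator-leases") -> dict:
    """Build a Role allowing management of Lease objects in one namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": ["coordination.k8s.io"],
                "resources": ["leases"],
                # Assumed verb set; the denied call in the error was "get".
                "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
            }
        ],
    }


def lease_role_binding(namespace: str, service_account: str,
                       role_name: str = "flink-operator-leases") -> dict:
    """Bind the Role above to the operator's service account."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }


if __name__ == "__main__":
    manifests = [
        lease_role("flink-operator"),
        lease_role_binding("flink-operator", "flink-operator"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

A namespaced Role (rather than a ClusterRole) is enough here because the lease lives in the operator's own namespace.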

Change 917907 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] flink-operator - disable HA replicas for now

https://gerrit.wikimedia.org/r/917907

Change 917907 merged by jenkins-bot:

[operations/deployment-charts@master] flink-operator - disable HA replicas for now

https://gerrit.wikimedia.org/r/917907

Change 921064 had a related patch set uploaded (by TChin; author: TChin):

[operations/deployment-charts@master] Allow managing leases in flink-operator namespace when using HA

https://gerrit.wikimedia.org/r/921064

Change 921064 merged by jenkins-bot:

[operations/deployment-charts@master] Allow managing leases in flink-operator namespace when using HA

https://gerrit.wikimedia.org/r/921064

Change 921067 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/deployment-charts@master] flink-operator - enable HA leader election and set replicas: 2

https://gerrit.wikimedia.org/r/921067

Change 921067 merged by Ottomata:

[operations/deployment-charts@master] flink-operator - enable HA leader election and set replicas: 2

https://gerrit.wikimedia.org/r/921067

Change 925016 had a related patch set uploaded (by TChin; author: TChin):

[operations/deployment-charts@master] Fix overlapping names edge case in flink-operator

https://gerrit.wikimedia.org/r/925016

Change 925016 abandoned by TChin:

[operations/deployment-charts@master] Fix overlapping names edge case in flink-operator

Reason:

Abandoned in favor of waiting for upstream fix to be released

https://gerrit.wikimedia.org/r/925016