Alert when Flink does not have the expected number of task managers
Closed, Resolved (Public)

Description

As a maintainer of a Flink session cluster, I want to be alerted when the number of task managers is not what the deployment expects, so that I can react quickly.

k8s may prefer to restart containers on a broken k8s node rather than replace the pod with a new one (see parent ticket). To k8s the deployment may appear to be working properly, but Flink does not have the resources it expects, and the job it is supposed to run remains in the SCHEDULED state.

AC:

  • alert when the number of task managers is below a certain threshold
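
A minimal sketch of the check behind this criterion, not the alert that was actually implemented for this task. It assumes the JobManager REST API is reachable and reads the `taskmanagers` count from its `/overview` endpoint; the function names, the base URL, and the way the expected count is passed in are hypothetical and chosen only for illustration.

```python
import json
import urllib.request


def count_task_managers(overview: dict) -> int:
    """Return the number of registered task managers from a /overview payload."""
    return int(overview["taskmanagers"])


def should_alert(overview: dict, expected: int) -> bool:
    """True when fewer task managers are registered than the deployment expects.

    `expected` is an assumption: it would come from the deployment's
    configured replica count, which this ticket does not specify.
    """
    return count_task_managers(overview) < expected


def fetch_overview(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the cluster overview from the Flink JobManager REST API.

    `base_url` is hypothetical, e.g. "http://flink-jobmanager:8081".
    """
    with urllib.request.urlopen(f"{base_url}/overview", timeout=timeout) as resp:
        return json.load(resp)
```

A periodic job could call `should_alert(fetch_overview(base_url), expected)` and raise an alert when it returns true. The same comparison could equally be expressed as an alerting rule on a task manager count metric if the cluster exports one.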