
Increase visibility of kubernetes network status
Open, Needs Triage, Public

Description

Context: Follow-up to the incident of 2024-02-07 (newly added Kubernetes nodes were missing BGP configuration).

We do not have good visibility into the network status of our Kubernetes clusters.

  • We already collect some metrics from Calico components. We should make them more visible by adding key metrics to dashboards and possibly alerts.
  • We do not have any metrics for BGP sessions. These are not available in Calico "open core", so we probably want to run bird-exporter.
    • Specifically for situations like this incident, it would be useful to compare the nodes pooled in pybal against their BGP status.
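
A minimal sketch of the pybal vs BGP comparison, to make the idea concrete. It assumes the two inputs have already been gathered elsewhere: a list of nodes pooled in pybal, and a per-node list of BGP session states (for example, derived from bird-exporter metrics). The function name, the input shapes, and the hostnames are hypothetical and only serve as illustration.

```python
from typing import Dict, Iterable, List


def pooled_without_bgp(pooled_nodes: Iterable[str],
                       bgp_sessions_up: Dict[str, List[bool]]) -> List[str]:
    """Return pooled nodes that have no established BGP session.

    pooled_nodes: hostnames currently pooled in pybal (assumed input).
    bgp_sessions_up: hostname -> one boolean per BGP session, True when
        the session is established (assumed input shape).
    """
    missing = []
    for node in sorted(set(pooled_nodes)):
        # A node absent from the mapping is treated as having no sessions.
        sessions = bgp_sessions_up.get(node, [])
        if not any(sessions):
            missing.append(node)
    return missing


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    pooled = ["node1001", "node1002", "node1003"]
    sessions = {
        "node1001": [True, True],
        "node1002": [False, False],
        # node1003 reports no BGP sessions at all
    }
    print(pooled_without_bgp(pooled, sessions))
```

A non-empty result is the condition that would have flagged the incident: a node receiving traffic through pybal while announcing nothing over BGP. Whether "no session established" or "any session down" is the better alerting condition is left open here.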