maintain-kubeusers: metrics, monitoring and alerting
Closed, Resolved (Public)

Description

Following the deployment of the latest refactor of maintain-kubeusers in T364312: [maintain-kubeusers,infra,k8s]: introduce some logic to backfill maintain-kubeuser resources (like per-tool kyverno policies), we need to make sure the necessary metrics, monitoring and alerting are in place.

As of this writing, I don't think we even have an alert for when the daemon is not running.

Event Timeline

dcaro triaged this task as High priority. Jun 4 2024, 1:50 PM

aborrero opened https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/14

maintain-kubeusers: MaintainKubeusersHang: adjust alert 'for' value

aborrero merged https://gitlab.wikimedia.org/repos/cloud/toolforge/alerts/-/merge_requests/14

maintain-kubeusers: MaintainKubeusersHang: adjust alert 'for' value