Page MenuHomePhabricator

Alert linting failure for Mailman
Closed, ResolvedPublic

Description

problem=prometheus "ops" at http://127.0.0.1:9900/ops has "node_files_total" metric with "path" label but there are no series matching {path=~".*/mailman3/queue/bounces"} in the last 1w

https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem&q=team%3Dsre&q=%40receiver%3Ddefault

Event Timeline

Jelto claimed this task.
Jelto triaged this task as Low priority.
Jelto subscribed.

I think this is related to the Prometheus incident T393357. The alert fired some time after there is a gap in the scraped metrics.

Looking at the last 7d the metric is available again and I can no longer see the alert in https://alerts.wikimedia.org/?q=alertname%3DAlertLintProblem&q=team%3Dsre&q=%40receiver%3Ddefault.

So I'll resolve this task.