At 10:04 UTC, the wikidata-monitoring email address received an alert about “DispatchChanges Normal job backlog time (mean avg, 15min)”:
[1] Firing
Labels
alertname = DispatchChanges Normal job backlog time (mean avg, 15min) alert
alert_rule_uid = MF0FSjJ4z
contacts = "AlertManager","cxserver"
datasource_uid = 000000026
grafana_folder = Wikidata
ref_id = A
rule_uid = MF0FSjJ4z
severity = critical
team = wikidata
Annotations
alertId = 309
dashboardUid = TUJ0V-0Zk
orgId = 1
panelId = 28
grafana_state_reason = NoData
message = DispatchChanges job backlog is over 10 minutes! Normal values are between 0.5s and 1s
According to another email received at 10:24 UTC, the alert was resolved, but the job's graph in Grafana still doesn’t look good – the backlog time just cuts off.
We should figure out what’s going on here, and if anything is still broken.
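One detail that may help with triage: the annotations include `grafana_state_reason = NoData`, which would fit the graph cutting off. The alert may have fired because the metric stopped reporting, not because the backlog actually exceeded 10 minutes. Below is a minimal sketch for checking this in future alert emails. The helper names (`parse_alert_fields`, `fired_on_missing_data`) are made up for illustration, and it assumes the email body keeps the `key = value` layout shown above.

```python
# Hypothetical helpers; not part of any existing Wikidata or Grafana tooling.

def parse_alert_fields(text: str) -> dict[str, str]:
    """Collect `key = value` lines from a pasted alert email into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " = " only, so values containing "=" stay intact.
        key, sep, value = line.partition(" = ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fired_on_missing_data(fields: dict[str, str]) -> bool:
    """True if the alert reports NoData as its state reason."""
    return fields.get("grafana_state_reason") == "NoData"


# Excerpt of the alert email quoted in this task.
ALERT_EMAIL = """\
alertname = DispatchChanges Normal job backlog time (mean avg, 15min) alert
severity = critical
team = wikidata
grafana_state_reason = NoData
message = DispatchChanges job backlog is over 10 minutes! Normal values are between 0.5s and 1s
"""

fields = parse_alert_fields(ALERT_EMAIL)
print(fired_on_missing_data(fields))
```

If this returns true, the first thing to check is probably whether the metric is still being emitted at all, before looking at the dispatch jobs themselves.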