
icinga UNKNOWN Varnishkafka Delivery Errors / varnishkafka data not in graphite
Closed, Resolved (Public)

Description

we have a lot of UNKNOWNs in Icinga from:

Varnishkafka Delivery Errors per minute

they are either "UNKNOWN: No valid datapoints found" or "UNKNOWN: More than half of the datapoints are undefined"
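
A minimal sketch of how a Graphite-backed check could arrive at those two UNKNOWN messages, assuming the series comes back as a list of values where None marks an undefined datapoint. The function name `classify_datapoints` and the OK message are hypothetical; this is not the actual Icinga plugin code.

```python
from typing import Optional, Sequence


def classify_datapoints(values: Sequence[Optional[float]]) -> str:
    """Return an Icinga-style status message for one Graphite series.

    Hypothetical helper: None stands for an undefined datapoint.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        # Nothing usable at all, e.g. the metric never reached Graphite.
        return "UNKNOWN: No valid datapoints found"
    undefined = len(values) - len(valid)
    if undefined > len(values) / 2:
        # Too sparse to compare against warning/critical thresholds.
        return "UNKNOWN: More than half of the datapoints are undefined"
    return "OK: enough datapoints to evaluate thresholds"
```

Under these assumptions, a series that is entirely or mostly None yields UNKNOWN rather than OK or CRITICAL, which matches the symptom when the data is missing from Graphite.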

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&type=detail&servicestatustypes=8&hoststatustypes=3&serviceprops=2097162&nostatusheader


I reopened T76342 for this, but:

< ottomata> mutante: i think that is a new issue, that one was for check_ganglia. this one is from graphite, and the data is really not in graphite

Event Timeline

Dzahn set the priority of this task to Needs Triage.
Dzahn updated the task description.
Dzahn subscribed.

confirmed it is a graphite issue: the carbon-relay queue is now too small. will follow up with a fix
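
For illustration only, a sketch of why an undersized relay queue can leave gaps in graphite, assuming the relay drops incoming datapoints once its queue is full. The class `BoundedRelayQueue` and the drop-newest behaviour are assumptions made for this example, not the actual carbon-relay implementation.

```python
from collections import deque


class BoundedRelayQueue:
    """Hypothetical bounded queue that drops new datapoints when full."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.items = deque()
        self.dropped = 0  # count of datapoints that never reach storage

    def enqueue(self, datapoint) -> bool:
        """Queue a datapoint; return False if it was dropped."""
        if len(self.items) >= self.max_size:
            self.dropped += 1
            return False
        self.items.append(datapoint)
        return True


# A burst of 5 datapoints against a queue sized for 2.
queue = BoundedRelayQueue(max_size=2)
for i in range(5):
    queue.enqueue(("varnishkafka.example.metric", i))
```

In this sketch the burst leaves 2 datapoints queued and 3 dropped. A larger `max_size` absorbs the same burst without loss, which is the general idea behind increasing the queue size.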

Change 197344 had a related patch set uploaded (by Filippo Giunchedi):
graphite: increase relay queue size

https://gerrit.wikimedia.org/r/197344

Change 197344 merged by Filippo Giunchedi:
graphite: increase relay queue size

https://gerrit.wikimedia.org/r/197344

fix merged, seems to have recovered, pending creation of related alarms

Change 197352 had a related patch set uploaded (by Filippo Giunchedi):
graphite: add error alerts

https://gerrit.wikimedia.org/r/197352

Change 197352 merged by Filippo Giunchedi:
graphite: add error alerts

https://gerrit.wikimedia.org/r/197352

alarms created, resolving