
icinga UNKNOWN Varnishkafka Delivery Errors / varnishkafka data not in graphite
Closed, Resolved · Public

Description

We have a lot of UNKNOWNs in Icinga from the check:

Varnishkafka Delivery Errors per minute

They are either "UNKNOWN: No valid datapoints found" or "UNKNOWN: More than half of the datapoints are undefined".
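For reference, a minimal sketch of how a Graphite-backed check could arrive at those two UNKNOWN states. This is not the actual Icinga plugin: the function name `classify` and the `critical` threshold parameter are hypothetical, and the only things taken from this task are the two message strings. It assumes datapoints arrive as (value, timestamp) pairs with `None` for missing values, as in Graphite's JSON render output.

```python
from typing import List, Optional, Tuple

# (value, timestamp); value is None when Graphite has no data for that step
Datapoint = Tuple[Optional[float], int]


def classify(datapoints: List[Datapoint], critical: float) -> str:
    """Hypothetical classifier, for illustration only."""
    values = [v for v, _ in datapoints if v is not None]

    # Nothing usable at all: the metric is missing or entirely empty.
    if not values:
        return "UNKNOWN: No valid datapoints found"

    # Strictly more than half of the datapoints are undefined.
    if len(values) * 2 < len(datapoints):
        return "UNKNOWN: More than half of the datapoints are undefined"

    # Enough data to judge against the (assumed) threshold.
    if max(values) >= critical:
        return "CRITICAL"
    return "OK"
```

Either UNKNOWN result means the check could not evaluate the metric at all, which points at the data pipeline rather than at varnishkafka itself.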

https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&type=detail&servicestatustypes=8&hoststatustypes=3&serviceprops=2097162&nostatusheader


I reopened T76342 for this, but:

< ottomata> mutante: i think that is a new issue, that one was for check_ganglia. this one is from graphite, and the data is really not in graphite

Event Timeline

Dzahn created this task. Mar 17 2015, 3:07 PM
Dzahn raised the priority of this task to Needs Triage.
Dzahn updated the task description.
Dzahn added a subscriber: Dzahn.
Restricted Application added a subscriber: Aklapper. Mar 17 2015, 3:07 PM

Same issue on T92967, but for HHVM checks (a Graphite issue?)

MC8 added a subscriber: MC8.

Confirmed: it is a Graphite issue. The carbon-relay queue is now too small; will follow up with a fix.
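To illustrate why an undersized relay queue shows up as missing data, here is a toy model, assuming the relay drops incoming datapoints once its queue is full. It is not carbon-relay code: `BoundedRelayQueue`, `simulate`, and all the numbers are hypothetical.

```python
from collections import deque


class BoundedRelayQueue:
    """Toy bounded queue that drops new datapoints when full (assumption)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.items = deque()
        self.dropped = 0

    def enqueue(self, datapoint):
        # A full queue rejects the datapoint; it never reaches storage.
        if len(self.items) >= self.max_size:
            self.dropped += 1
            return False
        self.items.append(datapoint)
        return True

    def drain(self, n):
        # Forward up to n queued datapoints to the destination.
        sent = []
        while self.items and len(sent) < n:
            sent.append(self.items.popleft())
        return sent


def simulate(max_size, burst, drain_per_tick, ticks):
    """Each tick: `burst` datapoints arrive, then up to `drain_per_tick` are sent."""
    queue = BoundedRelayQueue(max_size)
    delivered = 0
    for _ in range(ticks):
        for i in range(burst):
            queue.enqueue(i)
        delivered += len(queue.drain(drain_per_tick))
    return delivered, queue.dropped
```

In this model, a queue smaller than the incoming burst loses datapoints on every tick even when the destination keeps up, which is the kind of gap the checks above report as undefined.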

Change 197344 had a related patch set uploaded (by Filippo Giunchedi):
graphite: increase relay queue size

https://gerrit.wikimedia.org/r/197344

Change 197344 merged by Filippo Giunchedi:
graphite: increase relay queue size

https://gerrit.wikimedia.org/r/197344

Fix merged; seems to have recovered. Pending creation of related alarms.

Change 197352 had a related patch set uploaded (by Filippo Giunchedi):
graphite: add error alerts

https://gerrit.wikimedia.org/r/197352

Change 197352 merged by Filippo Giunchedi:
graphite: add error alerts

https://gerrit.wikimedia.org/r/197352

fgiunchedi closed this task as Resolved. Mar 24 2015, 9:16 AM

alarms created, resolving