
Ganglia varnishkafka python module crashing repeatedly
Closed, Resolved · Public

Description

The following exception is being raised repeatedly by /usr/lib/ganglia/python_modules/varnishkafka.py:

Dec 01 11:53:45 cp4008 gmond[18667]: [PYTHON] Can't call the metric handler function for [kafka.rdkafka.brokers.kafka1012-eqiad-wmnet:9092.12.rtt.cnt] in the python module [varnishkafka].
Dec 01 11:53:45 cp4008 gmond[18667]: Traceback (most recent call last):
Dec 01 11:53:45 cp4008 gmond[18667]: File "/usr/lib/ganglia/python_modules/varnishkafka.py", line 346, in metric_handler
Dec 01 11:53:45 cp4008 gmond[18667]: varnishkafka_stats.update_stats()
Dec 01 11:53:45 cp4008 gmond[18667]: File "/usr/lib/ganglia/python_modules/varnishkafka.py", line 264, in update_stats
Dec 01 11:53:45 cp4008 gmond[18667]: if self.have_stats_changed_since_last_update():
Dec 01 11:53:45 cp4008 gmond[18667]: File "/usr/lib/ganglia/python_modules/varnishkafka.py", line 316, in have_stats_changed_since_last_update
Dec 01 11:53:45 cp4008 gmond[18667]: if self.flattened_stats[key] != self.flattened_stats_previous[key]:
Dec 01 11:53:45 cp4008 gmond[18667]: KeyError: 'kafka.varnishkafka.time'

Apart from the problem itself, this could easily cause disk space issues by flooding /var/log/daemon.log, since the exception is reported *very* frequently.
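
For illustration only: the traceback shows the KeyError coming from a direct `[key]` lookup while comparing the current and previous flattened stats. A minimal sketch of a comparison that tolerates a key missing from either side could look like the following. The function name, its signature, and the `_MISSING` sentinel are assumptions made for this example; this is not the module's actual code, and it is not the fix applied in this task (Ganglia monitoring was removed instead).

```python
# Hypothetical sketch: names and signature are assumptions, not taken from
# /usr/lib/ganglia/python_modules/varnishkafka.py.

_MISSING = object()  # sentinel that can never equal a real stat value


def have_stats_changed(current, previous, keys=None):
    """Return True if any compared key differs between the two dicts.

    A key present in only one of the dicts counts as a change instead of
    raising KeyError. A key absent from both dicts counts as unchanged.
    """
    if keys is None:
        # Compare every key seen in either snapshot.
        keys = set(current) | set(previous)
    for key in keys:
        if current.get(key, _MISSING) != previous.get(key, _MISSING):
            return True
    return False
```

With this approach, a stat such as `kafka.varnishkafka.time` appearing in only one of the two snapshots is reported as a change rather than crashing the metric handler.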

Event Timeline

Restricted Application added a subscriber: Aklapper.
ema triaged this task as High priority. Dec 1 2016, 12:59 PM

Change 324708 had a related patch set uploaded (by Elukey):
Remove Ganglia monitoring for Varnishkafka

https://gerrit.wikimedia.org/r/324708

Change 324708 merged by Elukey:
Remove Ganglia monitoring for Varnishkafka

https://gerrit.wikimedia.org/r/324708

The next step is to check whether we can use logster for statsv metrics (and then probably ask the Performance team). Going to work on it tomorrow!

Change 324877 had a related patch set uploaded (by Elukey):
Refactor the monitor namespace to include Statsd

https://gerrit.wikimedia.org/r/324877

Change 324877 merged by Elukey:
Refactor the monitor namespace to include Statsd

https://gerrit.wikimedia.org/r/324877

Change 324880 had a related patch set uploaded (by Elukey):
Add a separate parameter for the statsd port

https://gerrit.wikimedia.org/r/324880

Change 324880 abandoned by Elukey:
Add a separate parameter for the statsd port

https://gerrit.wikimedia.org/r/324880

Change 324883 had a related patch set uploaded (by Elukey):
Switch Varnishkafka monitoring from Ganglia to statsd

https://gerrit.wikimedia.org/r/324883

Change 324887 had a related patch set uploaded (by Elukey):
Fix ganglia/statsd class namespace

https://gerrit.wikimedia.org/r/324887

Change 324887 merged by Elukey:
Fix ganglia/statsd class namespace

https://gerrit.wikimedia.org/r/324887

Change 324890 had a related patch set uploaded (by Elukey):
Fix class dependency for ganglia/statsd monitoring

https://gerrit.wikimedia.org/r/324890

Change 324890 merged by Elukey:
Fix class dependency for ganglia/statsd monitoring

https://gerrit.wikimedia.org/r/324890

Change 324891 had a related patch set uploaded (by Elukey):
Fix statsd logster job name

https://gerrit.wikimedia.org/r/324891

Change 324891 merged by Elukey:
Fix statsd logster job name

https://gerrit.wikimedia.org/r/324891

elukey added a project: Analytics-Kanban.
elukey moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 324883 merged by Elukey:
Switch Varnishkafka monitoring from Ganglia to statsd

https://gerrit.wikimedia.org/r/324883

I merged my changes for varnishkafka statsd monitoring, and Ema cleaned up all the varnishkafka-related Ganglia configurations via salt.
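
As a rough illustration of what statsd-style monitoring of these stats involves, the sketch below flattens a nested stats dictionary into dotted metric names (similar in shape to the names in the log above) and formats them as statsd gauge lines (`name:value|g`). Everything here is an assumption for the example: the function names, the `kafka` prefix default, the input shape, and the choice to replace `:` in names (it would otherwise clash with the statsd name/value separator). It does not reproduce the merged patches, which are linked above.

```python
# Hypothetical sketch: function names and the input shape are assumptions.


def flatten(stats, prefix="kafka"):
    """Flatten nested dicts into a single dict keyed by dotted names."""
    flat = {}
    for key, value in stats.items():
        name = "%s.%s" % (prefix, key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_statsd_gauges(flat):
    """Format numeric entries as statsd gauge lines, sorted by name."""
    lines = []
    for name in sorted(flat):
        value = flat[name]
        # Skip booleans and non-numeric values such as strings.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # ':' separates name and value in the statsd line format,
        # so it is replaced inside metric names.
        safe_name = name.replace(":", "_")
        lines.append("%s:%s|g" % (safe_name, value))
    return lines
```

The resulting lines could then be sent to a statsd daemon over UDP; the host and port to use would depend on the local setup.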

(needs to be moved to the right column of analytics kanban)