Graphite error causing breakage of Graphite-backed Grafana dashboards
Closed, Resolved, Public

Description

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/dist-packages/graphite/metrics/views.py", line 164, in find_view
    matches = list( store.find(query) )
  File "/usr/lib/python2.7/dist-packages/graphite/storage.py", line 54, in find
    for match in self.find_all(query):
  File "/usr/lib/python2.7/dist-packages/graphite/storage.py", line 116, in find_all
    for match in request.get_results():
  File "/usr/lib/python2.7/dist-packages/graphite/remote_storage.py", line 90, in get_results
    resultNodes = [ RemoteNode(self.store, node['metric_path'], node['isLeaf']) for node in results ]
KeyError: 'metric_path'
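
The last frame fails because a node dict returned by the remote store has no 'metric_path' key. Below is a minimal sketch of that failure mode and of a more defensive parse. Only the key names 'metric_path' and 'isLeaf' come from the traceback; the helper `parse_nodes` and the sample payloads are hypothetical and are not graphite-web code.

```python
def parse_nodes(results):
    """Split remote find results into usable nodes and malformed entries.

    Hypothetical helper: only the key names 'metric_path' and 'isLeaf'
    are taken from the traceback above.
    """
    nodes, malformed = [], []
    for node in results:
        try:
            nodes.append((node['metric_path'], node['isLeaf']))
        except (KeyError, TypeError):
            # The remote answered with a shape we do not understand;
            # keep it aside instead of failing the whole request.
            malformed.append(node)
    return nodes, malformed


good = {'metric_path': 'servers.cp1001', 'isLeaf': False}
bad = {'path': 'servers.cp1002', 'is_leaf': False}  # made-up unexpected shape
nodes, malformed = parse_nodes([good, bad])
```

Collecting malformed entries rather than raising would let a frontend log the offending backend and still serve the remaining results. Whether that trade-off is desirable here is a judgment call.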

Seen on https://graphite.wikimedia.org/metrics/find?query=servers.cp*

Experienced on dashboards such as https://grafana.wikimedia.org/dashboard/db/varnish-traffic?orgId=1

Gilles created this task. Nov 5 2018, 8:28 PM
Restricted Application added a subscriber: Aklapper. Nov 5 2018, 8:28 PM
fgiunchedi added subscribers: colewhite, fgiunchedi.

Indeed, this was caused by adding graphite1004 into rotation. That change was reverted in https://gerrit.wikimedia.org/r/c/operations/puppet/+/471801 and we'll investigate the root cause further.

uwsgi on graphite1004 is throwing a 500. This is the related Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=878208

Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]: Traceback (most recent call last):
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:   File "/usr/lib/python2.7/dist-packages/django/core/handlers/wsgi.py", line 157, in __call__
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:     response = self.get_response(request)
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:   File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 124, in get_response
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:     response = self._middleware_chain(request)
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:   File "/usr/lib/python2.7/dist-packages/django/core/handlers/exception.py", line 43, in inner
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:     response = response_for_exception(request, exc)
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:   File "/usr/lib/python2.7/dist-packages/django/core/handlers/exception.py", line 93, in response_for_exception
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:   File "/usr/lib/python2.7/dist-packages/django/core/handlers/exception.py", line 143, in handle_uncaught_exception
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:     return callback(request, **param_dict)
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:   File "/usr/lib/python2.7/dist-packages/graphite/views.py", line 11, in server_error
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:     return HttpResponseServerError( template.render(context) )
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:   File "/usr/lib/python2.7/dist-packages/django/template/backends/django.py", line 64, in render
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:     context = make_context(context, request, autoescape=self.backend.engine.autoescape)
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:   File "/usr/lib/python2.7/dist-packages/django/template/context.py", line 287, in make_context
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]:     raise TypeError('context must be a dict rather than %s.' % context.__class__.__name__)
Nov 06 09:52:23 graphite1004 uwsgi-graphite-web[65773]: TypeError: context must be a dict rather than Context.

This is already fixed upstream in https://github.com/graphite-project/graphite-web/commit/b602664916c0529be0414f441d207562381179e7#diff-b796228785968655db8f13d80b2d50e6 and applying that commit does fix the issue.
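
The TypeError in the log comes from a type check on the render context. The sketch below reproduces that check with simplified stand-ins, so it runs without Django. `Context`, `make_context`, and `render_error_page` are illustrative stand-ins modeled on the error message above, not the real Django or graphite-web code. It assumes the fix amounts to passing a plain dict to `render`.

```python
class Context(object):
    """Stand-in for django.template.Context (simplified, not the real class)."""
    def __init__(self, data):
        self.data = data


def make_context(context):
    """Simplified stand-in for the check that raises in the traceback."""
    if context is not None and not isinstance(context, dict):
        raise TypeError('context must be a dict rather than %s.'
                        % context.__class__.__name__)
    return context


def render_error_page(context):
    # Stand-in for template.render(context) in the error handler.
    return 'error: %s' % make_context(context)['message']


# Passing a Context object reproduces the failure seen in the log.
try:
    render_error_page(Context({'message': 'boom'}))
    outcome = 'rendered'
except TypeError as exc:
    outcome = str(exc)

# Passing a plain dict is accepted.
fixed = render_error_page({'message': 'boom'})
```

Because the exception is raised inside the error handler itself, the original error is masked and the client only sees a bare 500.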

Also, the ownership of /var/lib/graphite-web/graphite.db was wrong (root:root instead of www-data:www-data). With that corrected, API queries are now answered as expected.
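
An ownership mismatch like this one can be checked with a few lines of Python. This is a minimal sketch: the helper name `ownership_matches` is hypothetical, and it compares numeric IDs so that it has no dependency on the local user database.

```python
import os


def ownership_matches(path, expected_uid, expected_gid):
    """Return True if `path` is owned by the given numeric uid and gid."""
    st = os.stat(path)
    return st.st_uid == expected_uid and st.st_gid == expected_gid
```

On the affected host, the expected IDs could be looked up with `pwd.getpwnam('www-data').pw_uid` and `grp.getgrnam('www-data').gr_gid` and passed in along with /var/lib/graphite-web/graphite.db.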

Change 471930 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] graphite: run graphite-auth as www-data

https://gerrit.wikimedia.org/r/471930

Change 471930 merged by Filippo Giunchedi:
[operations/puppet@production] graphite: run graphite-auth as www-data

https://gerrit.wikimedia.org/r/471930

Change 471935 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] graphite: fix syncdb exec resource

https://gerrit.wikimedia.org/r/471935

Change 471935 merged by Filippo Giunchedi:
[operations/puppet@production] graphite: fix syncdb exec resource

https://gerrit.wikimedia.org/r/471935

Change 471955 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] graphite: probe /metrics/find too

https://gerrit.wikimedia.org/r/471955

Change 471955 merged by Filippo Giunchedi:
[operations/puppet@production] graphite: probe /metrics/find too

https://gerrit.wikimedia.org/r/471955
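
How the merged probe is implemented is not shown in this task. As a rough illustration of what probing /metrics/find could verify, the sketch below treats a response as healthy when it has HTTP status 200 and a body that decodes as JSON. Both the criterion and the helper name `find_probe_ok` are assumptions.

```python
import json


def find_probe_ok(status, body):
    """Return True when a /metrics/find response looks healthy.

    Assumption: healthy means HTTP 200 with a JSON-decodable body.
    """
    if status != 200:
        return False
    try:
        json.loads(body)
    except (ValueError, TypeError):
        # Not JSON (for example an HTML error page) or no body at all.
        return False
    return True
```

A check of this kind would have flagged the 500 responses from graphite1004 before the host went into rotation, which a probe that skips /metrics/find could miss.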

Mentioned in SAL (#wikimedia-operations) [2018-11-09T13:21:31Z] <godog> upload graphite-web_1.0.2+debian-2.1wmf1 to stretch-wikimedia - T208782

fgiunchedi closed this task as Resolved. Nov 9 2018, 1:23 PM

The patched graphite-web version has been uploaded.

Sent Debian the related change: https://salsa.debian.org/debian-graphite-team/graphite-web/merge_requests/1#note_51018