grafana-labs often fails to generate graphs with c.datapoints is undefined
Closed, Resolved · Public

Description

When generating graphs, there is often a red exclamation mark and the graph fails to render. An example is the per-project Labs dashboard:

https://grafana-labs.wikimedia.org/dashboard/db/labs-project-board?orgId=1&var-project=integration&from=now-7d&to=now

Clicking on the CPU row, for example, exacerbates the issue: most of the graphs do not render. I get the same behavior even when filtering by host and asking to render only the CPU and memory graphs (example).

Clicking on the red exclamation mark leads to an inspection panel offering more details:

Message: c.datapoints is undefined

Stack trace:

c/this.convertDataPointsToMs@https://grafana-labs.wikimedia.org/public/app/boot.e4836696.js:15:20614
i@https://grafana-labs.wikimedia.org/public/app/boot.e4836696.js:61:3972
l/<@https://grafana-labs.wikimedia.org/public/app/boot.e4836696.js:61:4394
zc/this.$get</o.prototype.$eval@https://grafana-labs.wikimedia.org/public/app/boot.e4836696.js:61:11545
zc/this.$get</o.prototype.$digest@https://grafana-labs.wikimedia.org/public/app/boot.e4836696.js:61:10014
zc/this.$get</o.prototype.$apply@https://grafana-labs.wikimedia.org/public/app/boot.e4836696.js:61:11830
x/i<@https://grafana-labs.wikimedia.org/public/app/boot.e4836696.js:61:7880
f@https://grafana-labs.wikimedia.org/public/app/boot.e4836696.js:59:22130
pb/k.defer/c<@https://grafana-labs.wikimedia.org/public/app/boot.e4836696.js:59:23593

Event Timeline

hashar created this task. Oct 9 2017, 8:45 AM
Restricted Application added a subscriber: Aklapper. Oct 9 2017, 8:45 AM
Gilles added a subscriber: Peter. Oct 9 2017, 12:33 PM
Gilles added a subscriber: Gilles.

It's Graphite intermittently serving junk to Grafana: a PNG image is sent back instead of JSON. I've seen that before, but I can't remember whether it was on production Graphite or whether I'm just rediscovering the same issue on Labs Graphite again...

Yes, we run into the same problem occasionally in production: https://phabricator.wikimedia.org/T153169#3042185, but it's really painful on Labs Graphite/Grafana.

+ @fgiunchedi
At least we seem to run the latest Grafana (4.5.2).

Indeed. I see POST requests to /render that have format: "json", but some of the responses come back with Content-Type: "image/png".
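The mismatch described above can be caught on the client side before the body is parsed. The following is only a sketch: `parse_render_response` is a hypothetical helper, not part of Grafana or graphite-web, and it assumes the usual Graphite JSON shape of a list of series, each with a `datapoints` list.

```python
import json


def parse_render_response(content_type, body):
    """Decode a /render response that was requested with format=json.

    Hypothetical helper for illustration only. Raises ValueError when the
    server answered with something other than JSON (for example image/png),
    instead of letting later code fail on a missing 'datapoints' field.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(';', 1)[0].strip().lower()
    if media_type != 'application/json':
        raise ValueError("expected application/json, got %r" % media_type)
    return json.loads(body.decode('utf-8'))


# A well-formed reply decodes into a list of series.
good = parse_render_response(
    'application/json',
    b'[{"target": "cpu.user", "datapoints": [[1.0, 1507536000]]}]',
)

# A PNG served for a JSON request is rejected up front.
try:
    parse_render_response('image/png', b'\x89PNG')
    png_rejected = False
except ValueError:
    png_rejected = True
```

A check like this only makes the failure explicit; it does not address why the server picks the wrong format.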

Based on that, I found graphite-web issue #576: "sometimes format=json requests return png's".

Apparently that is fixed in 0.9.x by https://github.com/graphite-project/graphite-web/pull/764 . If I got it right, the commit for 0.9.x is https://github.com/graphite-project/graphite-web/commit/2886ace93432aa490410faf18f993fd704266cc0

$ git tag --contains 2886ace93432aa4904
0.9.13-pre1
0.9.14
0.9.15
0.9.16
*   2886ace9 - Merge pull request #764 from esc/0.9.x-caching-request-hashfix (3 years, 3 months ago) <Valentin Haenel>
|\  
| * 34894bf7 - prune trailing whitespace (3 years, 3 months ago) <Valentin Haenel>
| * a6fcd013 - Removing stripControlChars function, as it's no longer needed. (3 years, 3 months ago) <rob>
| * 970f3770 - Adding tests for the graphite.render.hashing.hashRequest function. (3 years, 3 months ago) <rob>
| * 576ec1a5 - Backport of c5d9e7e PR719 to 0.9.x (3 years, 3 months ago) <Dave Ertel>
|/

Assuming grafana-labs.wikimedia.org runs on Jessie, graphite-web is 0.9.12 (https://packages.debian.org/search?keywords=graphite-web) and thus lacks the fix.

The merge includes 4 commits. Maybe it is sufficient to patch graphite-web with just https://github.com/graphite-project/graphite-web/commit/576ec1a5cd4d38ec20230794d3b7017f105ffa22

--- a/webapp/graphite/render/hashing.py
+++ b/webapp/graphite/render/hashing.py
@@ -19,12 +19,13 @@ try:
   from hashlib import md5
 except ImportError:
   from md5 import md5
+from itertools import chain
 import bisect
 
 def hashRequest(request):
   # Normalize the request parameters so ensure we're deterministic
   queryParams = ["%s=%s" % (key, '&'.join(values))
-                 for (key,values) in request.GET.lists()
+                 for (key,values) in chain(request.POST.lists(), request.GET.lists())
                  if not key.startswith('_')]
 
   normalizedParams = ','.join( sorted(queryParams) ) or 'noParam'

That adds POST parameters to the caching key.
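To illustrate why that matters, here is a simplified sketch of the key computation before and after the patch. It is not the actual graphite-web code: `FakeQueryDict`, `FakeRequest`, `hash_get_only` and `hash_get_and_post` are hypothetical names, and a plain md5 digest stands in for whatever the real `hashRequest` does with the normalized string. Only the list comprehension mirrors the diff.

```python
from hashlib import md5
from itertools import chain


class FakeQueryDict(dict):
    """Hypothetical stand-in for Django's QueryDict: key -> list of values."""

    def lists(self):
        return list(self.items())


class FakeRequest(object):
    """Hypothetical request object exposing only .GET and .POST."""

    def __init__(self, get=None, post=None):
        self.GET = FakeQueryDict(get or {})
        self.POST = FakeQueryDict(post or {})


def _digest(query_params):
    # Sort so the key does not depend on parameter order.
    normalized = ','.join(sorted(query_params)) or 'noParam'
    return md5(normalized.encode('utf-8')).hexdigest()


def hash_get_only(request):
    """Pre-patch behaviour: only GET parameters feed the cache key."""
    return _digest("%s=%s" % (key, '&'.join(values))
                   for (key, values) in request.GET.lists()
                   if not key.startswith('_'))


def hash_get_and_post(request):
    """Patched behaviour: POST parameters are part of the cache key too."""
    return _digest("%s=%s" % (key, '&'.join(values))
                   for (key, values) in chain(request.POST.lists(),
                                              request.GET.lists())
                   if not key.startswith('_'))


# Two POST requests for the same target that differ only in format.
png_request = FakeRequest(post={'target': ['cpu.user'], 'format': ['png']})
json_request = FakeRequest(post={'target': ['cpu.user'], 'format': ['json']})

collides_before = hash_get_only(png_request) == hash_get_only(json_request)
collides_after = hash_get_and_post(png_request) == hash_get_and_post(json_request)
```

With GET-only hashing, both POST requests normalize to 'noParam' and share one cache entry, which would be consistent with a cached PNG being served for a format=json request. Once POST parameters are included, the two keys differ.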

hashar triaged this task as Medium priority. Oct 11 2017, 9:45 AM

Nice find!

graphite-web is in sid and buster, but not in stretch? (https://packages.debian.org/jessie/graphite-web) Otherwise I would have suggested updating the WMCS Graphite box to stretch.

Applying the patch in the meantime sounds like a good idea.

If that doesn't work, a workaround is to disable that layer of caching: https://github.com/graphite-project/graphite-web/issues/576#issuecomment-36246428

Mentioned in SAL (#wikimedia-operations) [2017-10-11T13:03:20Z] <godog> test graphite-web patch on labmon1001 - T177747

Nice find indeed @hashar !

@Gilles yeah looks like graphite-web was removed from stretch because of django incompatibility https://packages.qa.debian.org/g/graphite-web/news/20170114T163927Z.html

I hotfixed labmon1001 with the patch above, since the issue is severe enough. We have graphite-web 0.9.15+debian-2 uploaded to jessie-wikimedia, but it isn't used anywhere in labs or production. We'd need to at least minimally test it on a separate instance in labs to verify that it works fine.

Hurrah it works fine on https://grafana-labs.wikimedia.org/dashboard/db/labs-project-board?orgId=1&var-project=integration&from=now-7d&to=now

So I guess we can rebuild a graphite 0.9.12 Debian package with the patch added and that would be the end of it? :-)

We have at least two tasks to upgrade Graphite (T166173 and T119774), so it'd be better to do that instead. I am definitely not going to have time this quarter to look after it, though. If someone could test the new version on a labs instance, I can help get the package uploaded internally and tested on labmon, then in production (cc @Addshore @Krinkle since they were interested too).

hashar changed the task status from Open to Stalled. Nov 20 2017, 1:52 PM

Worked around by cherry picking https://github.com/graphite-project/graphite-web/commit/576ec1a5cd4d38ec20230794d3b7017f105ffa22 on the graphite installation we have.

The proper fix will be included when we upgrade Graphite to 0.9.14 or later.

Krinkle removed a subscriber: Krinkle. Nov 20 2017, 10:24 PM
Addshore changed the status of subtask T119774: Upgrade graphite to 0.9.15 from Open to Stalled. Nov 21 2017, 11:07 AM
fgiunchedi closed this task as Resolved. Dec 3 2018, 11:29 AM
fgiunchedi claimed this task.

tentatively resolving, graphite 0.9.15 is on labmon1001 (jessie) while production runs graphite 1.x on stretch

@fgiunchedi Is there a task to update labmon to 1.x? I know why prod is on a newer version, but I can see that tripping up future work.

AFAIK there isn't a task to upgrade labmon to 1.x (either keeping jessie or upgrading to stretch), I'm looping in cloud-services-team though for awareness.

hashar added a comment. Dec 4 2018, 7:18 PM

That definitely got solved when we got the hotfix cherry-picked. IIRC we kept this task open to make sure the graphite Debian package gets upgraded. I guess we now pin the version from jessie-wikimedia/backports:

graphite-web:
  Installed: (none)
  Candidate: 0.9.15+debian-2
  Version table:
     0.9.15+debian-2 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/backports amd64 Packages
     0.9.12+debian-6 0
        500 http://mirrors.wikimedia.org/debian/ jessie/main amd64 Packages

The issue was only affecting labs afaik (labmon1001)