
Graphite generates a lot of 502 in Grafana
Closed, Resolved, Public

Description

A lot of the performance team's Grafana dashboards aren't working correctly because the dashboards get 502s for queries to Graphite. I'm not sure when this started, but I know it has been like this since before the latest Grafana update, probably for a couple of months at least.

This happens on large dashboards like https://grafana.wikimedia.org/d/000000491/webpagereplay-desktop-alerts but sometimes on smaller ones like https://grafana.wikimedia.org/d/000000210/webpagetest .

Sometimes a whole panel fails, and sometimes only one of the queries in a panel. Refreshing the dashboard just makes other ones fail. It looks like this in devtools:

Event Timeline

Peter created this task. Dec 12 2018, 6:44 AM
Restricted Application added a subscriber: Aklapper. Dec 12 2018, 6:44 AM
Dzahn triaged this task as Medium priority. Dec 13 2018, 10:23 PM
Dzahn added a project: Graphite.

I can confirm I'm getting 502/503 from those dashboards every now and then. I suspect this is related to having changed the graphite datasource in grafana from "direct" (i.e. the browser hits graphite.w.o directly) to "proxy" (the browser talks to grafana-server, which talks to graphite).

I can't see anything obvious in /var/log/grafana.log, though /var/log/apache2/other_vhosts_access.log does show the 502s (client IP addresses anonymized):

2018-12-14T09:29:45	562490	10.64.48.101	proxy-server/502	232	POST	http://grafana.wikimedia.org/api/datasources/proxy/1/render	-	text/html	https://grafana.wikimedia.org/d/000000491/webpagereplay-desktop-alerts?orgId=1	127.0.0.2	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36	en-US,en;q=0.9	-	-	10.64.48.101
2018-12-14T09:29:45	657824	10.64.48.101	proxy-server/502	232	POST	http://grafana.wikimedia.org/api/datasources/proxy/1/render	-	text/html	https://grafana.wikimedia.org/d/000000491/webpagereplay-desktop-alerts?orgId=1	127.0.0.2	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36	en-US,en;q=0.9	-	-	10.64.48.101
2018-12-14T09:29:45	647154	10.64.16.22	proxy-server/502	232	POST	http://grafana.wikimedia.org/api/datasources/proxy/1/render	-	text/html	https://grafana.wikimedia.org/d/000000491/webpagereplay-desktop-alerts?orgId=1	127.0.0.2	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36	en-US,en;q=0.9	-	-	10.64.16.22
2018-12-14T09:29:45	647301	10.64.48.103	proxy-server/502	232	POST	http://grafana.wikimedia.org/api/datasources/proxy/1/render	-	text/html	https://grafana.wikimedia.org/d/000000491/webpagereplay-desktop-alerts?orgId=1	127.0.0.2	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36	en-US,en;q=0.9	-	-	10.64.48.103
2018-12-14T09:29:45	648355	10.64.16.22	proxy-server/502	232	POST	http://grafana.wikimedia.org/api/datasources/proxy/1/render	-	text/html	https://grafana.wikimedia.org/d/000000491/webpagereplay-desktop-alerts?orgId=1	127.0.0.2	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36	en-US,en;q=0.9	-	-	10.64.16.22
2018-12-14T09:29:45	648919	10.64.48.101	proxy-server/502	232	POST	http://grafana.wikimedia.org/api/datasources/proxy/1/render	-	text/html	https://grafana.wikimedia.org/d/000000491/webpagereplay-desktop-alerts?orgId=1	127.0.0.2	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36	en-US,en;q=0.9	-	-	10.64.48.101
2018-12-14T09:29:45	732708	10.64.32.67	proxy-server/502	232	POST	http://grafana.wikimedia.org/api/datasources/proxy/1/render	-	text/html	https://grafana.wikimedia.org/d/000000491/webpagereplay-desktop-alerts?orgId=1	127.0.0.2	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36	en-US,en;q=0.9	-	-	10.64.32.67
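A quick way to see how concentrated the errors are is to tally them per handler and client IP. A minimal sketch, assuming the tab-separated field order visible in the excerpt above (timestamp, duration, client IP, handler/status, bytes, method, URL, ...); that layout is inferred from these lines, not taken from the Apache configuration:

```python
from collections import Counter


def count_502s(lines):
    """Count 502 responses per (handler, client IP) in tab-separated
    other_vhosts_access.log lines (field layout inferred from the excerpt)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip truncated or wrapped lines
        client = fields[2]
        # e.g. "proxy-server/502" -> handler "proxy-server", status "502"
        handler, _, status = fields[3].partition("/")
        if status == "502":
            counts[(handler, client)] += 1
    return counts
```

Passing an open log file and printing `count_502s(f).most_common()` lists the noisiest handler/client pairs first.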

The 502s are also present on graphite1004, from uwsgi-handler, in /var/log/apache2/other_vhosts_access.log:

2018-12-14T09:29:46     664     10.64.48.103    uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.16.22, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.48.103
2018-12-14T09:29:46     791     10.64.16.22     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.0.130, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.16.22
2018-12-14T09:29:46     559     10.64.0.130     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.48.103, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.0.130
2018-12-14T09:29:46     853     10.64.16.22     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.16.24, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.16.22
2018-12-14T09:29:46     691     10.64.32.69     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.32.67, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.32.69
2018-12-14T09:29:46     500     10.64.32.69     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.32.69, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.32.69
2018-12-14T09:29:46     566     10.64.48.101    uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.48.101, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.48.101
2018-12-14T09:29:46     548     10.64.32.69     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.0.130, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.32.69
2018-12-14T09:29:46     215     10.64.48.101    uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.32.69, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.48.101
2018-12-14T09:29:46     195     10.64.32.69     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.48.101, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.32.69
2018-12-14T09:29:46     273     10.64.0.132     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.48.101, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.0.132
2018-12-14T09:29:46     187     10.64.16.24     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.0.130, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.16.24
2018-12-14T09:29:46     179     10.64.0.130     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.16.22, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.0.130
2018-12-14T09:29:46     220     10.64.32.67     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.0.130, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.32.67
2018-12-14T09:29:46     338     10.64.48.103    uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.16.22, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.48.103
2018-12-14T09:29:46     243     10.64.32.67     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.48.101, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.32.67
2018-12-14T09:29:46     236     10.64.32.69     uwsgi-handler/502       232     POST    http://graphite.wikimedia.org/render    -       text/html       -       127.0.0.2, 10.64.48.103, 127.0.0.1, 127.0.0.1, 2620:0:861:103:a800:ff:fe01:5b3a  Grafana/5.4.2   en-US,en;q=0.9  -       -       10.64.32.69
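The second column looks like a per-request duration. Assuming it is Apache's `%D` (microseconds), which is an assumption about the log format rather than something stated in this task, a sketch that summarizes those durations for the uwsgi 502s. The regex accepts both tabs and runs of spaces; lines that were wrapped when pasted simply fail to match and are skipped:

```python
import re
import statistics

# Assumed layout (inferred from the excerpt): timestamp, duration,
# client IP, handler/status, ... separated by tabs or runs of spaces.
LINE_RE = re.compile(r"^(\S+)\s+(\d+)\s+(\S+)\s+([\w-]+)/(\d{3})\s")


def summarize_durations(lines, status="502"):
    """Return (count, min, median, max) of the duration column for lines
    with the given status, or None if no line matches."""
    durations = []
    for line in lines:
        m = LINE_RE.match(line)
        # Continuation lines such as "861:103:..." have no numeric second
        # column, so they do not match and are ignored.
        if m and m.group(5) == status:
            durations.append(int(m.group(2)))
    if not durations:
        return None
    return (len(durations), min(durations),
            statistics.median(durations), max(durations))
```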
CDanis added a subscriber: CDanis. Dec 17 2018, 5:19 PM

We could try changing the Grafana datasource back to 'direct' (now called 'Browser' in 5.x) -- it was flipped to server aka proxy just because of CORS requirements when grafana-beta.wikimedia.org was being tested.

I'm not sure that would affect 502s being served by Graphite itself though?
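For reference, the access mode can also be switched through Grafana's HTTP datasource API instead of the UI. A sketch assuming an API token with admin rights and a datasource body previously fetched with `GET /api/datasources/1` (id 1 matches the `/api/datasources/proxy/1/render` URLs in the log); the base URL and token are placeholders. In the API, the UI's "Server" mode is `access: "proxy"` and "Browser" is `access: "direct"`:

```python
import json
import urllib.request


def set_access_mode(base_url, token, ds_id, datasource, mode):
    """Build (but do not send) a PUT request that updates a Grafana
    datasource's access mode.

    mode: "proxy" (UI: Server, grafana-server queries Graphite) or
          "direct" (UI: Browser, the browser queries Graphite itself).
    datasource: the full datasource JSON, e.g. from GET /api/datasources/<id>.
    """
    if mode not in ("proxy", "direct"):
        raise ValueError("mode must be 'proxy' or 'direct'")
    body = dict(datasource, access=mode)  # copy; only 'access' changes
    return urllib.request.Request(
        f"{base_url}/api/datasources/{ds_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # placeholder API token
        },
    )
```

The function only builds the request so it can be inspected first; `urllib.request.urlopen(req)` would send it.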

Indeed, proxy vs. direct might not make a difference for 502s coming from graphite.

IMO it's worth trying to flip back to direct and seeing whether that improves things. One thing I can think of that could make a difference from graphite/uwsgi's point of view is concurrency from multiple clients (varnish) vs. a single client (grafana); plausible, but perhaps a little far-fetched! If it did make a difference it would be a bummer IMHO, as having Grafana as the single entry point for metrics requests seems attractive to me.
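One way to probe the concurrency hypothesis is to send the same Graphite render request first from a single client thread and then from many, and compare the status codes. A minimal sketch; the URL, form body, and request counts are hypothetical placeholders, and `send` is injected so the tallying logic can be exercised without a live server:

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def http_post_status(url, body=b"", timeout=30):
    """POST body to url and return the HTTP status code, or None when the
    request fails below HTTP (connection error, timeout)."""
    try:
        with urllib.request.urlopen(url, data=body, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses, including 502
    except OSError:
        return None  # URLError, socket timeouts and similar


def burst(send, n_requests, concurrency):
    """Call send() n_requests times using `concurrency` worker threads and
    tally the returned status codes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(lambda _: send(), range(n_requests)))
```

For example, with `send = lambda: http_post_status("https://graphite.wikimedia.org/render", b"target=hypothetical.metric&from=-1h&format=json")`, comparing `burst(send, 50, 1)` against `burst(send, 50, 50)` would show whether 502s only appear under concurrent load.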

Agreed it's worth a try. Graphite datasource updated.

I've only managed to reproduce a single 502 myself with the datasource in proxy mode.

However, I can't get it to happen at all with the datasource in direct/browser mode.

Pretty confusing; not sure what could be causing this.

@Peter please let me know if you can still repro the 502s with the datasource in 'browser' mode.

Is it possible that this correlates with the upgrade to graphite 1.0.2? It looks like that happened in early November: T166173#4725067; see also T196484.
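To check the correlation with the graphite 1.0.2 upgrade, the 502s could be bucketed per day across the rotated access logs and compared on either side of the upgrade. A sketch assuming the same whitespace-separated layout as the excerpts above; since the upgrade date is only given as "early November", the window boundaries are left to the caller:

```python
from collections import Counter
from datetime import date


def daily_status_counts(lines, status="502"):
    """Count responses with the given status per calendar day, using the
    ISO timestamp at the start of each log line."""
    per_day = Counter()
    for line in lines:
        fields = line.split()
        # fields[3] looks like "uwsgi-handler/502" in the excerpts above
        if len(fields) < 4 or not fields[3].endswith("/" + status):
            continue
        try:
            per_day[date.fromisoformat(fields[0][:10])] += 1
        except ValueError:
            continue  # not a dated log line
    return per_day


def mean_per_day(per_day, start, end):
    """Mean count per day over [start, end); days with no matching lines
    count as zero rather than being dropped."""
    days = (end - start).days
    total = sum(n for d, n in per_day.items() if start <= d < end)
    return total / days if days > 0 else 0.0
```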

Peter added a comment. Jan 7 2019, 7:18 PM

Sorry, I'm back from vacation today. Yep, when I went through the dashboards today I saw that some of them were only showing some of their metrics. For example, we test three URLs under certain conditions, but only one of them showed up in the graph. BUT when I tried again just now (while watching the network log) I couldn't reproduce it. I was throttling my connection while running other things, so it could be that I was on a really slow connection and the requests timed out, hmm. I'll keep trying to reproduce it tomorrow.

Peter closed this task as Resolved. Jan 9 2019, 12:28 PM
Peter claimed this task.

I cannot reproduce now, seems to be fixed, thank you @CDanis and @fgiunchedi !