
Add new Graphite instance in Grafana
Closed, ResolvedPublic

Description

Today, with the current version of our synthetic testing (WebPageTest and Browsertime), we run our test agents on AWS because AWS gives us the most stable metrics. We send metrics through statsv to Graphite, so we have limited the number of metrics that we send (for WebPageReplay/Browsertime we send 17 metrics)... but the tools we use provide many more metrics that we could use. For one tested URL there are 500+ metrics that we get and graph automatically. Storing those metrics gives us a bigger picture of what's going on and makes it easier to investigate regressions.

We have a new Graphite server up and running on AWS where we want to store only synthetic metrics that are generated outside of our network. That also gives us annotations per test, as in T156252, without overloading anyone else's annotations, as in T175708.
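To make the volume concrete, here is a sketch of how one test run's metrics could be written to the new server, assuming carbon's plaintext protocol (one "path value timestamp" line per metric, conventionally on TCP port 2003). The task does not say which ingestion path the agents use; the host name, metric prefix, and metric names below are hypothetical.

```python
import socket
import time

# Hypothetical endpoint: the task does not give the AWS Graphite host name.
GRAPHITE_HOST = "graphite.example.org"
# 2003 is carbon's conventional plaintext-protocol port.
GRAPHITE_PORT = 2003


def format_metric(path: str, value: float, timestamp: int) -> str:
    """One line of Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if any(ch.isspace() for ch in path):
        raise ValueError(f"metric path contains whitespace: {path!r}")
    return f"{path} {value} {timestamp}\n"


def format_batch(prefix: str, metrics: dict, timestamp: int) -> str:
    """Prefix every metric name and join all lines into one payload."""
    return "".join(
        format_metric(f"{prefix}.{name}", value, timestamp)
        for name, value in sorted(metrics.items())
    )


def send(payload: str, host: str = GRAPHITE_HOST, port: int = GRAPHITE_PORT) -> None:
    """Write a payload to carbon over a single TCP connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical prefix and metric names, for illustration only.
    payload = format_batch(
        "synthetic.enwiki.Main_Page",
        {"firstPaint": 812, "SpeedIndex": 1034},
        int(time.time()),
    )
    print(payload, end="")  # printed rather than sent, so this runs offline
```

With hundreds of metrics per URL, batching all lines into one payload per run keeps it to a single connection instead of one per metric.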

We need to add it as a datasource to https://grafana.wikimedia.org, and we need to look into the correct way to do that.

Event Timeline

Peter triaged this task as Medium priority. Sep 3 2019, 6:06 PM

@CDanis I want to try adding an external Graphite data source, but I don't have sufficient privileges in Grafana admin to add data sources. I'd like to try it that way so I can see that it works (I need to fine-tune the security group for that instance so traffic can come through). Can you give me access, or what's the correct way to do it?

Adding datasources via the UI is disabled, as all datasource definitions are Puppetized in our installation.

The thing to do is to edit modules/profile/files/grafana/production-datasources.yaml in the Puppet repo, following https://grafana.com/docs/administration/provisioning/#datasources

I'm happy to review and deploy the patch.

We'll want to make sure to enable the "Proxy always" option in Grafana for this source (like we do for Prometheus already), because it's hosted externally and we don't want *.wikimedia.org views to connect to external services from the user's browser.

Yup, that's part of the datasource definition -- a simple access: proxy stanza
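A minimal sketch of such a datasource entry, assuming Grafana's documented provisioning schema (a top-level apiVersion: 1 and a datasources list whose entries take name, type, access, url, isDefault, and editable). The datasource name and the URL are hypothetical placeholders; the real values are in the Puppet patch. The script builds the entry in Python and prints it as JSON, which YAML parsers generally accept, though the file in the repo is written as plain YAML.

```python
import json


def graphite_datasource(name: str, url: str) -> dict:
    """One entry for the 'datasources' list of a Grafana provisioning file."""
    return {
        "name": name,
        "type": "graphite",
        # 'proxy' means the Grafana server fetches the data ("Proxy always"),
        # so the user's browser never contacts the external host directly.
        "access": "proxy",
        "url": url,
        "isDefault": False,
        # Provisioned sources are managed by Puppet, not edited in the UI.
        "editable": False,
    }


def provisioning_document(*datasources: dict) -> dict:
    """Wrap datasource entries in the top-level provisioning structure."""
    return {"apiVersion": 1, "datasources": list(datasources)}


if __name__ == "__main__":
    doc = provisioning_document(
        # Hypothetical name and URL, for illustration only.
        graphite_datasource("synthetic-graphite", "http://localhost:33333")
    )
    print(json.dumps(doc, indent=2))
```

The access field is the whole "Proxy always" switch; everything else in the entry is ordinary datasource metadata.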

Change 540572 had a related patch set uploaded (by Phedenskog; owner: Phedenskog):
[operations/puppet@production] Grafana: Add external Graphite for synthetic testing

https://gerrit.wikimedia.org/r/540572

Thank you @CDanis. I need to open up traffic to the Graphite server's security group on AWS; what would be the correct IP to allow? Is that enough security, or should I add something more?

Sorry, I just realized that this won't work as-is; we don't allow outbound HTTP from production, except via a few proxy servers.

The only way Grafana supports using an HTTP proxy is through the HTTP_PROXY environment variable. I see a couple of options:

  • In our Puppetization of Grafana's systemd unit, set HTTP_PROXY to webproxy.${::site}.wmnet:8080 and set NO_PROXY to all of the internal hosts.
  • Set up a special Apache or nginx on the Grafana machine to serve as a proxy to this service, and tell Grafana to hit that host:port on localhost.

@fgiunchedi do you have any thoughts or opinions?

I like the idea of effectively proxying per datasource (Grafana upstream issue) as opposed to HTTP_PROXY + NO_PROXY in Grafana's environment.

My preference would be to first try a local Apache vhost that proxies out to webproxy, if it isn't too much work or too complicated, and to fall back to HTTP_PROXY + NO_PROXY if that fails for some reason.
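A minimal Python stand-in for the local Apache vhost suggested above, to show the data path: Grafana talks plain HTTP to a listener on localhost, and that listener forwards each request to the external Graphite through webproxy. The upstream URL, the local port, and the "eqiad" value for ${::site} are hypothetical, and a production setup would use Apache as proposed rather than this script.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values: the external Graphite URL and local port are not given
# in the task, and "eqiad" stands in for ${::site}.
UPSTREAM = "https://graphite.example.org"
WEBPROXY = "http://webproxy.eqiad.wmnet:8080"
LISTEN = ("127.0.0.1", 33333)


def upstream_url(path: str, base: str = UPSTREAM) -> str:
    """Map a local request path such as /render?target=... onto the external host."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def build_opener(proxy: str = WEBPROXY) -> urllib.request.OpenerDirector:
    """Opener that routes every outbound request through the egress web proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


class ForwardingHandler(BaseHTTPRequestHandler):
    opener = build_opener()

    def _forward(self, data) -> None:
        req = urllib.request.Request(upstream_url(self.path), data=data, method=self.command)
        if self.headers.get("Content-Type"):
            req.add_header("Content-Type", self.headers["Content-Type"])
        try:
            with self.opener.open(req, timeout=30) as resp:
                status, headers, body = resp.status, resp.headers, resp.read()
        except urllib.error.HTTPError as err:
            # Pass upstream errors (4xx/5xx) back to Grafana unchanged.
            status, headers, body = err.code, err.headers, err.read()
        self.send_response(status)
        self.send_header("Content-Type", headers.get("Content-Type", "text/plain"))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        self._forward(None)

    def do_POST(self) -> None:
        # Forward request bodies too (e.g. form-encoded /render queries).
        length = int(self.headers.get("Content-Length", 0))
        self._forward(self.rfile.read(length))


if __name__ == "__main__":
    # Listen on localhost only: Grafana on the same machine is the only client.
    HTTPServer(LISTEN, ForwardingHandler).serve_forever()
```

Grafana's datasource would then point at the localhost listener, so the proxy settings stay scoped to this one datasource instead of the whole Grafana process.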

As an aside: moving off statsv/internal Graphite is great to see, @Peter!

If we could get help fixing that, it would be super (it's hard for me to know how much work it is). Once it's done, we can proceed to add automatic performance alerts for all the wikis we test. We also need to redo the dashboards we have, but the good thing is that this will give us more insights.

@fgiunchedi @CDanis does anyone in SRE have capacity to help us with setting up the local-proxy-to-webproxy solution this quarter?

Yes, sorry for the delay; I was out for a few days and then some other stuff took priority. I'll try to get it done by next week.

Change 545934 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/dns@master] wikimedia.org: add performance-graphite

https://gerrit.wikimedia.org/r/545934

Change 545934 merged by Filippo Giunchedi:
[operations/dns@master] wmftest.org: add wpt-graphite

https://gerrit.wikimedia.org/r/545934

Change 547030 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] grafana: double-proxy for wpt-graphite

https://gerrit.wikimedia.org/r/547030

Change 547030 merged by CDanis:
[operations/puppet@production] grafana: double-proxy for wpt-graphite

https://gerrit.wikimedia.org/r/547030

Change 540572 merged by CDanis:
[operations/puppet@production] Grafana: Add external Graphite for synthetic testing

https://gerrit.wikimedia.org/r/540572

Okay, I think this is done and working!

Change 547563 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] grafana: set default undef on wpt_graphite_proxy_port

https://gerrit.wikimedia.org/r/547563

Change 547563 merged by Jhedden:
[operations/puppet@production] grafana: set default undef on wpt_graphite_proxy_port

https://gerrit.wikimedia.org/r/547563

DannyS712 subscribed.

[batch] remove patch for review tag from resolved tasks