
Use Grizzly for Varnish SLO Grafana dashboard
Closed, Resolved (Public)

Description

The Varnish SLO dashboard drafted in the context of T284576 was created manually in Grafana. The SRE Observability team is currently working on Grafana Grizzly, which allows Grafana dashboards to be defined as code.

We want to port the Varnish SLO dashboard to Grizzly.

Event Timeline

ema triaged this task as Medium priority. Aug 17 2021, 8:04 AM

Change 713440 had a related patch set uploaded (by Ema; author: Ema):

[operations/grafana-grizzly@master] Add Varnish SLO dashboard

https://gerrit.wikimedia.org/r/713440

Change 713440 merged by Ema:

[operations/grafana-grizzly@master] Add Varnish SLO dashboard

https://gerrit.wikimedia.org/r/713440

@herron: I've merged the patch, forced a puppet run on grafana1002.eqiad.wmnet, and followed the instructions at https://wikitech.wikimedia.org/wiki/Grafana.wikimedia.org#Grizzly.

The diff step, grr diff dashboardname, is unclear to me. What is dashboardname? I thought maybe it's the key of the grafanaDashboards hash, in our case varnish-tmpl.json, but that doesn't seem to work:

07:41:35 ema@grafana1002.eqiad.wmnet:/srv/grafana-grizzly
$ grr diff varnish-tmpl.json
RUNTIME ERROR: couldn't open import "varnish-tmpl.json": no match locally or in the Jsonnet library paths
        varnish-tmpl.json:2:17-43       thunk <src> from <$>
        varnish-tmpl.json:3:5-8 $
        During evaluation

The usage information for grr diff says grr diff <jsonnet-file> [flags], so I presume the argument is not a dashboard name but the Jsonnet file where we define dashboards. That does seem like a step in the right direction, but the command says that slo-varnish-tmpl is not present in Dashboard:

07:44:24 ema@grafana1002.eqiad.wmnet:/srv/grafana-grizzly
$ grr diff slo_dashboards.jsonnet
Dashboard/slo-logstash no differences
Dashboard/slo-varnish-tmpl not present in Dashboard
Dashboard/slo-apigw no differences
Dashboard/slo-etcd-tmpl no differences
Dashboard/slo-etcd no differences
Dashboard/slo-logstash-tmpl no differences

The next step in the wikitech instructions suggests running grr apply dashboardname, but I'll wait for you to take a look before doing so. Thanks!

Thanks @ema! This is helpful feedback

The diff step, grr diff dashboardname, is unclear to me. What is dashboardname?

This would be the jsonnet file to render, in our case slo_dashboards.jsonnet. I've updated the wikitech docs to clarify that this expects a jsonnet file.

The usage information for grr diff says grr diff <jsonnet-file> [flags], so I presume the argument is not a dashboard name but the Jsonnet file where we define dashboards. That does seem like a step in the right direction, but the command says that slo-varnish-tmpl is not present in Dashboard:

07:44:24 ema@grafana1002.eqiad.wmnet:/srv/grafana-grizzly
$ grr diff slo_dashboards.jsonnet
Dashboard/slo-logstash no differences
Dashboard/slo-varnish-tmpl not present in Dashboard
Dashboard/slo-apigw no differences
Dashboard/slo-etcd-tmpl no differences
Dashboard/slo-etcd no differences
Dashboard/slo-logstash-tmpl no differences

Yes, this is right. I wish this would output the full diff containing the new dashboard, but since the dashboard doesn't exist yet in Grafana, Grizzly outputs 'not present in Dashboard' with no diff. It could be made more intuitive; I'll open an issue upstream.

Also, I updated the wikitech docs with this information as well as a hint to run 'grr preview' in these cases, which will render the dashboards and upload them as snapshots for review before running apply.
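As a small illustration of the behaviour described above, the following sketch filters grr diff output for dashboards that do not exist in Grafana yet, which are the ones where grr preview is useful. The function name new_dashboards is hypothetical, and the filter assumes the output format shown in this task ("Dashboard/<uid> not present in Dashboard"), which may differ in other Grizzly versions.

```shell
# Hypothetical helper: reads `grr diff` output on stdin and prints the
# resources that Grizzly reports as not yet present in Grafana.
# Assumes lines shaped like the transcript in this task, e.g.
#   Dashboard/slo-varnish-tmpl not present in Dashboard
new_dashboards() {
    awk '/ not present in Dashboard/ { print $1 }'
}

# Possible usage on the Grafana host (not run here):
#   grr diff slo_dashboards.jsonnet | new_dashboards
```

An empty result would suggest that every dashboard in the file already exists, so grr diff alone should show the changes.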

The next step in the wikitech instructions suggests running grr apply dashboardname, but I'll wait for you to take a look before doing so. Thanks!

Looks good. The next step is an optional grr preview slo_dashboards.jsonnet, and finally grr apply slo_dashboards.jsonnet.
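The sequence discussed in this task (diff, optional preview, apply) can be sketched as a single shell function. This is only an illustration: the function name grr_workflow and the GRR override variable are hypothetical, and the sketch assumes each grr subcommand takes the Jsonnet file as its argument, as in the transcripts above. In practice you would likely review the diff or preview before running apply rather than chaining the steps unattended.

```shell
# Hypothetical wrapper around the three steps from this task.
# GRR can be set to another command (for example `echo`) for a dry run.
grr_workflow() {
    file="$1"            # Jsonnet file, e.g. slo_dashboards.jsonnet
    grr="${GRR:-grr}"

    # 1. Compare the rendered dashboards with what Grafana currently has.
    "$grr" diff "$file"
    # 2. Optional: render the dashboards and upload them as snapshots for review.
    "$grr" preview "$file"
    # 3. Push the dashboards to Grafana.
    "$grr" apply "$file"
}

# Possible dry run, printing the commands instead of executing them:
#   GRR=echo; grr_workflow slo_dashboards.jsonnet
```

Note that the argument is the Jsonnet file, not a key such as varnish-tmpl.json from the grafanaDashboards hash.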

Mentioned in SAL (#wikimedia-operations) [2021-09-01T07:45:13Z] <ema> deploy Varnish SLO dashboard with grr apply slo_dashboards.jsonnet T289036

Also, I updated the wikitech docs with this information as well as a hint to run 'grr preview' in these cases, which will render the dashboards and upload them as snapshots for review before running apply.

Perfect! Thank you.

Looks good. The next step is an optional grr preview slo_dashboards.jsonnet, and finally grr apply slo_dashboards.jsonnet.

Done, that worked! See https://grafana.wikimedia.org/d/slo-varnish-tmpl/varnish-slos-template-draft?orgId=1

There are a few things to improve, for example the cluster dropdown should only list cache_text and cache_upload, while it currently includes clusters such as appserver and bastion which obviously don't make much sense for a Varnish SLO dashboard. Other than that I think we look good, thank you @herron!

Change 717587 had a related patch set uploaded (by Herron; author: Herron):

[operations/grafana-grizzly@master] slo_dashboards: add cluster_label_query and set default

https://gerrit.wikimedia.org/r/717587

Change 717587 merged by Herron:

[operations/grafana-grizzly@master] slo_dashboards: add cluster_label_query and set default

https://gerrit.wikimedia.org/r/717587

the cluster dropdown should only list cache_text and cache_upload, while it currently includes clusters such as appserver and bastion which obviously don't make much sense for a Varnish SLO dashboard. Other than that I think we look good

This has been addressed in 717587, which adds the ability to customize the cluster label query per dashboard and sets it appropriately for Varnish. I think we're in good shape!