Switch default Grafana datasource to Thanos
Closed, Resolved (Public)

Description

ATM the default Grafana datasource is Graphite, which is deprecated. We should switch to Thanos instead. Doing it the naive way (just changing the configuration) doesn't work and breaks dashboards, see T269329: Many Grafana dashboards had their queries nuked. Currently display "no data".

Outline of steps:

  • Move dashboards with "default" datasource to have "graphite" datasource explicitly
  • Switch default datasource to Thanos
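
A rough sketch of how the first step could be scripted against exported dashboard JSON. The function name `pin_default_datasource` is hypothetical, and the sketch assumes the implicit default shows up as a missing or null panel-level `datasource` field; it does not look at per-target datasources or at how the result gets saved back to Grafana.

```python
from copy import deepcopy


def pin_default_datasource(dashboard, datasource="graphite"):
    """Return (copy of dashboard, number of panels changed).

    Panels whose datasource is missing or None get `datasource` set
    explicitly. Row panels are left alone since they only contain
    other panels.
    """
    result = deepcopy(dashboard)

    def visit(panels):
        changed = 0
        for panel in panels:
            # Collapsed rows carry their child panels in a nested "panels" list.
            changed += visit(panel.get("panels") or [])
            if panel.get("type") == "row":
                continue
            if panel.get("datasource") is None:
                panel["datasource"] = datasource
                changed += 1
        return changed

    changed = visit(result.get("panels") or [])
    return result, changed
```

Working on a copy keeps the original export around for diffing before anything is written back.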

Event Timeline

Dashboards with datasource === null are probably the most affected by this change.

It appears opening a dashboard and pressing the save icon proposes a lot of changes that may address this issue.

Perhaps we can detect which dashboards need a schema migration and script a save operation? dashboard.schemaVersion < 32 might be a helpful filter.
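
As an illustration of that filter, here is a minimal sketch that works on a list of already-fetched dashboard JSON objects. `needs_schema_migration` is a made-up helper name, and treating a missing `schemaVersion` as 0 is an assumption. The save operation itself is not shown.

```python
def needs_schema_migration(dashboards, min_version=32):
    """Return the uids of dashboards whose schemaVersion is below min_version."""
    stale = []
    for dash in dashboards:
        # Assumption: a missing schemaVersion means a very old export, so count it.
        if dash.get("schemaVersion", 0) < min_version:
            stale.append(dash.get("uid"))
    return stale
```

The threshold is a parameter so the same check can be rerun with a different cutoff if 32 turns out to be too low.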

We can also test on grafana-next.

I've updated all dashboards with null panel datasources as of today. At this point the only panels that remain with a null datasource are dashboard rows, which is expected as they are just the containers for other panels.
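
An audit along those lines could look roughly like the sketch below. Both function names are hypothetical, and it assumes panels are nested under rows via a `panels` list and that only the panel-level `datasource` field matters.

```python
def null_datasource_panels(dashboard):
    """Yield (panel_id, panel_type) for every panel whose datasource is unset."""
    def walk(panels):
        for panel in panels:
            if panel.get("datasource") is None:
                yield panel.get("id"), panel.get("type")
            # Descend into panels nested under a collapsed row.
            yield from walk(panel.get("panels") or [])

    yield from walk(dashboard.get("panels") or [])


def unexpected_null_panels(dashboard):
    """Return ids of non-row panels that still have no datasource."""
    return [pid for pid, ptype in null_datasource_panels(dashboard) if ptype != "row"]
```

An empty list from `unexpected_null_panels` means the only null datasources left on that dashboard belong to rows.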

Today is Friday, so in the spirit of not introducing sweeping changes before a weekend I'll plan to make a final pass and switch the default datasource to Thanos early next week.

Change #1057882 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] grafana: set thanos as default datasource

https://gerrit.wikimedia.org/r/1057882

Change #1057882 merged by Herron:

[operations/puppet@production] grafana: set thanos as default datasource

https://gerrit.wikimedia.org/r/1057882

The Grafana default datasource has been switched to Thanos. If any unexpected issues occur, the patch to revert is https://gerrit.wikimedia.org/r/1057882. There's also a sqlite db backup on the grafana filesystem with today's date, just in case.

Mentioned in SAL (#wikimedia-operations) [2024-07-29T14:24:07Z] <herron> the grafana default datasource has been changed from graphite to thanos T269333

Reopening as we're seeing side effects on dashboards that were missed in the audit, e.g. T371520

FTR, this was just reverted due to the issue @fgiunchedi mentioned above.

Thanks for troubleshooting this!

I was out of the office for a couple of weeks, but am back now and spent some time experimenting with this in our Grafana staging environment. I was able to reproduce the problem outlined in T371520, and can confirm that the example affected dashboards do work correctly with Thanos as the default after they have been no-op saved for schema updates.

In my testing there are indeed other dashboards with a schema version earlier than 36 that make graphite queries and break with a default datasource of Thanos. FWIW, similar no-op schema update saves were done before switching defaults on the 29th. However, that scope was limited to dashboards matching datasource graphite, which I now realize left a blind spot in cases where the json didn't have that schema yet.

To help remedy that, as of this morning all dashboards that had a schema version below 36 have been no-op saved and updated to the current schema (38). With that done, they continue working in staging with Thanos as the default datasource.

With that now in place I'll upload a followup patch to switch to Thanos again, aiming for early/mid next week ideally.
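
To make the blind spot described above concrete, here is a small sketch comparing the two ways of picking dashboards to re-save. The exact matching used before the first switch isn't spelled out in this task, so `references_graphite` is only one possible reading of it, and both function names are hypothetical.

```python
def references_graphite(dashboard):
    """True if any top-level panel names graphite explicitly.

    Handles both the older string form and the newer {"type": ...} object form.
    """
    for panel in dashboard.get("panels") or []:
        ds = panel.get("datasource")
        if ds == "graphite":
            return True
        if isinstance(ds, dict) and ds.get("type") == "graphite":
            return True
    return False


def select_for_resave(dashboards, min_version=36):
    """Return (uids matched by datasource, uids matched by schema version)."""
    by_datasource = {d["uid"] for d in dashboards if references_graphite(d)}
    by_schema = {d["uid"] for d in dashboards if d.get("schemaVersion", 0) < min_version}
    return by_datasource, by_schema
```

Dashboards in `by_schema - by_datasource` are the ones a datasource-only match would skip: old-schema dashboards that rely on the default without ever naming graphite.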

Change #1069230 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] grafana: set thanos as default datasource

https://gerrit.wikimedia.org/r/1069230

Change #1069230 merged by Herron:

[operations/puppet@production] grafana: set thanos as default datasource

https://gerrit.wikimedia.org/r/1069230

The Grafana default datasource has been switched to Thanos again. If needed, the patch to revert is https://gerrit.wikimedia.org/r/1069230.