
moving from krypton to grafana1001 broke fundraising dashboards
Closed, Resolved, Public

Description

It appears likely that grafana1001.eqiad.wmnet is missing firewall rules somewhere (I have yet to figure out where) that would allow it to talk to pay-lvs1001.frack.eqiad.wmnet and pay-lvs2001.frack.eqiad.wmnet.
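A minimal sketch of how the suspected firewall gap could be confirmed from grafana1001.eqiad.wmnet, assuming the datasources are reached over plain TCP. The two hostnames come from the description above; the port number and the helper names `can_connect` and `check_all` are hypothetical and only for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. silently dropped
        # packets behind a firewall), and DNS resolution failures.
        return False


def check_all(hosts, port, timeout: float = 3.0):
    """Map each host to whether a TCP connection to it succeeded."""
    return {host: can_connect(host, port, timeout) for host in hosts}


# Hosts named in the task description.
FUNDRAISING_HOSTS = [
    "pay-lvs1001.frack.eqiad.wmnet",
    "pay-lvs2001.frack.eqiad.wmnet",
]

# ASSUMPTION: placeholder port; substitute the port the Grafana
# datasource is actually configured to use.
DATASOURCE_PORT = 9090
```

Running `check_all(FUNDRAISING_HOSTS, DATASOURCE_PORT)` on both krypton and grafana1001 and comparing the results would show whether only the new host is blocked. A `False` result alone does not distinguish a firewall drop from a service that is down, which is why the comparison against krypton matters.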

These are datasources used in a few consoles, e.g. https://grafana.wikimedia.org/d/000000403/fundraising-database?orgId=1, which are now broken.

@cwdent indicated that it's preferred to not make firewall changes while the fundraiser is running, so instead, for the interim, I will set up a new endpoint https://grafana-old.wikimedia.org/ and then take it down once this issue is fixed properly.

Event Timeline

Change 479028 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] add hiera for grafana-old.w.o pointing to krypton

https://gerrit.wikimedia.org/r/479028

Change 479028 merged by CDanis:
[operations/puppet@production] add hiera for grafana-old.w.o pointing to krypton

https://gerrit.wikimedia.org/r/479028

Change 479029 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/dns@master] grafana-old.wikimedia.org DNS points to text caches

https://gerrit.wikimedia.org/r/479029

Change 479029 merged by CDanis:
[operations/dns@master] grafana-old.wikimedia.org DNS points to text caches

https://gerrit.wikimedia.org/r/479029

krypton.eqiad.wmnet is serving again: https://grafana-old.wikimedia.org/dashboard/db/fundraising-database?orgId=1

Presently grafana-old's database is read-only, so edits to dashboards can't be made. This was done as part of the switchover, to give a clean window where dashboard edits would fail visibly instead of being silently dropped. If read-only is a problem I can adjust that, but note that any changes made on grafana-old won't be migrated anywhere afterwards.

I'll take grafana-old back down once we fix the firewall rules.

CDanis updated the task description.

When you're ready to make firewall changes, I'm happy to help :)

Deployed iptables change:

7c8dce3 new grafana server, grafana1001.eqiad.wmnet

@CDanis thanks so much for the temporary solution. We have separate firewalls for the fundraising cluster where I'll make the corresponding config change shortly, and defer to netops to deploy.

Great, thanks! Please reassign back to me once the changes are deployed, to track tearing down the temporary solution.

@ayounsi the updated config for this is at 1546888529

Mentioned in SAL (#wikimedia-operations) [2019-01-07T19:42:56Z] <XioNoX> push firewall change to pfw3-codfw/eqiad - T211712

@ayounsi thanks! Fundraising grafana is now fixed. I pushed up 1546890827, which removes krypton.

@CDanis all good here, go ahead and remove the old service at your convenience.

Change 482818 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/dns@master] Revert "grafana-old.wikimedia.org DNS points to text caches"

https://gerrit.wikimedia.org/r/482818

Change 482818 merged by CDanis:
[operations/dns@master] Revert "grafana-old.wikimedia.org DNS points to text caches"

https://gerrit.wikimedia.org/r/482818

Change 482819 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] Revert "add hiera for grafana-old.w.o pointing to krypton"

https://gerrit.wikimedia.org/r/482819

Change 482819 merged by CDanis:
[operations/puppet@production] Revert "add hiera for grafana-old.w.o pointing to krypton"

https://gerrit.wikimedia.org/r/482819

grafana-old.wikimedia.org is no more.