[23:26] < JJMC89> FYI, getting 503s for grafana.wmcloud.org
[23:28] < bd808> JJMC89: I'll see if the why is obvious...
[23:32] < bd808> !log metricsinfra grafana.wmcloud.org offline with db connection error. Investigating.
[23:35] < bd808> !log metricsinfra metricsinfra-db-1.trove.eqiad1.wikimedia.cloud not responsive to ssh
[23:37] < bd808> !log metricsinfra metricsinfra-db-1.trove.eqiad1.wikimedia.cloud restarted via Horizon
[23:41] < bd808> andrewbogott: do you know how to troubleshoot a trove db instance? the metricsinfra-db-1 instance in the metricsinfra project is not talking with the grafana process on metricsinfra-grafana-1.metricsinfra.eqiad1.wikimedia.cloud. Restarting the trove db via horizon didn't seem to do anything useful.
The error recorded by grafana-server on metricsinfra-grafana-1.metricsinfra.eqiad1.wikimedia.cloud is:
2023-02-13T23:41:01.41+0000 lvl=eror msg="failed to determine the status of alerting engine. Enable either legacy or unified alerting explicitly and try again" err="failed to verify if the 'alert' table exists: dial tcp 172.16.3.253:3306: connect: connection refused"
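
The "connection refused" in the error above generally means the TCP connection attempt to 172.16.3.253:3306 was actively rejected, which usually points at nothing listening on that port rather than an unreachable host (an unreachable host tends to show up as a timeout instead). A minimal sketch for telling those cases apart from the Grafana host, using only the Python standard library; the helper name `probe_tcp` and the 3 second timeout are illustrative assumptions, not part of any existing tooling:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify the result.

    Returns "open", "refused", "timeout", or "error: <details>".
    This only checks that something accepts connections on the port;
    it does not verify that the listener is actually a working database.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: host is up, nothing is listening.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: host down, or packets dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or a name resolution failure.
        return f"error: {exc}"
```

For example, `probe_tcp("172.16.3.253", 3306)` run on metricsinfra-grafana-1 would be expected to return `"refused"` while the condition in the log line persists, and `"open"` once the database accepts connections again.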