
DatasourceError grafana alerting error message database is locked
Closed, Resolved, Public

Description

Several alerts configured in Grafana fairly regularly raise a DatasourceError with the message "failed to build query 'A': database is locked", which then resolves quickly.

Which database is locked?
Is this an alert configuration issue or something else entirely?
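For context, "database is locked" is the text SQLite reports for SQLITE_BUSY: a connection could not get the lock it needed before its busy timeout ran out. Later comments in this task point at Grafana's own SQLite database. The following Python sketch uses only the standard-library `sqlite3` module, and its file, table and function names are made up for illustration. It reproduces the message with two connections writing to the same file while one keeps its write transaction open. It is not Grafana's code.

```python
import os
import sqlite3
import tempfile


def reproduce_database_is_locked() -> str:
    """Return the error SQLite raises when a second writer finds the write
    lock already held and has no busy timeout to wait with (hypothetical demo)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # Create a table with a throwaway connection in autocommit mode.
    setup = sqlite3.connect(path, isolation_level=None)
    setup.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, state TEXT)")
    setup.close()

    # isolation_level=None: transactions are controlled explicitly below.
    # timeout=0: do not wait for a lock, fail immediately instead.
    holder = sqlite3.connect(path, timeout=0, isolation_level=None)
    contender = sqlite3.connect(path, timeout=0, isolation_level=None)

    # The first connection takes the write (RESERVED) lock and keeps it.
    holder.execute("BEGIN IMMEDIATE")
    holder.execute("INSERT INTO alerts (state) VALUES ('firing')")

    try:
        # The second connection also needs the write lock, which is taken.
        contender.execute("INSERT INTO alerts (state) VALUES ('resolved')")
        message = "no error"
    except sqlite3.OperationalError as exc:
        message = str(exc)
    finally:
        holder.execute("COMMIT")
        holder.close()
        contender.close()
    return message


print(reproduce_database_is_locked())  # database is locked
```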

Event Timeline

colewhite renamed this task from "DatasourceError grafana errors" to "DatasourceError grafana alerting error message database is locked". (Aug 31 2023, 2:09 PM)

This may be a bug; we should try upgrading Grafana to >= 9.4.7 first.

In our old performance alerts we also get:

Error = [plugin.downstreamError] failed to query data: Post "https://graphite.wikimedia.org/render": net/http: timeout awaiting response headers (Client.Timeout exceeded while awaiting headers)

For those alerts, these errors started on the 30th of August and come and go (along with the DatasourceError).

> In our old performance alerts we also get:
>
> Error = [plugin.downstreamError] failed to query data: Post "https://graphite.wikimedia.org/render": net/http: timeout awaiting response headers (Client.Timeout exceeded while awaiting headers)

This is probably not related to the database being locked. It appears to be a temporary network problem, a load issue on Graphite, or an untuned timeout in Grafana.

> For those alerts, these errors started on the 30th of August and come and go (along with the DatasourceError).

Aug 30 was when we removed the silence rule on DatasourceErrors.

Hey there, we're getting 2 emails daily related to this problem. Is there any way to suppress these email alerts when they occur?

Change 955014 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] aptrepo: amend pin to allow grafana 9.4.x

https://gerrit.wikimedia.org/r/955014

9.4.14 is live on grafana-next. Will do some testing there before rolling it out to production early next week. Reinstalled the silence until we can complete the upgrade.

Change 955014 merged by Cwhite:

[operations/puppet@production] aptrepo: amend pin to allow grafana 9.4.x

https://gerrit.wikimedia.org/r/955014

Mentioned in SAL (#wikimedia-operations) [2023-09-11T21:33:19Z] <cwhite> update grafana to 9.4.14 on grafana1002 T345362

Grafana is updated and silence is removed.

@colewhite I'm getting DatasourceError rweb email alerts. Is that covered by this task or T344961 ?

> @colewhite I'm getting DatasourceError rweb email alerts. Is that covered by this task or T344961 ?

T344961 is the right task for that. The Readers Web alerts largely mirror the Performance team alerts.

It's been more than a week and I can see no more instances of this in the logs.

Found more evidence of this in the logs this morning. Another recommendation is to enable WAL (write-ahead logging) on the SQLite database, an option that is new in Grafana 9.4.
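Why WAL can help: in SQLite's default rollback-journal mode a writer needs an exclusive lock to commit, so any connection holding an open read transaction makes that commit fail with "database is locked" once the writer's busy timeout runs out. In WAL mode, readers keep reading their snapshot and the writer commits anyway. A minimal sketch with Python's standard `sqlite3` module, assuming a throwaway database file and hypothetical table and function names; it illustrates SQLite behaviour, not Grafana's actual queries. (In Grafana the switch is, as far as I know, the `wal` option in the `[database]` section of grafana.ini; the exact Puppet change is in the linked Gerrit patch.)

```python
import os
import sqlite3
import tempfile


def writer_commits_during_open_read(journal_mode: str) -> bool:
    """Return True if a write can commit while another connection is in the
    middle of a read transaction, for the given SQLite journal mode."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    setup = sqlite3.connect(path, isolation_level=None)
    # "delete" is SQLite's default rollback-journal mode; "wal" persists in
    # the database file once set.
    setup.execute(f"PRAGMA journal_mode={journal_mode}").fetchone()
    setup.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, state TEXT)")
    setup.close()

    # timeout=0 so lock conflicts surface immediately instead of waiting.
    reader = sqlite3.connect(path, timeout=0, isolation_level=None)
    writer = sqlite3.connect(path, timeout=0, isolation_level=None)

    # Open a read transaction and leave it open, like a slow query would.
    reader.execute("BEGIN")
    reader.execute("SELECT COUNT(*) FROM alerts").fetchone()

    writer.execute("BEGIN IMMEDIATE")
    writer.execute("INSERT INTO alerts (state) VALUES ('firing')")
    try:
        # Rollback-journal mode: COMMIT needs an EXCLUSIVE lock, which the
        # reader's SHARED lock blocks. WAL mode: readers do not block it.
        writer.execute("COMMIT")
        committed = True
    except sqlite3.OperationalError:  # "database is locked"
        writer.execute("ROLLBACK")
        committed = False
    finally:
        reader.execute("COMMIT")
        reader.close()
        writer.close()
    return committed


print(writer_commits_during_open_read("delete"))  # False
print(writer_commits_during_open_read("wal"))     # True
```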

Change 961510 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] profile: enable wal on grafana sqlite db

https://gerrit.wikimedia.org/r/961510

Change 961510 merged by Cwhite:

[operations/puppet@production] profile: enable wal on grafana sqlite db

https://gerrit.wikimedia.org/r/961510

Optimistically resolving now that WAL is enabled. Will watch the logs for new instances.
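One caveat when watching the logs: WAL removes reader/writer blocking, but SQLite still allows only one writer at a time, so two overlapping writes can still produce "database is locked". Whether they do depends on the connection's busy timeout, i.e. how long SQLite waits for the write lock before giving up. The sketch below, again using Python's standard `sqlite3` module with hypothetical names, holds the write lock for half a second in one connection and attempts a second write with a 0 s and a 5 s busy timeout. These values are illustrative only and are not Grafana's settings.

```python
import os
import sqlite3
import tempfile
import threading


def second_writer_succeeds(busy_timeout_s: float) -> bool:
    """In WAL mode, hold the write lock briefly in one connection and write
    from another. Return True if the second write succeeds (hypothetical demo)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    setup = sqlite3.connect(path, isolation_level=None)
    setup.execute("PRAGMA journal_mode=wal").fetchone()
    setup.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY, state TEXT)")
    setup.close()

    # check_same_thread=False because a timer thread commits this connection.
    first = sqlite3.connect(path, isolation_level=None, check_same_thread=False)
    first.execute("BEGIN IMMEDIATE")
    first.execute("INSERT INTO alerts (state) VALUES ('firing')")
    # Release the write lock after 0.5 s, as a short write transaction would.
    release = threading.Timer(0.5, lambda: first.execute("COMMIT"))
    release.start()

    # timeout is SQLite's busy timeout in seconds: how long to wait for a lock.
    second = sqlite3.connect(path, timeout=busy_timeout_s, isolation_level=None)
    try:
        second.execute("INSERT INTO alerts (state) VALUES ('resolved')")
        succeeded = True
    except sqlite3.OperationalError:  # "database is locked"
        succeeded = False
    finally:
        release.join()
        first.close()
        second.close()
    return succeeded


print(second_writer_succeeds(0))  # False: fails at once while the lock is held
print(second_writer_succeeds(5))  # True: waits until the first write commits
```

With a zero timeout the second write fails as soon as it meets the held lock; with a timeout longer than the lock is held, it simply waits its turn.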