"browsertime-alerts" and "webpagereplay" alerts: DatasourceNoData and DatasourceError silenced
Open, Needs Triage, Public

Event Timeline

We’re also seeing DatasourceError alerts on several Wikidata alerts, usually with the message “failed to build query 'A': database is locked”. Is that the same issue?

It is not the same issue. We've filed T345362 for that error message, specifically.

I've been back from vacation for some time now, so I'll have a look at them. I think some of the alerts will be moved to the web team, but let me sync with them ASAP.

  1. Normal Speed Index Firefox Desktop ALERT alert

Hmm, that was a really old alert that was broken. I've fixed the dashboard and removed that alert for now.

The remaining "increased transfer size" alerts are correct, and we need to understand the root cause. The other alerts could be caused by those; I'll check that tomorrow.

I've increased the time span for all the alerts to make sure they do not hit "no data". Let's wait until I see that it works correctly before closing.
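
As a rough illustration of why a wider time span can avoid "no data": if a metric only reports every few hours, a short evaluation window may contain no samples at all. The sketch below is not the actual Grafana rule configuration; the `has_data` helper, the 6-hour reporting interval, and the window sizes are all assumptions made up for the example.

```python
from datetime import datetime, timedelta


def has_data(sample_times, now, window):
    """Return True if at least one sample falls inside [now - window, now]."""
    start = now - window
    return any(start <= t <= now for t in sample_times)


# Hypothetical metric that reports roughly once every 6 hours.
now = datetime(2024, 2, 14, 12, 0)
samples = [now - timedelta(hours=h) for h in (5, 11, 17)]

# A 1-hour window sees no samples, so the rule would evaluate to "no data".
narrow = has_data(samples, now, timedelta(hours=1))

# A 12-hour window covers the samples from 5 and 11 hours ago.
wide = has_data(samples, now, timedelta(hours=12))
```

The trade-off is that a wider window also makes the alert slower to notice when a metric genuinely stops reporting.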

I just got an email alert relating to this today:
[FIRING:1] DatasourceError rweb (qbUqRRb4k "AlertManager","ReadingWeb","cxserver" Readers Web [Desktop] Navigation Timing: First paint (Moving Medians) critical)
https://grafana.wikimedia.org/alerting/grafana/qbUqRRb4k/view

Krinkle renamed this task from Performance team Grafana DatasourceNoData and DatasourceError alerts to "browsertime-alerts" and "webpagereplay" alerts: DatasourceNoData and DatasourceError silenced. Feb 12 2024, 8:41 PM
Krinkle updated the task description.

Note that DatasourceNoData is still regularly received on Graphite-based metric alerts as recently as two days ago.

10 Feb 2024: [FIRING:1] DatasourceError perf (JSAFSCJVk "AlertManager" https://grafana.wikimedia.org/d/000000326/navigation-timing-alerts …)

This suggests to me that, contrary to T317887, these either do not have a single root cause, or the root cause remains at least partially unresolved.

And Prometheus-based alerts as well, from earlier today:

14 Feb 2024: [FIRING:1] DatasourceNoData mediawiki-platform (SvAKSjJVz "AlertManager" https://grafana.wikimedia.org/d/000000402/resourceloader-alerts 000000026 https://grafana.wikimedia.org/d/000000066/resourceloader "resourceloader INM Satisfaction")
https://grafana.wikimedia.org/alerting/grafana/SvAKSjJVz/view