Bugfix: Prometheus serving old data in the absence of new data, to Grafana
Closed, Resolved (Public)

Description

Grafana looks to be intermittently displaying misleading data as a result of Prometheus serving old metric data by default, in the absence of new data.

This bug is visible in the Grafana dashboard panel "donations queued to civicrm", which displays PayPal donation messages at new data points for which there have actually been no PayPal donations.

To observe this yourself:
Log in to civi1001 and run watch -n1 cat /var/spool/prometheus/donations.prom in your terminal, while watching https://grafana.wikimedia.org/dashboard/db/fundraising-overview?orgId=1&from=now-15m&to=now&panelId=15&fullscreen&refresh=1m in another window, then compare the two sources.

Event Timeline

jgleeson updated the task description.

@jgleeson is right, this is wonky behavior. It looks like Prometheus doesn't deal well with a highly variable set of metrics: https://prometheus.io/docs/practices/instrumentation/#avoid-missing-metrics

Apparently the best practice is to export a zero for an empty queue.
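As a rough illustration of that practice (not the code from the actual patch), the sketch below renders Prometheus text-format output with one sample per known gateway, writing 0 when a queue is empty instead of omitting the metric. The metric name donations_queued, the gateway label, and the function name are all hypothetical, chosen only for this example.

```python
def render_queue_metrics(counts, known_gateways, metric="donations_queued"):
    """Render one gauge sample per known gateway in Prometheus text format.

    `counts` maps gateway name -> number of queued messages. Gateways that
    are missing from `counts` are exported as 0 rather than left out, so the
    series never disappears from the scrape. All names here are hypothetical.
    """
    lines = [f"# TYPE {metric} gauge"]
    for gateway in sorted(known_gateways):
        value = counts.get(gateway, 0)  # empty queue -> explicit zero
        lines.append(f'{metric}{{gateway="{gateway}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Only adyen has queued messages; paypal is still exported, with value 0.
    print(render_queue_metrics({"adyen": 3}, ["paypal", "adyen"]), end="")
```

The key design choice is iterating over the fixed list of known gateways rather than over whatever happens to be in the queue, so every series is present on every write.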

Change 403305 had a related patch set uploaded (by Jgleeson; owner: Jgleeson):
[wikimedia/fundraising/crm@master] T183275 WIP. Implementation working but not tests..due to flaky DonationStats API.

https://gerrit.wikimedia.org/r/403305