
Scap logs on Grafana dashboards are broken
Open, Needs Triage | Public | Bug Report

Description

Several Grafana dashboards use scap logs to show when e.g. train runs happen, which is extremely useful for correlating metrics changes with MediaWiki changes. This happens via the Public Logs / Loki data source with a query like {channel="scap"} |~ "rebuilt and synchronized wikiversions files". This stopped working a while ago (weeks? months? I don't remember when I first noticed).
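For anyone who wants to check the query outside Grafana, here is a minimal sketch that builds a request URL for Loki's `query_range` HTTP endpoint using the query quoted above. The base URL `https://loki.example.org` and the helper name `build_query_range_url` are placeholders invented for illustration; the real Loki host behind the Public Logs data source is not stated in this task.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the actual Loki endpoint is not given in this task.
LOKI_BASE_URL = "https://loki.example.org"

# LogQL query quoted in the task description.
SCAP_QUERY = '{channel="scap"} |~ "rebuilt and synchronized wikiversions files"'


def build_query_range_url(base_url, query, start_ns, end_ns, limit=100):
    """Build a URL for Loki's /loki/api/v1/query_range endpoint.

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = {
        "query": query,
        "start": start_ns,
        "end": end_ns,
        "limit": limit,
        "direction": "backward",  # newest entries first
    }
    return f"{base_url}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Example: a one-hour window (timestamps chosen arbitrarily).
    start = 1_760_000_000 * 10**9
    end = start + 3600 * 10**9
    print(build_query_range_url(LOKI_BASE_URL, SCAP_QUERY, start, end))
```

Fetching the printed URL (for example with `curl`) should return the matching log lines as JSON if the query itself is healthy, which helps separate Loki problems from Grafana-side ones.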

Event Timeline

I see logs returned using the attached query in Loki, but the log volume panel in Explore complains:

Failed to load log volume for this query
parse error at line 1, col 122: syntax error: unexpected IDENTIFIER

We are running an older version of Loki, so perhaps some change on the Grafana side has broken things.

colewhite moved this task from Inbox to Watching on the Observability-Logging board.
colewhite subscribed.

The original problem persists after upgrading Loki to 2.8.11, but the Explore panel is fixed in the new version.

Further digging revealed that this is an upstream Grafana bug: https://github.com/grafana/grafana/issues/110265

In short, the annotation toggles no longer work when they are default-off. If the dashboard has them enabled via the dashboard settings (Edit->Settings->Annotations->Deploys->Enabled ☑), the annotations reveal themselves.
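As a rough sketch of the same change applied to exported dashboard JSON rather than through the UI: this assumes the dashboard JSON model keeps annotation queries under `annotations.list`, each with a `name` and a boolean `enable` field. The helper `enable_annotation` and the sample dashboard are made up for illustration and are not taken from any actual dashboard in this task.

```python
import copy


def enable_annotation(dashboard, name):
    """Return a copy of the dashboard dict with the named annotation enabled.

    Assumes annotation queries live under dashboard["annotations"]["list"]
    and carry a boolean "enable" field.
    """
    updated = copy.deepcopy(dashboard)
    found = False
    for annotation in updated.get("annotations", {}).get("list", []):
        if annotation.get("name") == name:
            annotation["enable"] = True
            found = True
    if not found:
        raise KeyError(f"no annotation named {name!r} in dashboard")
    return updated


# Hypothetical, heavily trimmed dashboard JSON for illustration only.
example_dashboard = {
    "title": "Example dashboard",
    "annotations": {
        "list": [
            {"name": "Deploys", "enable": False},
            {"name": "Other", "enable": False},
        ]
    },
}

patched = enable_annotation(example_dashboard, "Deploys")
```

The function copies the input rather than mutating it, so the original export can be kept around for comparison before anything is re-imported.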

We'll move forward with the Loki upgrade regardless, since all the work to rebuild the package has already been done.

Mentioned in SAL (#wikimedia-operations) [2025-10-09T16:33:30Z] <cwhite> upgrade grafana-loki on grafana hosts T406478

Thanks for investigating!

If there are no concerns around overloading Loki, just making these annotations enabled by default would be a fine workaround IMO. They can be a bit slow, but they don't block anything else while they are loading, and Prometheus is itself a bit slow anyway.