
Improve tooling for long-running Thanos queries
Open, High, Public

Description

Our documentation for failing thanos-query probes recommends running thanos-query-log-explore against the Apache log. In many cases this command currently returns without printing anything. Are long-running or expiring queries actually ending up in that log, and if they are, are we parsing them correctly? During a few outages the tool has produced no output.

The Grafana logs seem to show these failing queries more readily than the Apache logs. Perhaps we need to adjust the tool, or write a new one, to parse these logs.
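As a starting point for discussion, here is a minimal sketch of what filtering slow requests out of Grafana-style logs could look like. It assumes the lines are in logfmt (`key=value` pairs, with quoted values where needed) and that each request line carries a numeric duration field in milliseconds. The field name `time_ms`, the 30-second threshold, and the function names `parse_logfmt` and `slow_requests` are all assumptions for illustration, not something taken from our actual log format or from thanos-query-log-explore.

```python
import re
from typing import Iterable, Iterator

# Matches key=value or key="quoted value" pairs, as found in logfmt lines.
_PAIR = re.compile(r'([\w.]+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line: str) -> dict[str, str]:
    """Parse one logfmt line into a dict; surrounding quotes are stripped."""
    fields: dict[str, str] = {}
    for key, value in _PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def slow_requests(
    lines: Iterable[str],
    threshold_ms: float = 30_000.0,   # assumed cut-off, tune as needed
    duration_key: str = "time_ms",    # assumed field name
) -> Iterator[dict[str, str]]:
    """Yield parsed lines whose duration field meets or exceeds the threshold."""
    for line in lines:
        fields = parse_logfmt(line)
        try:
            elapsed = float(fields[duration_key])
        except (KeyError, ValueError):
            # Line has no usable duration field; skip it.
            continue
        if elapsed >= threshold_ms:
            yield fields
```

If something like this turns out to be useful, it could be fed from a log file with `slow_requests(open(path))`, where `path` is wherever the relevant log lives on the host.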

Longer-term, we need to centralise these logs so that we can build dashboards on them and filter them far more effectively.

Event Timeline

hnowlan renamed this task from Recommended use of `thanos-query-log-explore` in Thanos docs returns nothing to Improve tooling for long-running Thanos queries. Jan 14 2026, 4:30 PM
hnowlan triaged this task as High priority.
hnowlan updated the task description.

For context, the outage was caused by saturated NICs on the titan hosts.