Today thanos-query in eqiad paged due to overload. On titan1001 the service recovered by itself; on titan1002, however, load stayed high until I restarted thanos-query manually.
Unsurprisingly, the root cause is a set of heavy queries. Note that the queries below are the ones that were in flight at the time and for which we returned non-200; not all of the queries listed here are necessarily problematic!
2025-02-05T10:52:58 119 days, 0:00:00 'sum(rate(mediawiki_WikimediaEvents_editResponseTime_seconds_count[1d]) * 60)' (replied 200 to Grafana/9.5.18)
2025-02-05T10:54:36 90 days, 0:00:00 'sum by(wiki, platform, type) (increase(mediawiki_WikimediaEvents_temp_account_creation_throttled_total{wiki=~"(cswikiversity|dawiki|fawiktionary|igwiki|itwikiquote|jawikibooks|nowiki|rowiki|scwiki|shwiki|srwiki|zh_yuewiki)", platform=~"(android|commons|desktop|ios|linux/amd64|mobile|unknown|web)"}[1h]))' (replied 503 to Grafana/9.5.18)
2025-02-05T10:54:37 90 days, 0:00:00 'sum by(wiki) (mediawiki_WikimediaEvents_local_temporary_account_ip_viewers_total{wiki=~"(cswikiversity|dawiki|fawiktionary|igwiki|itwikiquote|jawikibooks|nowiki|rowiki|scwiki|shwiki|srwiki|zh_yuewiki)"})' (replied 503 to Grafana/9.5.18)
2025-02-05T10:54:38 90 days, 0:00:00 '(sum(mediawiki_WikimediaEvents_local_temporary_account_ip_viewers_with_enabled_preference_total{wiki=~"(cswikiversity|dawiki|fawiktionary|igwiki|itwikiquote|jawikibooks|nowiki|rowiki|scwiki|shwiki|srwiki|zh_yuewiki)"}) by (wiki)) / ( (sum(mediawiki_WikimediaEvents_local_temporary_account_ip_viewers_total{wiki=~"(cswikiversity|dawiki|fawiktionary|igwiki|itwikiquote|jawikibooks|nowiki|rowiki|scwiki|shwiki|srwiki|zh_yuewiki)"}) by (wiki)) - (sum(mediawiki_WikimediaEvents_locally_auto_enrolled_temporary_account_ip_viewers_total{wiki=~"(cswikiversity|dawiki|fawiktionary|igwiki|itwikiquote|jawikibooks|nowiki|rowiki|scwiki|shwiki|srwiki|zh_yuewiki)"}) by (wiki) ))' (replied 503 to Grafana/9.5.18)
2025-02-05T10:54:38 90 days, 0:00:00 'mediawiki_WikimediaEvents_global_temporary_account_ip_viewers_total' (replied 503 to Grafana/9.5.18)
2025-02-05T10:54:39 90 days, 0:00:00 'mediawiki_WikimediaEvents_global_temporary_account_ip_viewers_with_enabled_preference_total' (replied 503 to Grafana/9.5.18)
2025-02-05T10:54:39 90 days, 0:00:00 'sum by(wiki, user) (increase(mediawiki_WikimediaEvents_block_target_total{wiki=~"(cswikiversity|dawiki|fawiktionary|igwiki|itwikiquote|jawikibooks|nowiki|rowiki|scwiki|shwiki|srwiki|zh_yuewiki)", user=~"(anon|iprange|normal|temp)"}[1h]))' (replied 503 to Grafana/9.5.18)
2025-02-05T10:54:40 90 days, 0:00:00 'mediawiki_WikimediaEvents_global_temporary_account_ip_viewers_total' (replied 503 to Grafana/9.5.18)
2025-02-05T10:54:40 90 days, 0:00:00 'sum by(user) (increase(mediawiki_WikimediaEvents_global_block_target_total{user=~"(anon|iprange|normal|temp)"}[1h]))' (replied 503 to Grafana/9.5.18)
2025-02-05T10:53:26 119 days, 0:00:00 'max_over_time(sum(rate(mediawiki_WikimediaEvents_editResponseTime_seconds_count[2m] offset -7d) * 60)[1w:2m])' (replied 502 to Grafana/9.5.18)
2025-02-05T10:53:56 119 days, 0:00:00 'min_over_time(sum(rate(mediawiki_WikimediaEvents_editResponseTime_seconds_count[2m] offset -7d) * 60)[1w:2m])' (replied 502 to Grafana/9.5.18)
2025-02-05T10:54:04 1 day, 0:00:00 '(sum(increase(webperf_navigationtiming_responsestart_seconds_bucket{is_oversample="False", mw_context=~"anonymous_mainspace_view", mw_skin=~"minerva", le="0.8"}[24h]))/sum(increase(webperf_navigationtiming_responsestart_seconds_bucket{is_oversample="False", mw_context=~"anonymous_mainspace_view", mw_skin=~"minerva", le="+Inf"}[24h]))) - (sum(increase(webperf_navigationtiming_responsestart_seconds_bucket{is_oversample="False", mw_context=~"anonymous_mainspace_view", mw_skin=~"minerva", le="0.8"}[24h] offset 7d))/sum(increase(webperf_navigationtiming_responsestart_seconds_bucket{is_oversample="False", mw_context=~"anonymous_mainspace_view", mw_skin=~"minerva", le="+Inf"}[24h] offset 7d)))' (replied 502 to Grafana/9.5.18)
2025-02-05T10:54:05 0:10:00 '100 * sum(rate(varnish_resourceloader_resp{x_cache=~"(hit|int).*"}[1h])) / sum(rate(varnish_resourceloader_resp[1h]))' (replied 502 to Grafana/9.5.18)
2025-02-05T10:54:06 90 days, 0:00:00 'sum by(user) (increase(mediawiki_WikimediaEvents_editResponseTime_seconds_sum{wiki=~"(cswikiversity|dawiki|fawiktionary|igwiki|itwikiquote|jawikibooks|nowiki|rowiki|scwiki|shwiki|srwiki|zh_yuewiki)", user=~"(anon|iprange|normal|temp)"}[1h]))' (replied 502 to Grafana/9.5.18)
2025-02-05T10:54:06 90 days, 0:00:00 'sum by(wiki, user) (increase(mediawiki_WikimediaEvents_editResponseTime_seconds_sum{wiki=~"(cswikiversity|dawiki|fawiktionary|igwiki|itwikiquote|jawikibooks|nowiki|rowiki|scwiki|shwiki|srwiki|zh_yuewiki)", user=~"(anon|iprange|normal|temp)"}[1h]))' (replied 502 to Grafana/9.5.18)
2025-02-05T10:54:06 90 days, 0:00:00 'sum by(user, platform) (increase(mediawiki_WikimediaEvents_editResponseTime_seconds_sum{wiki=~"(cswikiversity|dawiki|fawiktionary|igwiki|itwikiquote|jawikibooks|nowiki|rowiki|scwiki|shwiki|srwiki|zh_yuewiki)", user=~"(anon|iprange|normal|temp)", platform=~"(android|commons|desktop|ios|linux/amd64|mobile|unknown|web)"}[1h]))' (replied 502 to Grafana/9.5.18)
2025-02-05T10:54:06 0:05:00 'histogram_quantile(0.5, sum by (le) (rate(cpjobqueue_normal_rule_processing_delay_bucket{rule=~".*DispatchChanges$",rule!~".*-partitioner-mediawiki-job-.*", service="cpjobqueue"}[15m])))' (replied 502 to Grafana/9.5.18)
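
Since not every query in the listing is necessarily problematic, one rough way to triage it is to separate failed queries over long time ranges from short-range ones that may only have been collateral. The sketch below is only an illustration: the regular expression is inferred from the shape of the lines above, the names `parse_line` and `summarize` are hypothetical, and the 30-day threshold is an arbitrary assumption. It expects one entry per line, so any wrapped lines need to be rejoined first.

```python
import re
from collections import Counter
from datetime import timedelta

# Assumed line shape, inferred from the listing above:
#   <timestamp> <range> '<query>' (replied <status> to <client>)
# where <range> looks like "90 days, 0:00:00", "1 day, 0:00:00" or "0:05:00".
LINE_RE = re.compile(
    r"^(?P<ts>\S+) "
    r"(?:(?P<days>\d+) days?, )?(?P<h>\d+):(?P<m>\d+):(?P<s>\d+) "
    r"'(?P<query>.*)' "
    r"\(replied (?P<status>\d{3}) to (?P<client>[^)]+)\)$"
)


def parse_line(line):
    """Parse one log entry into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    span = timedelta(
        days=int(m["days"] or 0),
        hours=int(m["h"]),
        minutes=int(m["m"]),
        seconds=int(m["s"]),
    )
    return {
        "ts": m["ts"],
        "range": span,
        "query": m["query"],
        "status": int(m["status"]),
        "client": m["client"],
    }


def summarize(lines, long_range=timedelta(days=30)):
    """Count replies per status and collect failed queries over long ranges.

    long_range is an arbitrary threshold chosen for illustration.
    """
    entries = [e for e in map(parse_line, lines) if e is not None]
    by_status = Counter(e["status"] for e in entries)
    long_failed = [
        e for e in entries if e["status"] != 200 and e["range"] >= long_range
    ]
    return by_status, long_failed
```

Calling `summarize` on the lines of the listing returns a per-status count and the subset of non-200 queries whose range meets the threshold; a long range alone does not prove a query is the culprit, it only narrows down where to look first.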