Thanos: support multiple ruler instances
Closed, Resolved, Public

Description

Today we have a single Thanos ruler instance. It would be quite useful to be able to deploy certain recording rules with a unique external label, so that they are easier to find, manage, and clean up in Thanos storage. The immediate use case is piloting recording rules using production metrics.

Event Timeline

herron changed the task status from Open to In Progress.
herron triaged this task as Medium priority.

Change #1192209 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] thanos-rule: add pilot instance

https://gerrit.wikimedia.org/r/1192209

Change #1188441 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] thanos-rule: add support for multiple instances

https://gerrit.wikimedia.org/r/1188441

Change #1188441 merged by Herron:

[operations/puppet@production] thanos-rule: add support for multiple instances

https://gerrit.wikimedia.org/r/1188441

Change #1197326 had a related patch set uploaded (by Herron; author: Herron):

[operations/alerts@master] ThanosRecordingRuleGaps: update thanos-rule to thanos-rule@main

https://gerrit.wikimedia.org/r/1197326

Change #1197326 merged by jenkins-bot:

[operations/alerts@master] ThanosRecordingRuleGaps: update thanos-rule to thanos-rule@main

https://gerrit.wikimedia.org/r/1197326

Change #1197669 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] profile::thanos::query::store_config: add define

https://gerrit.wikimedia.org/r/1197669

Change #1197669 merged by Herron:

[operations/puppet@production] profile::thanos::query::store_config: add define

https://gerrit.wikimedia.org/r/1197669

Change #1192209 merged by Herron:

[operations/puppet@production] thanos-rule: add pilot instance

https://gerrit.wikimedia.org/r/1192209

We now have two thanos-rule instances running: "main" (the pre-existing instance) and a new instance called "pilot".

Each instance is configured to attach a unique external label: recorder=thanos-rule@main, recorder=thanos-rule@pilot, etc.

titan1001:~$ systemctl status thanos-rule@pilot
● thanos-rule@pilot.service - Thanos rule (instance pilot)
     Loaded: loaded (/lib/systemd/system/thanos-rule@pilot.service; static)
     Active: active (running) since Thu 2025-10-23 20:04:17 UTC; 41min ago
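
For anyone wanting to find series written by a specific instance, the recorder external label mentioned above can be used as a selector. A rough sketch follows, not taken from the patches: the helper names and the metric name example:requests:rate5m are made up for illustration, and only the label name and the thanos-rule@<instance> value format come from this task.

```python
def recorder_label(instance: str) -> str:
    """Return the external label value for a thanos-rule instance.

    Follows the recorder=thanos-rule@<instance> convention from this task.
    """
    return f"thanos-rule@{instance}"


def selector(metric: str, instance: str) -> str:
    """Build a PromQL selector limited to series recorded by one instance."""
    return f'{metric}{{recorder="{recorder_label(instance)}"}}'


if __name__ == "__main__":
    # "example:requests:rate5m" is a hypothetical recording rule name.
    print(selector("example:requests:rate5m", "pilot"))
```

The same selector with "main" in place of "pilot" would match series from the pre-existing instance, which should make it straightforward to compare or clean up pilot rules separately.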

I think we're good to go here!