In T403729 some questions were raised about visualization issues with Pyrra dashboards. The SLO WG decided to evaluate Sloth, compare the results, and decide which tool is the better fit for the job.
Some high-level things to test:
- Remaining error budget graphs and their behavior.
- Alerting (T409310: Sloth: onboard subset of existing SLOs to pilot)
- Support for a fixed 3-month calendar window (T409312: Sloth: adapt default month view to quarter view (pilot))
- Support for a rolling window view
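To make the comparison concrete, here is a minimal sketch of what a "remaining error budget" panel computes, and of how a fixed quarter window differs from a rolling one. The function names, the 90-day rolling length, and the Jan/Apr/Jul/Oct quarter boundaries are assumptions for illustration only; the SLO WG reporting quarters may start on different months.

```python
from datetime import date, timedelta


def remaining_error_budget(total_events: float, error_events: float,
                           objective_percent: float) -> float:
    """Return the fraction of error budget left in a window.

    1.0 means the budget is untouched, 0.0 means it is fully spent,
    and a negative value means the SLO was missed for that window.
    """
    allowed_error_ratio = 1.0 - objective_percent / 100.0
    if allowed_error_ratio <= 0:
        raise ValueError("objective must be below 100%")
    if total_events <= 0:
        return 1.0  # no traffic in the window, nothing was spent
    observed_error_ratio = error_events / total_events
    return 1.0 - observed_error_ratio / allowed_error_ratio


def quarter_start(day: date) -> date:
    """Start of the fixed quarter containing `day` (assumed Jan/Apr/Jul/Oct)."""
    first_month = 3 * ((day.month - 1) // 3) + 1
    return date(day.year, first_month, 1)


def rolling_start(day: date, days: int = 90) -> date:
    """Start of a rolling window of `days` days ending on `day`."""
    return day - timedelta(days=days)


# Hypothetical numbers: 99.5% objective, 1M requests, 2000 errors.
budget_left = remaining_error_budget(1_000_000, 2_000, 99.5)
print(f"budget left: {budget_left:.2f}")
print("fixed window starts:", quarter_start(date(2025, 11, 14)))
print("rolling window starts:", rolling_start(date(2025, 11, 14)))
```

With a fixed window the budget resets to 1.0 at each quarter boundary, while with a rolling window old errors age out gradually. That difference is the graph behavior worth comparing between the two tools.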
Unlike Pyrra, Sloth doesn't provide its own UI; it relies entirely on Grafana for visualization. The minimal POC should be something like:
- 1) Build the latest version of Sloth (proper Debian packaging is not needed at this stage).
- 2) Run Sloth (the CLI) with a YAML spec representing one of the current Pyrra-configured SLOs, like Citoid availability. The output will be another YAML file, containing the list of recording rules that Thanos will have to ingest.
- 3) Instruct Thanos to pick up the generated YAML file (bonus: the new metrics should carry a label identifying Sloth-related time series, for easier management).
- 4) Optional (but it would be good): Backfill a couple of months of data for the editcheck pilot recording rules.
- 5) Import Sloth's Grafana dashboard JSON and visualize the data.
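Steps 2 and 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: the metric name `citoid_requests_total`, the 99.5 objective, the `slo_tool` label, and the file names are hypothetical placeholders, not the real Citoid SLO definition. The spec layout follows Sloth's `prometheus/v1` format, where service-level `labels` are attached to the generated rules, and the command is built but not executed.

```python
from textwrap import dedent


def render_sloth_spec(service: str, slo_name: str, objective: float,
                      error_query: str, total_query: str) -> str:
    """Render a minimal Sloth prometheus/v1 spec with alerts disabled."""
    return dedent(f"""\
        version: "prometheus/v1"
        service: "{service}"
        labels:
          slo_tool: "sloth"  # hypothetical label to tell Sloth series apart
        slos:
          - name: "{slo_name}"
            objective: {objective}
            description: "Pilot SLO ported from a Pyrra definition."
            sli:
              events:
                error_query: {error_query}
                total_query: {total_query}
            alerting:
              page_alert:
                disable: true
              ticket_alert:
                disable: true
        """)


def sloth_generate_argv(spec_path: str, rules_path: str) -> list[str]:
    """Command line that turns a spec into a Prometheus rules file."""
    return ["sloth", "generate", "-i", spec_path, "-o", rules_path]


# Sloth substitutes {{.window}} with each window it needs to evaluate.
# The metric and label names below are placeholders, not real Citoid metrics.
spec = render_sloth_spec(
    service="citoid",
    slo_name="requests-availability",
    objective=99.5,
    error_query='sum(rate(citoid_requests_total{code=~"5.."}[{{.window}}]))',
    total_query="sum(rate(citoid_requests_total[{{.window}}]))",
)
argv = sloth_generate_argv("citoid-slo.yaml", "citoid-rules.yaml")

print(spec)
print(" ".join(argv))
```

The generated rules file is what Thanos would then need to load in step 3. Filtering on the extra label should make it easy to find or drop the pilot time series later.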