Kartotherian is currently monitored by statsd/Graphite; the SLO monitoring infrastructure is pointed at Prometheus. That means that out of the box, we can't create a Kartotherian SLO dashboard. There are a few ways we could resolve that:
- We could put Envoy in front of Kartotherian and use its telemetry for errors and latency instead of Kartotherian's.
- We could update the SLO dashboard template to read from Graphite as well as from Prometheus.
- We could use the Prometheus pushgateway to slurp those metrics over from statsd, although it isn't a great fit.
Presently, we're leaning toward Envoy -- we trust its reporting a little more, especially around timeouts, and it would also get us the other usual traffic-management benefits. If that turns out to be more complex than expected, we'll look into one of the other options.
The best option is for us to use the pending native Prometheus support within Kartotherian itself.
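Whichever Prometheus source ends up feeding the dashboard, the error and latency indicators reduce to simple ratios over request counters. The sketch below shows that arithmetic only; the `RequestCounts` shape, the function names, and the example numbers are hypothetical, not Kartotherian's or Envoy's actual metric names or agreed SLO targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestCounts:
    """Request counters summed over one reporting window (hypothetical shape)."""
    total: int           # all requests seen in the window
    server_errors: int   # requests answered with a 5xx status
    over_threshold: int  # requests slower than the latency threshold


def error_ratio(counts: RequestCounts) -> float:
    """Fraction of requests that failed with a server error."""
    if counts.total == 0:
        return 0.0  # no traffic: treat the window as error-free
    return counts.server_errors / counts.total


def slow_ratio(counts: RequestCounts) -> float:
    """Fraction of requests that exceeded the latency threshold."""
    if counts.total == 0:
        return 0.0
    return counts.over_threshold / counts.total


def meets_target(bad_ratio: float, target_good_ratio: float) -> bool:
    """True when the share of good requests is at or above the target."""
    return (1.0 - bad_ratio) >= target_good_ratio


if __name__ == "__main__":
    # Illustrative numbers only; real values would come from Prometheus.
    window = RequestCounts(total=1000, server_errors=5, over_threshold=20)
    print("error ratio:", error_ratio(window))
    print("slow ratio:", slow_ratio(window))
    print("meets 99% availability:", meets_target(error_ratio(window), 0.99))
```

The same ratios apply whether the counters come from a proxy in front of the service or from the service's own instrumentation, so the dashboard logic would not need to change if we switch sources later.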