As an ML engineer,
I want to review the current status of the Tone Check latency SLO defined in T390706: Create SLO dashboard for tone (peacock) check model, so that I can ensure it accurately reflects user experience and re-configure it to provide actionable insights for improving performance.
As defined in the Tone Check model SLO we have the following latency SLO:
Latency SLO, acceptable fraction: 90% of all successful requests (2xx) complete within 1000 milliseconds, measured at the server side.
The previous SLO quarter ended on 31 August 2025 (20250601-20250831), and we failed to reach the latency target: instead of the targeted 90%, only 79% of requests completed in under 1 second, as shown in the Pyrra dashboard.
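As a reference for the discussion below, this is a minimal sketch (with fabricated latencies, not production data) of how the latency SLI fraction is computed and checked against the SLO target:

```python
def sli_fraction(latencies_ms, threshold_ms=1000):
    """Fraction of successful requests completing within threshold_ms."""
    if not latencies_ms:
        return 0.0
    within = sum(1 for l in latencies_ms if l <= threshold_ms)
    return within / len(latencies_ms)

def slo_met(latencies_ms, threshold_ms=1000, target=0.90):
    """True if the SLI fraction meets the SLO target."""
    return sli_fraction(latencies_ms, threshold_ms) >= target

# Illustrative distribution roughly matching the observed 79% compliance:
sample = [800] * 79 + [1400] * 21  # 79 fast requests, 21 slow ones
assert abs(sli_fraction(sample) - 0.79) < 1e-9
assert not slo_met(sample)  # 79% < 90% target, as in the failed quarter
```

In production this calculation is of course done by Pyrra over Prometheus histogram buckets rather than raw samples; the sketch only makes the arithmetic behind the 79% vs 90% figures explicit.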
As part of this task we would like to:
- Investigate as far as possible the reasons behind the increased latencies compared to the initial load tests. One issue we have already spotted is that the latencies reported by Istio (and the Istio sidecar) differ substantially from those reported by the kserve inference pods: the latter show very low latencies, while the Istio metrics are much closer to what users actually experience.
- Revisit both the SLI definition and the SLO, and decide whether we should update one of them or both.
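The Istio-vs-pod discrepancy mentioned above can be quantified with a simple cross-check. The sketch below (fabricated numbers, hypothetical helper) compares the latency distribution reported by the Istio sidecar with the one reported by the kserve inference pod; a large gap indicates time spent outside the pod's own measurement window (queueing, sidecar processing, serialization):

```python
import statistics

def latency_gap_ms(istio_ms, pod_ms):
    """Median difference between the two latency sources, in milliseconds."""
    return statistics.median(istio_ms) - statistics.median(pod_ms)

# Hypothetical samples shaped like the symptom described: pod metrics look
# fast while Istio-side metrics are much closer to user-observed latency.
istio_samples = [950, 1100, 1300, 1050, 1200]
pod_samples = [120, 90, 150, 110, 130]

gap = latency_gap_ms(istio_samples, pod_samples)
# A gap of this magnitude suggests the pod metric misses most of the
# request's lifetime, so the SLI should be based on the measurement point
# closest to the user (here, the Istio server-side metrics).
```

The same comparison can be run directly in Prometheus by graphing the two latency histograms side by side; the sketch only illustrates the check.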
The initial SLI that we had defined was:
Latency SLI, acceptable fraction: The percentage of all successful requests (2xx) that complete within 1000 milliseconds (1 sec), measured at the server side.
In redefining the latency SLO we have the following options:
- change the SLI: increase the latency threshold, currently set at 1000 milliseconds
- change the SLO: decrease the target from the initial 90%
- change both of the above
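The first two options above can be evaluated against an observed latency distribution. The sketch below (hypothetical data, not production measurements) computes, for option one, the threshold at which 90% of requests would pass, and for option two, the target actually achievable at the current 1000 ms threshold:

```python
import math

def threshold_for_target(latencies_ms, target=0.90):
    """Smallest latency threshold at which `target` fraction of requests
    pass (i.e. the p90 latency for target=0.90)."""
    xs = sorted(latencies_ms)
    idx = max(0, math.ceil(target * len(xs)) - 1)
    return xs[idx]

def achievable_target(latencies_ms, threshold_ms=1000):
    """Fraction of requests that would pass at the given threshold."""
    return sum(1 for l in latencies_ms if l <= threshold_ms) / len(latencies_ms)

# Hypothetical distribution matching the observed 79% compliance:
sample = [900] * 79 + [1500] * 21
# Option 1: the threshold would need to rise to ~1500 ms to hit 90%.
# Option 2: keeping 1000 ms, only a 79% target is currently achievable.
```

Whichever option is chosen, the decision should be driven by the corrected (Istio-side) latency data, since a threshold or target tuned to the kserve pod metrics would not reflect user experience.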