Follow-up from https://wikitech.wikimedia.org/wiki/Incident_documentation/20200611-sessionstore%2Bkubernetes.
As the sessionstore incident made evident, sessionstore is critical to our infrastructure. We should adopt industry best practices for managing this service, including SLIs/SLOs. It's conceivable that, had we had SLIs and SLOs (and proper alerting on them), we could have prevented the incident in the first place.
Given the nature of the service and the current state of our SLI/SLO adoption, we can probably defer this for a while; I'm filing this task so that it doesn't get forgotten.
Kask itself is stateless, but the sessionstore service, of which kask is one component, is stateful. This makes it somewhat harder to define meaningful SLIs and SLOs for it, so we should first gain confidence by defining SLIs/SLOs for other services.