[Epic] Each AQS 2.0 Service must have a clearly defined SLO with a process for ongoing ownership
Open, Medium, Public, 32 Estimated Story Points

Description

Background/Goal

For each AQS 2.0 service, we need to develop an SLO based on the SLO Runbook provided by SRE. The SLOs may have a great deal of overlap and we should keep in mind that there may be different ways to structure and interrelate the SLOs that make them more straightforward to manage.

As well as defining the actual SLO(s), we will also need to define and kick off an ongoing ownership process for AQS 2.0 that utilises the SLO(s).

KR/Hypothesis (Initiative)

n/a

Success Criteria

  • SLO(s) are defined for all AQS 2.0 services and articulate the expected level of service for each
  • Stakeholders/users/consumers have approved the proposed service levels
  • There is a process in place that ensures the Data Products team is able to maintain the expected standard for each service
  • It is straightforward to add a new SLO for a new or existing service

Success metrics

  • All 6 "original" AQS 2.0 Services have SLOs
  • At least one new AQS 2.0 Service has an SLO
  • Any additional new services that do not have an SLO have tasks defined for that work and are in the backlog

In scope

  • All AQS 2.0 Services

Out of Scope

  • Anything that is not directly within the control of the Data Products team and does not relate directly to the functioning of AQS 2.0 services (this can be broad, as AQS 2.0 relies on k8s, for example)

Artifacts & Resources

Event Timeline

WDoranWMF triaged this task as Medium priority. Oct 3 2023, 3:33 PM
WDoranWMF created this task.

I think having one main SLO with subsections per service would be a good idea, perhaps addressing AQS2 (or just "AQS", I guess, once the old services are gone) as a platform. There might be quite a lot of indicators if we need to set them per service, but the service behaviours are so similar that a single SLO still makes sense. It could make evaluation easier and also save a bit of time: starting broad and then getting more specific if needed will probably be more efficient than the other way around.
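
To make the "one main SLO, per-service subsections" idea a bit more concrete, here is a rough Python sketch of platform-wide defaults with optional per-service overrides. Everything in it is hypothetical and for illustration only: the class names, the indicator names, the service names ("service-a", "service-b") and the target values are assumptions, not proposed SLO targets or real AQS 2.0 configuration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Objective:
    """One indicator and its target (hypothetical shape, not from the SLO Runbook)."""
    indicator: str   # illustrative indicator name
    target: float    # fraction of "good" events, between 0 and 1


@dataclass
class PlatformSLO:
    """Platform-level objectives plus per-service overrides."""
    defaults: Dict[str, Objective]
    overrides: Dict[str, Dict[str, Objective]] = field(default_factory=dict)

    def objectives_for(self, service: str) -> Dict[str, Objective]:
        # Start broad: every service inherits the platform defaults.
        merged = dict(self.defaults)
        # Get more specific only where a service defines its own objective.
        merged.update(self.overrides.get(service, {}))
        return merged


# Hypothetical values, chosen only to show the override mechanics.
slo = PlatformSLO(
    defaults={
        "availability": Objective("non_5xx_responses", 0.999),
        "latency": Objective("responses_under_1s", 0.95),
    },
    overrides={
        "service-b": {"latency": Objective("responses_under_1s", 0.90)},
    },
)

for name in ("service-a", "service-b"):
    for key, objective in slo.objectives_for(name).items():
        print(name, key, objective.indicator, objective.target)
```

With this shape, adding a new service needs no new entry unless it behaves differently from the platform defaults, which is roughly the "start broad, then get specific" approach described above.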