
Define SLOs and error budget for WDQS
Closed, Declined (Public)


Following the 2020-07-23 WDQS outage, we recognized that because WDQS exposes a public endpoint, where overly expensive queries can compromise service availability by knocking WDQS instances offline, we need to define SLOs and an error budget accordingly and publicize them.

  • SLOs defined and reviewed
  • Error budget defined and reviewed by the broader SRE team

Event Timeline

Gehel updated the task description.
Gehel triaged this task as Medium priority. Sep 8 2020, 7:13 PM

Note that a similar discussion was already started on T199228. It was closed, waiting for architectural changes to be implemented first.

Will this ticket be resolved by the dashboard built for … and the final SLO value we settle on?

Ryan will add a quick blurb in … about the SLO, with a link to the dashboard, at which point we can move this to Needs Reporting.

Also we (Ryan/Brian) should mention the SLO in our next SRE meeting.

Gehel moved this task from Analysis to Incoming on the Wikidata-Query-Service board.

Superseded by T313751