Canaries canaries canaries
Open, Medium, Public

Description

We are investigating how to extend the canary functionality we already have. Our goal is to catch errors and issues early enough that they don't affect the majority of our users. The problem with our current process is that we deploy changes to low-traffic wikis first, which is fine for some errors but not enough to catch issues that only surface at certain levels of traffic.

What we have already in place is:

  1. During deployment, scap deploys to some servers, monitors error rates in log files, waits for a few seconds, and then deploys to all related servers.
  2. We are able to deploy changes to the mwdebug* servers only, and test them by routing specific traffic to those servers via our Chrome/Firefox extension.
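
As a rough illustration of the flow in item 1, here is a minimal sketch in Python. It is not scap's actual implementation: the function name `canary_deploy`, the `deploy` and `error_rate` callables, the `max_increase` threshold and the default wait time are all hypothetical, chosen only to show the deploy / wait / compare / continue sequence.

```python
import time
from typing import Callable, Sequence


def canary_deploy(
    canaries: Sequence[str],
    all_servers: Sequence[str],
    deploy: Callable[[str], None],        # hypothetical: pushes the change to one host
    error_rate: Callable[[str], float],   # hypothetical: current error rate for one host
    max_increase: float = 2.0,            # assumed threshold, not a scap default
    wait_seconds: float = 20.0,           # assumed wait, not a scap default
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Deploy to canaries, wait, compare error rates, then deploy everywhere.

    Returns True if the full rollout happened, False if it was aborted.
    """
    # Record each canary's error rate before the change as a baseline.
    baseline = {host: error_rate(host) for host in canaries}

    for host in canaries:
        deploy(host)

    # Give the canaries a moment to serve traffic with the new code.
    sleep(wait_seconds)

    for host in canaries:
        # Abort if a canary's error rate grew beyond the allowed factor.
        # max(..., 1.0) keeps a zero baseline from flagging any error at all.
        if error_rate(host) > max(baseline[host], 1.0) * max_increase:
            return False

    # Canaries look healthy: continue with the remaining servers.
    canary_set = set(canaries)
    for host in all_servers:
        if host not in canary_set:
            deploy(host)
    return True
```

Because the canaries here are a fixed set of servers, a check like this only sees whatever traffic happens to reach them, which is the limitation the description points at.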

Those processes can become more efficient by:

  • Deploying changes so that they affect a pre-specified share of traffic, and increasing that share in stages, i.e. starting with X% and ramping it up all the way to 100%.
  • Deploying changes so that they affect only specific groups of users, e.g. beta users or logged-in users.
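
One possible way to decide which requests get the new code, sketched in Python under stated assumptions: each request carries some stable key (for example a user or session identifier), and the functions `bucket` and `use_new_version`, the salt, and the group names are all hypothetical. Nothing here reflects an existing scap or MediaWiki mechanism. Hashing the key gives every user a fixed position between 0 and 100, so raising the percentage only adds users and never flips an already-included user back to the old version.

```python
import hashlib
from typing import Iterable


def bucket(key: str, salt: str = "canary") -> float:
    """Map a key to a stable value in [0, 100), in steps of 0.01."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, folded into 10000 slots.
    return (int.from_bytes(digest[:8], "big") % 10000) / 100


def use_new_version(
    key: str,
    percent: float,
    groups: Iterable[str] = (),
    target_groups: Iterable[str] = (),
) -> bool:
    """Return True if this request should be served by the new code."""
    targets = set(target_groups)
    # Group targeting: members of a targeted group (e.g. "beta") always opt in.
    if any(group in targets for group in groups):
        return True
    # Percentage rollout: include the key if its bucket falls below the cutoff.
    return bucket(key) < percent
```

A staged rollout would then call `use_new_version(key, percent)` with `percent` raised step by step towards 100, while `target_groups` covers the "beta users" or "logged-in users" case independently of the percentage.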

Related: T213156

Event Timeline

jijiki triaged this task as Medium priority. Nov 22 2018, 1:35 PM
jijiki added a project: Scap.
jijiki added subscribers: Dzahn, akosiaris, mark and 9 others.

(This seems like a high-level/meta task, right? And since there are already scap-specific tasks as subtasks, removing Scap from this task.)