(I'm surprised I can't find a task about this already since we've talked about this forever; if I just duplicated one, let me know.)
As a deployer, I want to release new code to a small subset of users across all wiki projects (e.g. 5%) and gradually increase that share, watching for an increase in any bad metric (fatals, warnings, timeouts, page load time, etc.), until it reaches 100%. (nb: probably just doing 5%, 10%, then 100% is good enough)
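To make the percentage mechanism concrete, here's a minimal sketch (not how WMF tooling actually does or would do it; all names are made up) of stable hash-based bucketing, which keeps a given user on the same code for the whole ramp:

```python
import hashlib

# Hash the user identifier into a stable bucket so the same user always
# sees the same code, and raising the percentage only ever adds users.

def user_bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user id to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

def serves_new_code(user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout slice."""
    return user_bucket(user_id) < rollout_percent

# Walk through the proposed 5% -> 10% -> 100% stages.
for percent in (5, 10, 100):
    on_new = sum(serves_new_code(f"user{i}", percent) for i in range(10_000))
    print(f"{percent:>3}% stage: {on_new / 100:.1f}% of test users on new code")
```

The important property is stickiness: moving from 5% to 10% only adds users to the new code, so nobody bounces between versions mid-session.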
Problems
- Subtle cache poisoning
  - Deploying directly to a small percentage of Wikipedias could let subtle, undetected rendering bugs poison the cache for a long time before anyone notices (see the first sketch after this list)
- Which MW version should be used to run a maintenance script? Right now that can be determined from mwscript's --wiki argument plus wikiversions.json, but with a percentage rollout a single wiki no longer maps to a single version (see the second sketch after this list)
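To illustrate the cache poisoning risk from the first item above: if cached HTML isn't keyed by the code version that produced it, a buggy canary render sticks around for everyone. A toy sketch with invented names (MediaWiki's real parser cache keying is more sophisticated):

```python
# A shared page cache written by two code versions at once. Without the
# code version in the key, the 5% canary's buggy output is served to the
# other 95% until the entry expires or someone notices.

cache: dict[str, str] = {}

def render(page: str, version: str) -> str:
    """Pretend renderer; the new version has a subtle rendering bug."""
    bug = "<!-- broken markup -->" if version == "new" else ""
    return f"<p>{page} via {version}</p>{bug}"

def get_html(page: str, version: str, key_by_version: bool) -> str:
    key = f"{page}@{version}" if key_by_version else page
    if key not in cache:
        cache[key] = render(page, version)
    return cache[key]

# Canary (new code) renders first; old-code servers then serve its output.
print(get_html("Main_Page", "new", key_by_version=False))
print(get_html("Main_Page", "old", key_by_version=False))  # poisoned entry
```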
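And a sketch of the maintenance-script resolution from the second item, assuming wikiversions.json is a flat JSON map from a wiki dbname to the branch directory serving it (the real mwscript wrapper does more than this):

```python
import json

# Sketch of mwscript-style version resolution: look up which MediaWiki
# branch directory serves the wiki named by the --wiki argument.

def resolve_version(wiki: str, path: str = "wikiversions.json") -> str:
    """Return the MediaWiki version directory for the given wiki."""
    with open(path) as f:
        wikiversions = json.load(f)
    if wiki not in wikiversions:
        raise SystemExit(f"unknown wiki: {wiki}")
    return wikiversions[wiki]  # e.g. a "php-..." branch directory

# resolve_version("enwiki") answers "which code runs this wiki?" today.
# Under a percentage rollout a wiki is served by two versions at once,
# so --wiki alone no longer picks a unique version for the script.
```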
Ideas
- Probably a good idea to deploy smaller batches of commits more frequently
- Changes to weekly branching to make this easier
- Eliminate or bypass deploy groups
- Deploy to canary servers
  - Canary servers run the newest version for all wikis -- no cache?
  - Different canary servers than the ones used for SWAT
  - They would have to not serve prod traffic except for group0 (maybe a dedicated group0 pool)?
- Opt-in beta testing routed to a group of servers via a cookie or header (a routing sketch follows this list)
- Automated testing
- Get rid of group0
  - Replace it with canary servers + tests
  - Roll out gradually to group1 when tests pass (a gate sketch follows this list)
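Routing sketch for the cookie/header opt-in idea. The signal names and pool labels are invented; in production this check would more likely live in Varnish or the load balancer than in application code:

```python
# Route a request to the beta/canary pool only when the client explicitly
# opted in. Cookie and header names are assumptions, not real WMF names.

OPT_IN_COOKIE = "mw-beta-opt-in"  # hypothetical cookie name
OPT_IN_HEADER = "X-MW-Beta"       # hypothetical header name

def backend_pool(headers: dict[str, str], cookies: dict[str, str]) -> str:
    """Pick a backend pool based on an explicit opt-in signal."""
    if cookies.get(OPT_IN_COOKIE) == "1" or headers.get(OPT_IN_HEADER) == "1":
        return "canary"  # servers running the newest branch
    return "stable"      # servers running the current production branch

assert backend_pool({}, {"mw-beta-opt-in": "1"}) == "canary"
assert backend_pool({"X-MW-Beta": "1"}, {}) == "canary"
assert backend_pool({}, {}) == "stable"
```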
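And a sketch of the canary-plus-tests gate that would replace group0, assuming some monitoring source can answer "do fatals/warnings/timeouts look normal?" -- both callbacks here are placeholders:

```python
import time

# Advance through the rollout stages from the top of this task, soak at
# each stage so error metrics can accumulate, and roll back on the first
# bad signal. `set_percentage` would drive the traffic split; `healthy`
# might compare fatal/timeout rates from the monitoring stack.

def gradual_rollout(set_percentage, healthy, soak_seconds: int = 600) -> None:
    """Advance through 5% -> 10% -> 100%, rolling back on bad metrics."""
    for percent in (5, 10, 100):
        set_percentage(percent)
        time.sleep(soak_seconds)  # let error metrics accumulate
        if not healthy():
            set_percentage(0)     # roll everyone back to the old code
            raise RuntimeError(f"bad metrics at {percent}%; rolled back")

# Toy usage: a rollout where metrics always look fine.
gradual_rollout(lambda p: print(f"now at {p}%"), lambda: True, soak_seconds=0)
```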