Points to cover:
- What happened two weeks ago: how Ext:ORES exacerbated T179156.
- Incident report for T181006: https://wikitech.wikimedia.org/wiki/Incident_documentation/20151216-ores
- Emergency protocols for keeping critical pages such as Special:RecentChanges up even when Ext:ORES fails.
- Ext:ORES and the data flow that makes it fragile: T181831
- Ext:ORES and how to fail gracefully for the user while still blowing up the logs to get attention when *appropriate*: T181191
- How to maintain the latest production rollback SHA-1 even when using tin to deploy to multiple clusters.
- Document a protocol for watching both "client" (MW/Ext:ORES) and server-side errors during deployment.
- Improvements to ORES beta testing, so that we could have reproduced or foreseen this bug: T181187, T181168
- Thoughts on how we might canary all the Special pages in each language when making ORES changes that could affect all wikis: T181830
- How to speed up deployment and rollback; it currently takes 43 min to push a new version and NN min to roll back: T181067, T181071
Deployment documentation can be found at https://wikitech.wikimedia.org/wiki/ORES/Deployment