In general, deployment documentation right now is a mess. Several large pages are redundant with one another and slightly out of sync, navigation is difficult, and important details of policy are hard to find. We should consolidate a number of pages under a more coherent structure, make sure everything actually reflects current practice, and improve the navigation aids. This applies to the procedural train docs as well as to descriptions of how deployments are structured overall and how backports are to be conducted.
There's also not really a single clear entry point for new deployers.
= Structural improvements and onboarding =
We want to get more people confident deploying backports, as well as aware of the ways they are affected by the train process. To that end:
- There should probably be an overall /Deployments portal, replacing the current calendar location
- All the deployment docs should actually live under /Deployments
- Calendar should probably move to /Deployments/Calendar
- Projects that reference /Deployments will need updating:
  - [ ] Jouncebot parses Deployments
  - [ ] Do a codesearch for other stuff, ask around
- There should be a /Deployments/Training entrypoint for new folks
  - Convey the who / what / when / where
  - Current content of https://wikitech.wikimedia.org/wiki/Backport_windows#New_Backport_Team_member_check-list
- We should establish a clear training process.
  - Open to anyone who:
    - Is in NDA / WMF / WMDE LDAP groups.
    - Has shell access.
    - Has received log triage training. (Details here could be worked out, but knowing how to deal with logs needs to be part of knowing how to deploy.)
  - Put this on the staff calendar, and offer invites: "Message me your email associated with your LDAP and I'll add you to the invite."
  - Trainer will check that people meet requirements.
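For the trainer's requirements check, something like the following could serve as a starting point. This is only a rough sketch: the `Candidate` fields, the lowercase group names, and the reading of "NDA / WMF / WMDE" as "member of at least one of these" are all assumptions, not an existing tool or the real LDAP schema.

```python
from dataclasses import dataclass, field

# Assumed group identifiers; the real LDAP group names may differ.
ELIGIBLE_LDAP_GROUPS = {"nda", "wmf", "wmde"}


@dataclass
class Candidate:
    """Hypothetical record of what the trainer knows about a sign-up."""
    name: str
    ldap_groups: set = field(default_factory=set)
    has_shell_access: bool = False
    has_log_triage_training: bool = False


def unmet_requirements(candidate):
    """Return the requirements from the list above that are not yet met."""
    unmet = []
    # Assumption: membership in any one of the three groups is enough.
    if not (candidate.ldap_groups & ELIGIBLE_LDAP_GROUPS):
        unmet.append("LDAP group membership (NDA / WMF / WMDE)")
    if not candidate.has_shell_access:
        unmet.append("shell access")
    if not candidate.has_log_triage_training:
        unmet.append("log triage training")
    return unmet


if __name__ == "__main__":
    person = Candidate("example", {"wmde"}, has_shell_access=True)
    print(unmet_requirements(person))
```

Returning the list of unmet items, rather than a plain yes/no, lets the trainer tell people what they still need before the session.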
= Policy change tweaks =
- [[https://wikitech.wikimedia.org/wiki/Deployments/Holding_the_train|Holding the train]]
  - [X] Mention client errors and the 1k limit in a 12-hour period before it's a UBN
  - [X] Client errors < 100 / hour
  - [ ] Specific error budget - 2 or more times in a version?
  - [ ] Define "new" in regards to errors
- [[https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys|Heterogeneous deployment/Train deploys]]
  - [X] Mention client error dashboard
  - [ ] Client errors < 100 / hour
  - [ ] Define "new" in regards to errors
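To make the two client error numbers above concrete when writing them into the docs, here is a minimal sketch of how they might be checked against hourly counts. The function name, the input shape (a list of per-hour counts, oldest first), the rolling-window interpretation of "a 12 hour period", and treating exactly 100 or exactly 1000 as over the limit are all assumptions; the actual policy wording should decide those details.

```python
HOURLY_LIMIT = 100    # from "Client errors < 100 / hour"
WINDOW_HOURS = 12     # from "1k limit in a 12 hour period"
WINDOW_LIMIT = 1000


def check_client_errors(hourly_counts):
    """Check per-hour client error counts (oldest first) against both limits.

    Returns the indexes of hours at or over the hourly limit, and whether
    any rolling 12-hour window reaches the 1k total (assumed UBN condition).
    """
    hours_over = [i for i, n in enumerate(hourly_counts) if n >= HOURLY_LIMIT]

    # With fewer than 12 hours of data, the single window covers everything.
    last_start = max(1, len(hourly_counts) - WINDOW_HOURS + 1)
    ubn = any(
        sum(hourly_counts[start:start + WINDOW_HOURS]) >= WINDOW_LIMIT
        for start in range(last_start)
    )
    return {"hours_over_limit": hours_over, "ubn": ubn}


if __name__ == "__main__":
    print(check_client_errors([90] * 12))
```

Note that the two limits can disagree: twelve hours at 90 errors each stay under the hourly limit but total 1080, which crosses the 12-hour limit. The docs should probably say which one wins.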
cc: @thcipriani, @dancy if there are specifics I'm forgetting here.