Address proximity of service deployments to train deployments problem
Closed, Resolved · Public

Description

During the post-mortem meeting of a recent ORES related incident [1], the proximity of service deployment to the train deployments was deemed to be adding to the overall confusion when trying to determine the root cause of a problem. This needs to be addressed.

[1] https://wikitech.wikimedia.org/wiki/Incident_documentation/20171120-Ext:ORES

Event Timeline

Restricted Application added a subscriber: Aklapper.

So Tuesdays we have an hour break in between Service deploys and the train.

On Wednesdays and Thursdays there's also an hour break, but with SWAT in between. I'm inclined to think that SWAT window should be moved as well.

First, thanks for getting me to get the Deployments gcal back in sync with the wiki schedule (canonical source) ;)

Second, yeah.

As you suggested on IRC, moving Wednesday's "Morning SWAT" an hour earlier (to 10am instead of 11am) gives us that quiet hour on Wednesdays.

Still leaves the open issue of Thursday. I've gone off and started re-org'ing the entire thing twice now but I think I should stop this late on a Friday. :)

greg triaged this task as Medium priority. Jan 13 2018, 12:43 AM
greg moved this task from Backlog to In-progress on the Release-Engineering-Team (Kanban) board.

From the Deployments page right now ( https://wikitech.wikimedia.org/w/index.php?title=Deployments&oldid=1780713 ), I can't really see how the proximity was removed for Wednesdays. According to that table the train is planned for 12:00–14:00 PST, and service deploys happen at 13:00–14:00 PST. Not only is this still proximal, it's even overlapping now?!
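
The overlap pointed out above can be checked mechanically. The following is only a minimal sketch, assuming the two Wednesday windows quoted from the Deployments table (train 12:00–14:00, service deploys 13:00–14:00, same day and time zone). The function and variable names are hypothetical and not part of any existing deployment tooling.

```python
from datetime import time


def windows_overlap(start_a, end_a, start_b, end_b):
    """Return True if the half-open windows [start_a, end_a) and
    [start_b, end_b) share any time on the same day."""
    return start_a < end_b and start_b < end_a


# Windows as quoted in the comment above (assumed same day, same time zone).
train = (time(12, 0), time(14, 0))
services = (time(13, 0), time(14, 0))

print(windows_overlap(*train, *services))  # True
```

Two windows that merely touch (one ending exactly when the other starts) are not reported as overlapping here, so a check for a minimum quiet period before the train would need an extra gap comparison.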

Services always had that window, nothing changed there. Should we address it? Probably. It hasn't been an issue yet, as service deployers will defer to us if there's a MediaWiki issue, but most (not all) Wednesdays we're done pretty quickly, when there's little to investigate.

Actually Wednesday is the day most likely to go sideways. This bug is poorly titled I think. I believe the main motivation was to remove deploys prior to the train so we have things "quiet" before starting.

Good enough for now.