Avoid unfinished train deploys over holidays, weekends, or other stretches of no-deploy days
Open, Needs Triage · Public

Description

Motivation

What went poorly? […]

  • Patches from today's codebase had to be ported to the state of master from 2 weeks ago and from 3 weeks ago, because production was left in a multi-version state over both a weekend and a full no-deploy week. Managing multiple versions is inevitable to some extent, but Tue-Wed-Thu already feels like long enough to juggle two branches, never mind 3+ weeks.

Proposal

Adopt a policy of not leaving the train paused midway over one or more no-deploy days, such as holidays, weekends, and other no-deploy days or weeks.

By Friday (assuming a regular work week) we must either roll forward or roll back. Weekend incident investigation should never have to deal with a multi-version deployment. Even if the train is not blocked and there simply wasn't time to roll out completely, either roll back or roll forward.
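
As a rough illustration only, the rule could be expressed as a check like the one below. Everything in it is hypothetical: the function names, the `group_versions` mapping, and the "previous"/"new" version labels are placeholders for this sketch, not existing train tooling. It assumes weekends plus an explicit set of holiday dates are the no-deploy days.

```python
from datetime import date, timedelta


def is_no_deploy_day(day, holidays=frozenset()):
    """Assumption: weekends and the listed holidays are no-deploy days."""
    return day.weekday() >= 5 or day in holidays


def must_resolve_today(today, group_versions, holidays=frozenset()):
    """Return True if production is split across versions and tomorrow
    is a no-deploy day, i.e. the train must roll forward or back today.

    group_versions is a hypothetical mapping of deployment group name
    to the version label that group currently runs.
    """
    split = len(set(group_versions.values())) > 1
    tomorrow = today + timedelta(days=1)
    return split and is_no_deploy_day(tomorrow, holidays)


# Placeholder state: the train reached group1 but not group2.
split_state = {"group0": "new", "group1": "new", "group2": "previous"}
friday = date(2020, 8, 14)
needs_action = must_resolve_today(friday, split_state)
```

The check only flags the situation; whether to roll forward or roll back is still a decision for whoever is running the train.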

Event Timeline

Krinkle created this task. Aug 14 2020, 2:28 AM
Restricted Application added a subscriber: Aklapper. Aug 14 2020, 2:28 AM

Do you mean that if by Friday, we haven't gotten to group2, we roll back all groups to the previous train version? And start over from testwikis and group0 on Monday?

Yes. Or go straight back to group1 on Monday, or balance the risk and move to group2 on Thursday/Friday after all. As long as the train isn't left split over any no-deploy days.

Krinkle renamed this task from Adopt policy to not leave train deploys unfinished over holidays, weekends, or other stretches of no-deploy days. to Avoid unfinished train deploys over holidays, weekends, or other stretches of no-deploy days. Sep 10 2020, 3:53 PM

Having pondered this last week, when I was the train conductor, I have no strong opinions for or against.

I'm happy to leave this for others to decide.

brennen moved this task from Backlog to Watching on the User-brennen board.
brennen added a subscriber: brennen.
Michael added a subscriber: Michael.
Tarrow added a subscriber: Tarrow. Oct 30 2020, 10:43 AM

Sounds like a good idea to me. The only possible hiccup is that if we end up rolling everything back more regularly, we need to be more mindful of any config changes that also need to be reverted.

Particularly if the change was targeting a group0 wiki early in the week and so by Friday the need to revert it is not at the fore of anyone's mind.

Urbanecm added a subscriber: Urbanecm.
Joe added a subscriber: Joe. Wed, Nov 4, 2:38 PM

I think that while we should try to avoid such a situation, mandating we either roll forward or back by policy would just be removing the ability for people managing releases to make a judgement call, which is almost never a good idea. And this is not counting that in some cases, rolling back after days might be slightly problematic.

So: I agree with making this a general recommendation, but I strongly oppose making this a mandatory policy.

I agree with Joe. In most cases (?), leaving the train split is not a good idea and needs constant attention, but we should allow releng to keep it that way in some cases, if warranted for any reason.

Tgr added a subscriber: Tgr. Thu, Nov 5, 10:46 PM

If the new version is already on group1, the churn would be somewhat disruptive for those wikis.