Over here on the Release Engineering team, Train Deployment is usually a rotating duty. We've written about it before, so I won't go into the exact process, but I want to tell you something new about it.
It's awful, incredibly stressful, and a bit lonely.
And last week we ran an experiment where we endeavored to perform the full train cycle four times in a single week... What is wrong with us? (Okay. I need to own this. It was technically my idea.) So what is wrong with me? Why did I wish this on my team? Why did everyone agree to it?
First I think it's important to portray (and perhaps with a little more color) how terrible running the train can be.
How it usually feels to run a Train Deployment and why
Here's a little chugga-choo with a captain and a crew. Would the llama like a ride? Llama Llama tries to hide.
―Llama Llama, Llama Llama Misses Mama
At the outset of many a week I have wondered why, when the kids are safely in childcare and I'm finally in a quiet house well fed and preparing a nice hot shower to not frantically use but actually enjoy, my shoulder is cramping and there's a strange buzzing ballooning in my abdomen.
Am I getting sick? Did I forget something? This should be nice. Why can't I have nice things? Why... Oh. Yes. Right. I'm on train this week.
Train begins in the body before it terrorizes the mind, and I'm not the only one who feels that way.
A week of periodic drudgery which at any moment threatens to tip into the realm of waking nightmare.
―Stoic yet Hapless Conductor
Aptly put. The nightmare is anything from a tiny visual regression to taking some of the largest sites on the Internet down completely.
Giving a presentation but you have no idea what the slides are.
―Bravely Befuddled Conductor
Yes. There's no visibility into what we are deploying. It's a week's worth of changes, other teams' changes, changes from teams with different workflows and development cycles, all touching hundreds of different codebases. The changes have gone through review, they've been hammered by automated tests, and yet we are still too far removed from them to understand what might happen when they're exposed to real world conditions.
It's like throwing a penny into a well, a well of snakes, bureaucratic snakes that hate pennies, and they start shouting at you to fill out oddly specific sounding forms of which you have none.
―Lost Soul been 'round these parts
When under the stress and threat of the aforementioned nightmare, it's difficult to think straight. But we have to. We have to parse and investigate intricate stack traces, run git blames on the deployment server, navigate our bug reporting forms, and try to recall which teams are responsible for which parts of the aggregate MediaWiki codebase we've put together. That aggregate is itself highly specific to WMF's production installation, and it really only comes into being long after changes merge to the main branches of the constituent codebases.
We have to exercise clear judgement and make decisive calls about whether to roll back partially (previous group) or completely (all groups to previous version). We may have to halt everything and start hollering in IRC, Slack channels, and mailing lists to get the signal to the right folks (wonderful and gracious folks) that no more code changes will be deployed until what we're seeing is dealt with. We have to play the bad guys and gals to get the train back on track.
Trainsperiment Week and what was different about it
Study after study shows that having a good support network constitutes the single most powerful protection against becoming traumatized. Safety and terror are incompatible. When we are terrified, nothing calms us down like a reassuring voice or the firm embrace of someone we trust.
―Bessel van der Kolk, M.D., The Body Keeps the Score
Four trains in a single week and everyone in Release Engineering is onboard. What could possibly be better about that?
Well, there is safety in numbers, as they say, and not in some Darwinistic way where most of us will be picked off by the train demons and the others will somehow take solace in their incidental fitness, but in a way where we are mutually trusting, supportive, and feeling collectively resourced enough to do the needful with aplomb.
So we set up video meetings for all scheduled deployment windows and had synchronous handoffs between our European colleagues and our North American ones. We welcomed folks from other teams into our deployments to show them the good, the bad, and the ugly of how their code gets its final send-off 'round the bend and into the setting hot fusion reaction that is production. We found and fixed longstanding and mysterious bugs in our tooling. We deployed four full trains in a single week.
And it felt markedly different.
One of those barn raising projects you read about where everybody pushes the walls up en masse.
―Our Stoic Now Softened but Still Sardonic Conductor
Yes! Lonely and unwitnessed work is de facto drudgery. Toiling safely together we have a greater chance at staving off the stress and really feeling the accomplishment.
Giving a presentation with your friends and everyone contributes one slide.
―Our No Longer Befuddled but Simply Brave Conductor
Many hands make light work!
It was like throwing a handful of pennies into a well, a well of snakes, still bureaucratic and shouty, oh hey but my friends are here and they remind me these are just stack traces, words on a screen, and my friends happen to be great at filling out forms.
―Our Once Lost Now Found Conductor
When no one person is overwhelmed or unsafe, we all think and act more clearly.
The hidden takeaways of Trainsperiment Week
So how should what we've learned during our Trainsperiment Week inform our future deployment strategies and process? How should train deployments change?
The known hypothesis we wanted to test by performing this experiment was in essence:
- More frequent deployments will result in fewer changes being deployed each time.
- Fewer changes on average means the deployment is less likely to fail. The deployment is safer.
- A safer deployment can be performed more frequently. (Positive feedback loop to the first point.)
- Overall we will: move faster; break less.
I don't know if we've proved that yet, but we got an inkling that yes, the smaller subsequent deployments of the week did seem to go more smoothly. One week, however, even a week of four deployment cycles, is not a large enough sample to say definitively whether doing train more frequently will result in safer deployments with fewer failures.
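One way to see the intuition behind the hypothesis is a toy model. This is only a sketch: it assumes, purely for illustration, that every change independently carries the same small chance of breaking a deployment, which is certainly not true of real changes. The function names and the numbers (400 changes a week, a 0.2% chance per change) are made up, not measurements from our trains.

```python
def deployment_failure_probability(changes: float, p_change_breaks: float) -> float:
    """Chance that at least one of `changes` changes breaks the deployment.

    Assumption (hypothetical): each change breaks things independently
    with the same probability `p_change_breaks`.
    """
    return 1.0 - (1.0 - p_change_breaks) ** changes


def per_train_risk(weekly_changes: int, trains_per_week: int, p_change_breaks: float) -> float:
    """Failure chance of a single train when the week's changes are split evenly."""
    changes_per_train = weekly_changes / trains_per_week
    return deployment_failure_probability(changes_per_train, p_change_breaks)


if __name__ == "__main__":
    # Made-up figures: 400 changes per week, each with a 0.2% chance of breaking things.
    for trains in (1, 2, 4):
        risk = per_train_risk(400, trains, 0.002)
        print(f"{trains} train(s) per week: {risk:.1%} chance that any given train fails")
```

Under these assumptions each individual train gets less risky as the week is split into more trains, and when one does fail there are fewer changes to dig through. The model does not show that the week as a whole sees fewer failures, so it illustrates the hypothesis rather than proving it.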
What was not apparent until we did our retrospective, however, is that it simply felt easier to do deployments together. It was still a kind of drudgery, but it was not abjectly terrible.
My personal takeaway is that a conductor who feels resourced and safe is the basis for all other improvements to the deployment process, and I want conductors to not only have tooling that works reliably with actionable logging at their disposal, but to feel a sense of community there with them when they're pushing the buttons. I want them to feel that the hard calls of whether or not to halt everything and roll back are not just their calls but shared in the moment among numerous people with intimate knowledge of the overall MediaWiki software ecosystem.
The lack of better tooling, particularly around error reporting and escalation, is a barrier to entry for sure. Once we've made sufficient improvements there, we need to get that tooling into other people's hands and show them that this process does not have to be so terrifying. And I think we're on the right track here with increased frequency and smaller sets of changes, but we can't lose sight of the human/social element and foundational basis of safety.
More than anything else, I want wider participation in the train deployment process by engineers in the entire organization along with volunteers.
Thanks to @thcipriani for reading my drafts and unblocking me from myself a number of times. Thanks to @jeena and @brennen for the inspirational analogies.