
Switch toollabs-webservice to be deployed with an actual deployment mechanism
Closed, Declined · Public


Debs combined with ensure => latest are a ticking time bomb for code that has to run the same version across the entire cluster, and this time it blew up in our face. Let's switch to a mechanism that performs actual, controlled deployments.

Event Timeline

I don't think this deserves major action, and T136168 can't prevent it anyway. Whenever software has to be updated on multiple hosts, regardless of the deployment method, there will be a window in which host A runs version X while host B already runs version X + 1. So whether it's MediaWiki or any other application, good practice is to make version X + 1 backwards-compatible with version X, deploy X + 1, and once it is running everywhere, drop the compatibility mode, i.e. start using the new features.

We've followed that process in the past; this time we missed it once, so let's make a mental note and move on.

This follow-up task from an incident report has not been updated recently. If it is no longer valid, please add a comment explaining why. If it is still valid, please prioritize it appropriately relative to your other work. If you have any questions, feel free to ask me (Greg Grossmeier).

I just ran into this, specifically the ensure => latest part. I didn't realize that uploading a new deb would get it upgraded automatically everywhere, on each host's next Puppet run.

I *do* think that debs are the ideal deployment mechanism here, but not with ensure => latest. Just ensure => present, combined with a manual cumin command to upgrade it everywhere, plus a rebuild of the Docker images.
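A minimal sketch of the ensure => present variant in Puppet. Where this resource would live in the Toolforge Puppet tree is an assumption; only the package name comes from this task.

```puppet
# Sketch only: the surrounding class/manifest is not shown and is assumed.
package { 'toollabs-webservice':
  # Install the package if it is missing, but never upgrade it on a Puppet run.
  # Upgrading across the cluster (and rebuilding the Docker images) stays a
  # deliberate, manual step outside Puppet.
  ensure => present,
}
```

With this, uploading a new deb to the repository changes nothing on existing hosts until someone explicitly runs the upgrade.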

An alternative to ensure => present would be to pin an exact version (possibly stored as a hiera value, so that a single server or environment can be upgraded on its own). I've found that helpful in the past.
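A rough sketch of the pinned-version approach, assuming a profile class fed by hiera. The class name profile::toolforge::webservice and its version parameter are hypothetical, chosen only for illustration.

```puppet
# Hypothetical profile class; the class and parameter names are illustrative only.
class profile::toolforge::webservice (
  # Resolved via hiera automatic parameter lookup from the key
  # 'profile::toolforge::webservice::version', so it can be overridden
  # per host or per environment in the hiera hierarchy.
  String $version,
) {
  package { 'toollabs-webservice':
    # Pin an exact Debian package version instead of 'present' or 'latest'.
    ensure => $version,
  }
}
```

Upgrading a single server would then mean overriding that hiera key for that host only, while the rest of the cluster stays on the previously pinned version until the key is bumped globally.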

JJMC89 edited projects, added Cloud-Services; removed Toolforge.
JJMC89 edited projects, added Toolforge; removed Cloud-Services.
bd808 added a subscriber: bd808.

This is not a perfect system, but honestly, having read the incident report, I agree with T136168#2328773 that this change would not have made any difference.