Debs + ensure => latest is a ticking time bomb when running code that should be the same across the entire cluster, and it blew up in our face this time. Let's switch to something else that does actual deployments.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | yuvipanda | T136162 Investigate Tool Labs webservice outage on 2016-05-25
Declined | | None | T136168 Switch toollabs-webservice to be deployed with an actual deployment mechanism
Event Timeline
This follow-up task from an incident report has not been updated recently. If it is no longer valid, please add a comment explaining why. If it is still valid, please prioritize it appropriately relative to your other work. If you have any questions, feel free to ask me (Greg Grossmeier).
I just ran into this, specifically the ensure => latest. I didn't realize that by uploading a new deb it would instantly be upgraded.
I *do* think that debs are the ideal deployment mechanism here, but not with ensure => latest. I'd use ensure => present instead, combined with a manual cumin command to upgrade it everywhere plus a rebuild of the docker images.
An alternative to ensure => present would be specifying an exact version (even keeping it as a hiera data value so that it's possible to upgrade a single server or environment). I've found that to be helpful in the past.
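For illustration, the two options above could look roughly like this in Puppet. This is only a sketch: the class name `profile::toollabs_webservice` and the parameter `$package_version` are made up for the example and are not taken from the actual puppet repo; only the package name comes from this task.

```puppet
# Hypothetical profile class; class and parameter names are illustrative only.
class profile::toollabs_webservice (
  # 'present' installs the package if it is missing but never upgrades it,
  # so uploading a new deb does not roll out on the next puppet run.
  # Setting an exact version string here (e.g. via hiera) pins that release.
  String $package_version = 'present',
) {
  package { 'toollabs-webservice':
    ensure => $package_version,
  }
}
```

With automatic parameter lookup, setting the (hypothetical) key `profile::toollabs_webservice::package_version` in a host-level or environment-level hiera file would pin a single server or environment to a given version, while everything else stays on the default.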
This is not a perfect system, but honestly reading the incident I agree with T136168#2328773 that this would not have had any effect.