@aborrero has been doing a lot of good work to shore up our lacking apt workflow across cloud instances, and as a result https://gerrit.wikimedia.org/r/#/c/389480/ (unattended upgrades for WMF packages) was merged.
We stepped through the changes picked up in Toolforge at the time by using unattended-upgrade to generate reports, keeping puppet disabled and then explicitly enabling and running it per role. We did pick up a breaking change for elasticsearch and also left kernel updates behind for anything on Jessie (tracked in T180809). In the past we have had issues that we believe came from a host/guest kernel version mismatch for virtio causing IO freezes, and also from unattended upgrades breaking nginx during staff off-hours.
As such, we intend to manage updates during our weekly cloud clinic duty process with a set of scripts that generate a report of available updates and can also apply them. This will be done explicitly and during working hours for the majority of cloud admins, so we can respond to issues in real time.
Outcomes:
- 1) Ensuring unattended upgrades are enabled across cloud for 3 variants, applied by default, with a hiera setting to disable them (except security updates, which cannot be disabled):
- Security updates (these actually come with the default unattended-upgrades package installation, so let's just annotate that to this effect) - https://gerrit.wikimedia.org/r/#/c/394080/ (merged)
- Stable-updates upgrade candidates - https://gerrit.wikimedia.org/r/#/c/394200/
- WMF upgrade candidates - https://gerrit.wikimedia.org/r/#/c/389480/ (merged but needs the above as well)
- 2) Making sure unattended upgrades does conservative and sane things on conflict (note https://anonscm.debian.org/cgit/dpkg/dpkg.git/tree/src/configure.c)
- https://gerrit.wikimedia.org/r/#/c/392421/ - (merged)
- 3) Creation of tooling for projects that have chosen not to apply distro and/or wmf upgrade candidates via unattended upgrades (beginnings of this here https://phabricator.wikimedia.org/P6365)
- Can run a report from clush/cumin master within a project to see possible changes (https://gerrit.wikimedia.org/r/#/c/394572/)
- add verbosity flag (https://gerrit.wikimedia.org/r/#/c/398458/)
- Can apply upgrades per host and per source (distro or wmf) (probably simply apt kungfu) (https://gerrit.wikimedia.org/r/398079)
- 4) Create documentation for clinic outlining this responsibility (https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Attended_package_upgrades)
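To make the reporting idea in outcome 3 concrete, here is a minimal sketch of how a report could group pending upgrades by source (distro or wmf). It is not the tooling from the linked changes. It assumes the input is the text printed by `apt-get --simulate upgrade`, and it assumes WMF packages show up with an origin label of `Wikimedia`, which is a guess; the function names, the label, and the sample package names are all hypothetical.

```python
import re
from collections import defaultdict

# Matches the "Inst" lines printed by `apt-get --simulate upgrade`, e.g.
#   Inst libfoo [1.0-1] (1.0-2 Debian:8.9/oldstable [amd64])
# The "[old version]" part is absent for newly installed packages.
INST_LINE = re.compile(
    r"^Inst (?P<pkg>\S+) "
    r"(?:\[(?P<old>[^\]]+)\] )?"
    r"\((?P<new>\S+) (?P<origins>.+?) \[(?P<arch>[^\]]+)\]\)"
)


def classify(origins, wmf_labels=("Wikimedia",)):
    """Return 'wmf' or 'distro' for a comma-separated origin list.

    ASSUMPTION: the 'Wikimedia' label is hypothetical; adjust wmf_labels
    to whatever label the WMF apt repository actually reports.
    """
    labels = [o.split(":", 1)[0].strip() for o in origins.split(",")]
    if any(label in wmf_labels for label in labels):
        return "wmf"
    return "distro"


def report(simulation_output):
    """Group (package, old_version, new_version) tuples by source."""
    grouped = defaultdict(list)
    for line in simulation_output.splitlines():
        match = INST_LINE.match(line.strip())
        if match is None:
            continue  # skip "Conf" lines, headers, and anything unparsed
        source = classify(match.group("origins"))
        grouped[source].append(
            (match.group("pkg"), match.group("old"), match.group("new"))
        )
    return dict(grouped)


# Hypothetical sample input; package names and versions are made up.
SAMPLE = """\
Reading package lists...
Inst libfoo [1.0-1] (1.0-2 Debian:8.9/oldstable [amd64])
Inst libbar [2.0-1] (2.0-1+deb8u1 Debian:8.9/oldstable, Debian-Security:8/oldstable [amd64])
Inst wmf-tool [0.1] (0.2 Wikimedia:8/jessie-wikimedia [all])
Conf libfoo (1.0-2 Debian:8.9/oldstable [amd64])
"""

if __name__ == "__main__":
    for source, packages in sorted(report(SAMPLE).items()):
        print(source)
        for pkg, old, new in packages:
            print(f"  {pkg}: {old} -> {new}")
```

On a real host the text would come from running the simulation through clush or cumin rather than from `SAMPLE`. Keeping the parser separate from the command invocation lets the same function serve both the report step and a later per-source apply step.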