Upgrade cloudvirt1019 and cloudvirt1020 to Buster
Closed, Resolved (Public)

Description

Cloudvirt1019 and cloudvirt1020 host:

clouddb1001.clouddb-services.eqiad1.wikimedia.cloud
clouddb1002.clouddb-services.eqiad1.wikimedia.cloud
clouddb1003.clouddb-services.eqiad1.wikimedia.cloud
clouddb1004.clouddb-services.eqiad1.wikimedia.cloud

We do not have immediate plans to move these to Ceph. Upgrading them will therefore require a fair amount of downtime and failover work to keep ToolsDB users happy.

I'm going to experiment with rebuilding hypervisors in a way that leaves their storage partition intact; in the meantime @Bstorm will figure out how to manage the downtime.

Event Timeline

I'd like to have this done by about the third week of October. By then, all the other upgrades should be unblocked.

Based on initial scribbling of plans in the subtask, I'd like to request we do cloudvirt1020 first. That way, when we failover clouddb1001 (which is on cloudvirt1019) to clouddb1002 (cloudvirt1020), it doesn't have to fail back, saving ToolsDB downtime.

Change 630708 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Further attempt to reimage cloudvirts while preserving /srv

https://gerrit.wikimedia.org/r/630708

Change 630708 merged by Andrew Bogott:
[operations/puppet@production] Further attempt to reimage cloudvirts while preserving /srv

https://gerrit.wikimedia.org/r/630708

Change 630954 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Update partman recipes for cloudvirts

https://gerrit.wikimedia.org/r/630954

Change 630954 merged by Andrew Bogott:
[operations/puppet@production] Update partman recipes for cloudvirts

https://gerrit.wikimedia.org/r/630954

Change 635023 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1019/1020: Move to Buster on next reimage

https://gerrit.wikimedia.org/r/635023

Change 635023 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1019/1020: Move to Buster on next reimage

https://gerrit.wikimedia.org/r/635023

Mentioned in SAL (#wikimedia-cloud) [2020-10-20T17:07:27Z] <bstorm> shut down replication on clouddb1002 (now with task) T263677

Mentioned in SAL (#wikimedia-cloud) [2020-10-20T17:08:13Z] <bstorm> stopping mariadb on clouddb1002 T263677

Mentioned in SAL (#wikimedia-cloud) [2020-10-20T17:13:43Z] <bstorm> stopping postgresql on clouddb1003 T263677

Mentioned in SAL (#wikimedia-cloud) [2020-10-20T17:14:15Z] <bstorm> shutting down clouddb1003 T263677

Change 635325 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1020: update nic names for Buster

https://gerrit.wikimedia.org/r/635325

Change 635325 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1020: update nic names for Buster

https://gerrit.wikimedia.org/r/635325

Change 635346 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Revert "cloudvirt1020: update nic names for Buster"

https://gerrit.wikimedia.org/r/635346

Change 635346 merged by Andrew Bogott:
[operations/puppet@production] Revert "cloudvirt1020: update nic names for Buster"

https://gerrit.wikimedia.org/r/635346

cloudvirt1020 is now running Buster.

Mentioned in SAL (#wikimedia-cloud) [2020-10-20T18:36:34Z] <bstorm> brought up mariadb and replication on clouddb1002 T263677

Andrew claimed this task.

Cloudvirt1019 was upgraded yesterday.