The current Horizon deployment is already on W (Wallaby), which leaves the cloudservices, cloudcontrol, cloudnet, cloudvirt, and cloudbackup nodes to upgrade.
[] update IRC topic
[] downtime everything in Icinga through 14:00 CDT
aborrero@cumin1001:~ $ sudo cookbook sre.hosts.downtime -r "upgrading openstack" --min 120 lab*
aborrero@cumin1001:~ $ sudo cookbook sre.hosts.downtime -r "upgrading openstack" --min 120 cloud*
Start with cloudservices100[34].wikimedia.org (T304880).
[] downtime Horizon with https://gerrit.wikimedia.org/r/c/operations/puppet/+/682948
[] start an ssh session to a running VM so that you notice if and when the network goes down
[] disable puppet on all cloud* hosts
$ sudo cumin 'cloud*' "disable-puppet 'Upgrading to openstack Wallaby - T281275 - ${USER}'"
[] dump databases on cloudcontrol1005 (nova_eqiad1, nova_api_eqiad1, nova_cell0_eqiad1, neutron, glance, placement, keystone, cinder, trove_eqiad1):
# mysqldump -u root nova_eqiad1 > /root/xenadbbackups/nova_eqiad1.sql
# mysqldump -u root nova_api_eqiad1 > /root/xenadbbackups/nova_api_eqiad1.sql
# mysqldump -u root nova_cell0_eqiad1 > /root/xenadbbackups/nova_cell0_eqiad1.sql
# mysqldump -u root neutron > /root/xenadbbackups/neutron.sql
# mysqldump -u root glance > /root/xenadbbackups/glance.sql
# mysqldump -u root placement > /root/xenadbbackups/placement.sql
# mysqldump -u root keystone > /root/xenadbbackups/keystone.sql
# mysqldump -u root cinder > /root/xenadbbackups/cinder.sql
# mysqldump -u root trove_eqiad1 > /root/xenadbbackups/trove_eqiad1.sql
[] merge puppet patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/788359
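The dump step above can also be written as one loop. This is a minimal sketch. It assumes the script is run by root on cloudcontrol1005 and that `mysqldump -u root` works there without a password, as the commands above imply. `dump_openstack_dbs` is a hypothetical helper name. The database list is the one named in the dump step.

```shell
#!/usr/bin/env bash
# Sketch: dump each OpenStack database to its own file before upgrading.
# Assumes root on cloudcontrol1005 with passwordless `mysqldump -u root`.
set -euo pipefail

# Hypothetical helper name; the database list follows the checklist step.
dump_openstack_dbs() {
    local backup_dir=$1 db
    mkdir -p "$backup_dir"
    for db in nova_eqiad1 nova_api_eqiad1 nova_cell0_eqiad1 neutron glance \
              placement keystone cinder trove_eqiad1; do
        echo "dumping ${db}" >&2
        # Dump to a temporary name and rename only on success, so a failed
        # dump never leaves a truncated file that looks like a good backup.
        if ! mysqldump -u root "$db" > "${backup_dir}/${db}.sql.tmp"; then
            echo "dump of ${db} failed" >&2
            return 1
        fi
        mv "${backup_dir}/${db}.sql.tmp" "${backup_dir}/${db}.sql"
    done
}

# Usage:
# dump_openstack_dbs /root/xenadbbackups
```

The temporary-file rename costs one extra `mv` per database. In return, every file with a `.sql` name in the backup directory is known to be a complete dump.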
Cloudcontrols:
All open database connections post-upgrade https://phabricator.wikimedia.org/P10999
Check haproxy status (prints any line containing DOWN):
# echo "show stat" | socat /var/run/haproxy/haproxy.sock stdio | grep DOWN
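The `grep DOWN` above matches DOWN anywhere in a line, including inside proxy or server names. A sketch that checks only the status column follows. It assumes the standard haproxy `show stat` CSV layout: pxname in field 1, svname in field 2, status in field 18, and a header line starting with `# pxname`. `haproxy_down` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print proxy/server pairs whose haproxy status is DOWN.
# Reads the CSV produced by haproxy's "show stat" command on stdin.
set -euo pipefail

haproxy_down() {
    # Skip the "# pxname,..." header; match "DOWN" and "DOWN 1/2" states.
    awk -F, '$1 !~ /^#/ && $18 ~ /^DOWN/ { print $1 "/" $2 " " $18 }'
}

# Usage, as root on a cloudcontrol:
# echo "show stat" | socat /var/run/haproxy/haproxy.sock stdio | haproxy_down
```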
cloudcontrol1005.wikimedia.org:
[] puppet agent --enable && puppet agent -tv
[] apt-get update
[] systemctl unmask keystone
[] DEBIAN_FRONTEND=noninteractive apt-get install glance python3-eventlet glance-api glance-common keystone nova-api nova-conductor nova-scheduler nova-common neutron-server python3-requests python3-urllib3 placement-api cinder-volume cinder-scheduler cinder-api python3-oslo.messaging python3-tooz -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] systemctl mask keystone
[] puppet agent -tv
[] nova-manage api_db sync
[] nova-manage db sync
[] placement-manage db sync
[] glance-manage db_sync
[] keystone-manage db_sync
[] cinder-manage db sync
[] cinder-manage db online_data_migrations
[] puppet agent -tv
[] nova-manage db online_data_migrations
[] systemctl list-units --failed (should show no failed units, or only keystone; if keystone has failed, clear it with `systemctl reset-failed`)
[] neutron-db-manage upgrade heads
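The schema syncs on cloudcontrol1005 need to run in order and should stop at the first error. This is a minimal sketch. It assumes it is run as a script (not sourced) by root after the first `puppet agent -tv`. `DB_SYNC_STEPS`, `run_db_syncs`, and the `DRY_RUN` switch are hypothetical names. The sketch runs `cinder-manage db sync` before `cinder-manage db online_data_migrations`, because the online migrations operate on the upgraded schema. The later puppet run and the nova and neutron steps remain manual checklist items.

```shell
#!/usr/bin/env bash
# Sketch: run the cloudcontrol1005 schema syncs in order and exit at the
# first failure (set -e). DRY_RUN=1 prints the commands instead.
set -euo pipefail

# Checklist order; cinder's online data migrations come after its sync.
DB_SYNC_STEPS=(
    "nova-manage api_db sync"
    "nova-manage db sync"
    "placement-manage db sync"
    "glance-manage db_sync"
    "keystone-manage db_sync"
    "cinder-manage db sync"
    "cinder-manage db online_data_migrations"
)

run_db_syncs() {
    local step
    for step in "${DB_SYNC_STEPS[@]}"; do
        echo ">>> ${step}" >&2
        if [ -n "${DRY_RUN:-}" ]; then
            echo "$step"
        else
            # Unquoted on purpose: each step is a plain command line that
            # must be split into the command and its arguments.
            $step
        fi
    done
}

# Usage, as root on cloudcontrol1005:
# DRY_RUN=1 run_db_syncs   # review the order first
# run_db_syncs
```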
cloudcontrol1006.wikimedia.org:
[] puppet agent --enable && puppet agent -tv
[] apt-get update
[] systemctl unmask keystone
[] DEBIAN_FRONTEND=noninteractive apt-get install glance python3-eventlet glance-api glance-common keystone nova-api nova-conductor nova-scheduler nova-common neutron-server python3-requests python3-urllib3 placement-api cinder-volume cinder-scheduler cinder-api python3-oslo.messaging python3-tooz -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] systemctl mask keystone
[] puppet agent -tv
[] puppet agent -tv
[] systemctl list-units --failed (should show no failed units, or only keystone; if keystone has failed, clear it with `systemctl reset-failed`)
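The install and upgrade steps above repeat the same `DEBIAN_FRONTEND` setting and dpkg options on every host. A small wrapper keeps them consistent. This is a sketch; `apt_noninteractive` is a hypothetical helper name, and all arguments are passed straight through to `apt-get`.

```shell
#!/usr/bin/env bash
# Sketch: wrapper for the runbook's non-interactive apt-get calls.
# --force-confdef takes dpkg's default action when there is one;
# otherwise --force-confold keeps the installed (puppet-managed) config
# file instead of prompting about the package maintainer's version.
set -euo pipefail

apt_noninteractive() {
    DEBIAN_FRONTEND=noninteractive apt-get "$@" \
        -o "Dpkg::Options::=--force-confdef" \
        -o "Dpkg::Options::=--force-confold"
}

# Usage (pass the full package list from the checklist to install):
# apt_noninteractive install glance-api keystone nova-api
# apt_noninteractive upgrade
```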
cloudcontrol1007.wikimedia.org:
[] puppet agent --enable && puppet agent -tv
[] apt-get update
[] systemctl unmask keystone
[] DEBIAN_FRONTEND=noninteractive apt-get install glance python3-eventlet glance-api glance-common keystone nova-api nova-conductor nova-scheduler nova-common neutron-server python3-requests python3-urllib3 placement-api cinder-volume cinder-scheduler cinder-api python3-oslo.messaging python3-tooz -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] systemctl mask keystone
[] puppet agent -tv
[] puppet agent -tv
[] systemctl list-units --failed (should show no failed units, or only keystone; if keystone has failed, clear it with `systemctl reset-failed`)
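The failed-units check is repeated on each cloudcontrol and can be scripted. This is a sketch. It assumes the keystone unit is named `keystone.service`. It also allows for systemd versions that prefix failed units with a `●` marker even in `--plain` output. `unexpected_failed_units` and `check_failed_units` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: succeed if no units failed, or keystone is the only failed unit;
# in that case clear keystone's failed state, as the checklist says.
set -euo pipefail

# Input: `systemctl list-units --failed --plain --no-legend` output.
# Prints every failed unit other than keystone.service.
unexpected_failed_units() {
    awk '{ u = ($1 == "●") ? $2 : $1 } u != "" && u != "keystone.service" { print u }'
}

check_failed_units() {
    local bad
    bad=$(systemctl list-units --failed --plain --no-legend | unexpected_failed_units)
    if [ -n "$bad" ]; then
        echo "unexpected failed units:" >&2
        printf '%s\n' "$bad" >&2
        return 1
    fi
    # Nothing failed, or only keystone; reset-failed is harmless either way.
    systemctl reset-failed keystone.service 2>/dev/null || true
}

# Usage, as root on each cloudcontrol:
# check_failed_units
```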
Cloudnets (wait for the network outage window, and upgrade one node at a time):
Begin with the standby node, as determined with:
$ neutron l3-agent-list-hosting-router cloudinstances2b-gw
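Reading the standby node off that table by eye is error-prone. The sketch below extracts it instead. It assumes the `neutron` client prints an ASCII table with `host` and `ha_state` columns; that layout is an assumption and has not been checked against this deployment. The columns are located by header name rather than by position. `standby_l3_hosts` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the host(s) whose L3 agent is in the "standby" HA state.
# Reads the ASCII table printed by `neutron l3-agent-list-hosting-router`.
set -euo pipefail

standby_l3_hosts() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        /^\+/ { next }                  # skip table border lines
        !hcol {                         # first non-border row is the header
            for (i = 1; i <= NF; i++) {
                f = trim($i)
                if (f == "host") hcol = i
                if (f == "ha_state") scol = i
            }
            next
        }
        scol && trim($scol) == "standby" { print trim($hcol) }
    '
}

# Usage:
# neutron l3-agent-list-hosting-router cloudinstances2b-gw | standby_l3_hosts
```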
Standby node (cloudnet1004.eqiad.wmnet):
[] puppet agent --enable && puppet agent -tv
[] apt-get update
[] DEBIAN_FRONTEND=noninteractive apt-get install -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" neutron-l3-agent python3-oslo.messaging python3-neutronclient python3-glanceclient
[] DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] puppet agent -tv
[] run `neutron-db-manage upgrade heads` **on cloudcontrol1005.wikimedia.org**
Active node (cloudnet1003.eqiad.wmnet):
[] puppet agent --enable && puppet agent -tv
[] apt-get update
[] DEBIAN_FRONTEND=noninteractive apt-get install -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" neutron-l3-agent python3-oslo.messaging python3-neutronclient python3-glanceclient
[] DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] puppet agent -tv
[] restore Horizon by reverting https://gerrit.wikimedia.org/r/c/operations/puppet/+/682948
**Break Time**
Cloudvirts (start with one test host, cloudvirt1039):
[] puppet agent --enable && puppet agent -tv
[] apt-get update
[] DEBIAN_FRONTEND=noninteractive apt-get install -y python3-libvirt python3-eventlet python3-os-brick python3-os-vif nova-compute neutron-common nova-compute-kvm neutron-linuxbridge-agent python3-neutron python3-oslo.messaging python3-taskflow python3-tooz python3-keystoneauth1 python3-requests python3-urllib3 -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade -y --allow-downgrades -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] puppet agent -tv
[] service neutron-linuxbridge-agent restart
[] service libvirtd restart
[] service nova-compute restart
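After the three restarts, a quick systemd-level check confirms each service came back. This is a sketch. It uses `systemctl restart`, which on these systemd hosts does the same job as `service ... restart`. `restart_and_verify` and the `SETTLE_SECONDS` delay are hypothetical. `is-active` only shows that systemd considers the unit running; it does not prove the agent re-registered with nova or neutron.

```shell
#!/usr/bin/env bash
# Sketch: restart services in the given order, wait briefly, then report
# whether each one is active. Returns non-zero if any unit is not active.
set -euo pipefail

restart_and_verify() {
    local svc failed=0
    for svc in "$@"; do
        systemctl restart "$svc"
    done
    sleep "${SETTLE_SECONDS:-10}"   # give the agents a moment to settle
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "ok: $svc"
        else
            echo "NOT ACTIVE: $svc" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage, as root on the cloudvirt (same order as the checklist):
# restart_and_verify neutron-linuxbridge-agent libvirtd nova-compute
```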
cloudbackup200[12].codfw.wmnet:
[] puppet agent --enable && puppet agent -tv
[] apt-get update
[] DEBIAN_FRONTEND=noninteractive apt-get install cinder-backup -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
[] puppet agent -tv
[] (test from cloudcontrol1005.wikimedia.org) sudo wmcs-cinder-backup-manager
[] update IRC topic
[] enable puppet on all cloud* hosts
$ sudo cumin 'cloud*' "enable-puppet 'Upgrading to openstack Wallaby - T281275 - ${USER}'"
**Things to check**
[] Check 'openstack region list'. There should be exactly one region, eqiad1-r. If there is a second region named 'RegionOne' (this happened in codfw1dev), delete it; otherwise scripts that enumerate regions will be confused.
[] Delete any VMs in the admin-monitoring project that leaked during the upgrade.
[] Create a new VM and confirm that DNS and ssh work properly
[] Logs will be extremely noisy with policy deprecation warnings and value checks. This is expected: OpenStack is in transition between two policy systems, and our existing policies are still (noisily) supported in U.
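The region and admin-monitoring checks above can be partly scripted. This is a sketch. It assumes admin credentials are loaded for the `openstack` CLI, for example as root on a cloudcontrol. `check_single_region` and `list_admin_monitoring_vms` are hypothetical helper names. Deletion is left manual so that a VM created by a monitoring run still in progress is not removed by accident.

```shell
#!/usr/bin/env bash
# Sketch of two post-upgrade checks using the openstack CLI.
set -euo pipefail

# Succeeds only if stdin lists exactly one region, eqiad1-r by default.
# Input: `openstack region list -f value -c Region`, one region per line.
check_single_region() {
    local expected=${1:-eqiad1-r} regions
    regions=$(grep -v '^[[:space:]]*$' || true)
    if [ "$regions" != "$expected" ]; then
        echo "unexpected regions:" >&2
        printf '%s\n' "$regions" >&2
        return 1
    fi
}

# Print IDs of VMs in the admin-monitoring project for review.
list_admin_monitoring_vms() {
    openstack server list --all-projects --project admin-monitoring -f value -c ID
}

# Usage:
# openstack region list -f value -c Region | check_single_region
# list_admin_monitoring_vms   # review, then: openstack server delete <ID>
```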