
Upgrade Cloud VPS OpenStack to OpenStack version 'Wallaby'
Closed, Resolved, Public

Description

The current Horizon deployment is already running Wallaby, which leaves the cloudservices, cloudcontrol, cloudnet, and cloudvirt nodes to upgrade.

  • update IRC topic
  • downtime everything in Icinga through 14:00 CDT

    aborrero@cumin1001:~ $ sudo cookbook sre.hosts.downtime -r "upgrading openstack" --min 120 lab*

    aborrero@cumin1001:~ $ sudo cookbook sre.hosts.downtime -r "upgrading openstack" --min 120 cloud*

Start with cloudservices100[34].wikimedia.org (T304880).

  • dump databases on cloudcontrol1003: nova_eqiad1, nova_api_eqiad1, nova_cell0_eqiad1, neutron, glance, placement, keystone, cinder:
    1. mysqldump -u root nova_eqiad1 > /root/wallabydbbackups/nova_eqiad1.sql
    2. mysqldump -u root nova_api_eqiad1 > /root/wallabydbbackups/nova_api_eqiad1.sql
    3. mysqldump -u root nova_cell0_eqiad1 > /root/wallabydbbackups/nova_cell0_eqiad1.sql
    4. mysqldump -u root neutron > /root/wallabydbbackups/neutron.sql
    5. mysqldump -u root glance > /root/wallabydbbackups/glance.sql
    6. mysqldump -u root placement > /root/wallabydbbackups/placement.sql
    7. mysqldump -u root keystone > /root/wallabydbbackups/keystone.sql
    8. mysqldump -u root cinder > /root/wallabydbbackups/cinder.sql
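The dumps above can also be run as a single loop over the databases named in this step. This is a minimal sketch, assuming it runs as root on cloudcontrol1003 and that mysqldump -u root needs no password (as in the commands above); BACKUP_DIR, DATABASES, and dump_all are illustrative names, not existing tooling. Unlike the one-liners, it creates the backup directory first and stops at the first failed or empty dump.

```bash
#!/bin/bash
# Sketch: dump the databases touched by the Wallaby upgrade, one file each.
# Assumes root on cloudcontrol1003 and password-less "mysqldump -u root".

BACKUP_DIR=/root/wallabydbbackups
DATABASES=(nova_eqiad1 nova_api_eqiad1 nova_cell0_eqiad1 neutron glance placement keystone cinder)

# Dump every database in DATABASES into DIR (default: BACKUP_DIR).
# Returns non-zero at the first dump that fails or produces an empty file.
dump_all() {
    local dir="${1:-$BACKUP_DIR}" db
    mkdir -p "$dir" || return 1
    for db in "${DATABASES[@]}"; do
        echo "dumping $db" >&2
        if ! mysqldump -u root "$db" > "$dir/$db.sql"; then
            echo "dump of $db failed" >&2
            return 1
        fi
        # An empty file means nothing useful was written.
        if [[ ! -s "$dir/$db.sql" ]]; then
            echo "$dir/$db.sql is empty" >&2
            return 1
        fi
    done
    echo "all ${#DATABASES[@]} dumps written to $dir" >&2
}
```

Usage: source the snippet and run dump_all, or dump_all /some/other/dir to write elsewhere.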

Cloudcontrols:

List of all open database connections after the upgrade: https://phabricator.wikimedia.org/P10999
Check HAProxy for anything marked DOWN: echo "show stat" | socat /var/run/haproxy/haproxy.sock stdio | grep DOWN
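grep DOWN matches the word anywhere on a line. A slightly stricter sketch reads the CSV that HAProxy's show stat command returns, locates the status column by name from the header line (which begins with "# pxname,svname,"), and prints proxy/server pairs whose status starts with DOWN. haproxy_down is an illustrative helper name; the socket path is the one used above.

```bash
# Print "proxy/server status" for every HAProxy entry whose status starts
# with DOWN. Reads "show stat" CSV on stdin; the status column is located
# by name from the header line rather than by a hard-coded position.
haproxy_down() {
    awk -F, '
        NR == 1 {
            sub(/^# /, "")    # header looks like "# pxname,svname,..."
            for (i = 1; i <= NF; i++) if ($i == "status") col = i
            next
        }
        col && $col ~ /^DOWN/ { print $1 "/" $2, $col }
    '
}
```

Usage: echo "show stat" | socat /var/run/haproxy/haproxy.sock stdio | haproxy_down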

cloudcontrol1003.wikimedia.org:

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • systemctl unmask keystone
  • DEBIAN_FRONTEND=noninteractive apt-get install glance python3-eventlet glance-api glance-common keystone nova-api nova-conductor nova-scheduler nova-common neutron-server python3-requests python3-urllib3 placement-api cinder-volume cinder-scheduler cinder-api python3-oslo.messaging python3-tooz -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • systemctl mask keystone
  • puppet agent -tv
  • nova-manage api_db sync
  • nova-manage db sync
  • placement-manage db sync
  • glance-manage db_sync
  • keystone-manage db_sync
  • cinder-manage db online_data_migrations
  • cinder-manage db sync
  • puppet agent -tv
  • nova-manage db online_data_migrations
  • systemctl list-units --failed (nothing should be failed except possibly keystone; if keystone has failed, clear it with systemctl reset-failed)
  • neutron-db-manage upgrade heads
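The seven migration commands between the two puppet agent -tv runs have to succeed in order, and a failure should stop the sequence rather than scroll past. A hedged sketch of a runner that executes them in the listed order and halts at the first non-zero exit; run_steps and the DRY_RUN switch are hypothetical, and the puppet runs, the nova online data migrations, and neutron-db-manage stay manual.

```bash
# Database migration commands from the cloudcontrol1003 list, same order.
MIGRATION_STEPS=(
    "nova-manage api_db sync"
    "nova-manage db sync"
    "placement-manage db sync"
    "glance-manage db_sync"
    "keystone-manage db_sync"
    "cinder-manage db online_data_migrations"
    "cinder-manage db sync"
)

# Run each step in order and stop at the first failure.
# With DRY_RUN=1 the commands are only printed, not executed.
run_steps() {
    local step
    for step in "$@"; do
        echo ">>> $step" >&2
        if [[ "${DRY_RUN:-0}" == 1 ]]; then
            echo "$step"
            continue
        fi
        # Word splitting is intended: each step is a simple command
        # with no quoted arguments.
        if ! $step; then
            echo "step failed: $step (remaining steps skipped)" >&2
            return 1
        fi
    done
}
```

Usage: DRY_RUN=1 run_steps "${MIGRATION_STEPS[@]}" to review, then run_steps "${MIGRATION_STEPS[@]}" as root on cloudcontrol1003. Storing steps as plain strings only works because none of these commands need quoting.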

cloudcontrol1004.wikimedia.org:

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • systemctl unmask keystone
  • DEBIAN_FRONTEND=noninteractive apt-get install glance python3-eventlet glance-api glance-common keystone nova-api nova-conductor nova-scheduler nova-common neutron-server python3-requests python3-urllib3 placement-api cinder-volume cinder-scheduler cinder-api python3-oslo.messaging python3-tooz -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • systemctl mask keystone
  • puppet agent -tv
  • puppet agent -tv
  • systemctl list-units --failed (nothing should be failed except possibly keystone; if keystone has failed, clear it with systemctl reset-failed)
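The systemctl list-units --failed check on each cloudcontrol can be automated along these lines. This sketch assumes the runbook's rule that keystone (masked above) is the only acceptable failure: if keystone.service is the sole failed unit, it clears that unit's failed state; anything else is printed and the function returns non-zero. check_failed_units is a hypothetical helper; --plain and --no-legend are standard systemctl options that make the output easy to parse.

```bash
# Check for failed systemd units after the upgrade.
# - no failed units: report success
# - only keystone.service failed: clear its failed state (expected case)
# - anything else: print the list and return non-zero for a human to review
check_failed_units() {
    local failed
    failed=$(systemctl list-units --failed --plain --no-legend | awk '{print $1}')
    if [[ -z "$failed" ]]; then
        echo "no failed units"
        return 0
    fi
    if [[ "$failed" == "keystone.service" ]]; then
        systemctl reset-failed keystone.service
        echo "reset keystone.service"
        return 0
    fi
    echo "failed units:" >&2
    echo "$failed" >&2
    return 1
}
```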

cloudcontrol1005.wikimedia.org:

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • systemctl unmask keystone
  • DEBIAN_FRONTEND=noninteractive apt-get install glance python3-eventlet glance-api glance-common keystone nova-api nova-conductor nova-scheduler nova-common neutron-server python3-requests python3-urllib3 placement-api cinder-volume cinder-scheduler cinder-api python3-oslo.messaging python3-tooz -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • systemctl mask keystone
  • puppet agent -tv
  • puppet agent -tv
  • systemctl list-units --failed (nothing should be failed except possibly keystone; if keystone has failed, clear it with systemctl reset-failed)

Cloudnets (wait for the network outage window, and upgrade one node at a time):

Begin with the standby node, as determined with:

$ neutron l3-agent-list-hosting-router cloudinstances2b-gw
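To pick the standby node without reading the table by eye, the output of the command above can be filtered. This sketch assumes the router is an HA router, so the table includes host and ha_state columns, and prints the host whose agent reports standby; standby_hosts is an illustrative helper name.

```bash
# Print the host(s) whose L3 agent reports ha_state "standby".
# Reads the table printed by "neutron l3-agent-list-hosting-router" on stdin
# and locates the "host" and "ha_state" columns by name from the header row.
standby_hosts() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        /^\|/ && !hcol {
            for (i = 2; i < NF; i++) {
                name = trim($i)
                if (name == "host") hcol = i
                if (name == "ha_state") scol = i
            }
            next
        }
        hcol && scol && /^\|/ && trim($scol) == "standby" { print trim($hcol) }
    '
}
```

Usage: neutron l3-agent-list-hosting-router cloudinstances2b-gw | standby_hosts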

Standby node (cloudnet1004.eqiad.wmnet):

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • DEBIAN_FRONTEND=noninteractive apt-get install -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" neutron-l3-agent python3-oslo.messaging python3-neutronclient python3-glanceclient
  • DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • puppet agent -tv
  • run neutron-db-manage upgrade heads on cloudcontrol1003.wikimedia.org

Active node (cloudnet1003.eqiad.wmnet):

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • DEBIAN_FRONTEND=noninteractive apt-get install -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" neutron-l3-agent python3-oslo.messaging python3-neutronclient python3-glanceclient
  • DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • puppet agent -tv

Break Time

Cloudvirts (start with one test host, cloudvirt1039):

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • DEBIAN_FRONTEND=noninteractive apt-get install -y python3-libvirt python3-eventlet python3-os-brick python3-os-vif nova-compute neutron-common nova-compute-kvm neutron-linuxbridge-agent python3-neutron python3-oslo.messaging python3-taskflow python3-tooz python3-keystoneauth1 python3-requests python3-urllib3 -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade -y --allow-downgrades -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • puppet agent -tv
  • service neutron-linuxbridge-agent restart
  • service libvirtd restart
  • service nova-compute restart

Apply the steps above to each host below, starting with cloudvirt1039:
  • cloudvirt1039.eqiad.wmnet
  • cloudvirt1016.eqiad.wmnet
  • cloudvirt1017.eqiad.wmnet
  • cloudvirt1018.eqiad.wmnet
  • cloudvirt1019.eqiad.wmnet
  • cloudvirt1020.eqiad.wmnet
  • cloudvirt1021.eqiad.wmnet
  • cloudvirt1022.eqiad.wmnet
  • cloudvirt1023.eqiad.wmnet
  • cloudvirt1024.eqiad.wmnet
  • cloudvirt1025.eqiad.wmnet
  • cloudvirt1026.eqiad.wmnet
  • cloudvirt1027.eqiad.wmnet
  • cloudvirt1028.eqiad.wmnet
  • cloudvirt1029.eqiad.wmnet
  • cloudvirt1030.eqiad.wmnet
  • cloudvirt1031.eqiad.wmnet
  • cloudvirt1032.eqiad.wmnet
  • cloudvirt1033.eqiad.wmnet
  • cloudvirt1034.eqiad.wmnet
  • cloudvirt1035.eqiad.wmnet
  • cloudvirt1036.eqiad.wmnet
  • cloudvirt1037.eqiad.wmnet
  • cloudvirt1038.eqiad.wmnet
  • cloudvirt1040.eqiad.wmnet
  • cloudvirt1041.eqiad.wmnet
  • cloudvirt1042.eqiad.wmnet
  • cloudvirt1043.eqiad.wmnet
  • cloudvirt1044.eqiad.wmnet
  • cloudvirt1045.eqiad.wmnet
  • cloudvirt1046.eqiad.wmnet
  • cloudvirt1047.eqiad.wmnet
  • cloudvirt-wdqs1001.eqiad.wmnet
  • cloudvirt-wdqs1002.eqiad.wmnet
  • cloudvirt-wdqs1003.eqiad.wmnet
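The host list above follows a simple pattern: the test host cloudvirt1039, then cloudvirt1016 through cloudvirt1047 without 1039, then the three cloudvirt-wdqs hosts. A sketch that generates that list and applies an upgrade command to hosts one at a time over ssh, stopping at the first failure. The runbook does not say how the per-host steps are driven, so rollout, the ssh transport, and the command passed in are all assumptions; sequential rollout is a conservative choice, not something this task prescribes.

```bash
# Generate the cloudvirt host list in the order used above.
cloudvirt_hosts() {
    local n
    echo "cloudvirt1039.eqiad.wmnet"          # test host goes first
    for n in $(seq 1016 1047); do
        [[ $n -eq 1039 ]] && continue         # already listed first
        echo "cloudvirt${n}.eqiad.wmnet"
    done
    for n in 1 2 3; do
        echo "cloudvirt-wdqs100${n}.eqiad.wmnet"
    done
}

# Run CMD on each host in turn over ssh; stop at the first host that fails
# so a broken upgrade is not repeated across the fleet.
rollout() {
    local cmd="$1" host
    shift
    for host in "$@"; do
        echo "=== $host" >&2
        if ! ssh -o BatchMode=yes "$host" "$cmd"; then
            echo "upgrade failed on $host; stopping" >&2
            return 1
        fi
    done
}
```

Usage: once cloudvirt1039 has been verified, rollout "$UPGRADE_CMD" $(cloudvirt_hosts | tail -n +2), where UPGRADE_CMD (hypothetical) runs the per-host steps above.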

cloudbackup200[12].codfw.wmnet:

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • DEBIAN_FRONTEND=noninteractive apt-get install cinder-backup -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • puppet agent -tv
  • test from cloudcontrol1005.wikimedia.org: sudo wmcs-cinder-backup-manager

Finally:
  • update IRC topic
  • enable puppet on all cloud* hosts

    $ sudo cumin 'cloud*' "enable-puppet 'Upgrading to openstack Wallaby - T281275 - ${USER}'"

Things to check

  • Run 'openstack region list'. There should be exactly one region, eqiad1-r. If there is a second region named 'RegionOne' (this happened in codfw1dev), delete it; otherwise scripts that enumerate regions will be confused.
  • Delete any VMs in the admin-monitoring project that leaked during the upgrade.
  • Create a new VM and confirm that DNS and SSH work properly.
  • Logs will be extremely noisy with policy deprecation warnings and value checks. This is expected: OpenStack is between two policy systems, and our existing policies are still (noisily) supported in Wallaby.
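The region check can be scripted by listing only region IDs and comparing them with the expected single region. A sketch assuming the OpenStack client's value formatter (-f value -c Region) prints one region ID per line; check_regions is an illustrative helper name.

```bash
# Succeed only if stdin contains exactly one region ID and it is EXPECTED
# (default eqiad1-r). Blank lines are ignored.
check_regions() {
    local expected="${1:-eqiad1-r}" regions count
    regions=$(grep -v '^[[:space:]]*$')
    count=$(printf '%s\n' "$regions" | grep -c .)
    if [[ "$count" -eq 1 && "$regions" == "$expected" ]]; then
        echo "ok: single region $expected"
        return 0
    fi
    echo "unexpected regions (want only $expected):" >&2
    printf '%s\n' "$regions" >&2
    return 1
}
```

Usage: openstack region list -f value -c Region | check_regions eqiad1-r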

Related Objects

Status      Assigned
Resolved    Andrew
Resolved    Andrew
Resolved    rook
Resolved    Andrew
Resolved    Andrew
Resolved    taavi
Resolved    aborrero
Resolved    aborrero
Resolved    dcaro
Resolved    dcaro
Resolved    dcaro
Resolved    dcaro
Duplicate   dcaro
Resolved    dcaro
Resolved    taavi
Resolved    dcaro
Resolved    aborrero
Resolved    dcaro
Resolved    Andrew
Resolved    Andrew
Resolved    Andrew
Resolved    Andrew
Resolved    Andrew
Resolved    ayounsi
Resolved    rook
Resolved    aborrero
Resolved    rook
Resolved    rook

Event Timeline

aborrero triaged this task as Medium priority. May 11 2021, 4:17 PM

Change 768829 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add files and templates for OpenStack Wallaby

https://gerrit.wikimedia.org/r/768829

Change 768830 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] OpenStack: add manifests for openstack wallaby

https://gerrit.wikimedia.org/r/768830

Change 768852 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Update hacked nova/api/openstack/compute/servers.py for Wallaby

https://gerrit.wikimedia.org/r/768852

Change 768853 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Update trove/instance/models.py for wallaby

https://gerrit.wikimedia.org/r/768853

Change 768854 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Update trove/instance/models.py for wallaby

https://gerrit.wikimedia.org/r/768854

Change 769051 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add files for OpenStack Wallaby

https://gerrit.wikimedia.org/r/769051

Change 768829 merged by Andrew Bogott:

[operations/puppet@production] Add templates for OpenStack Wallaby

https://gerrit.wikimedia.org/r/768829

Change 769051 merged by Andrew Bogott:

[operations/puppet@production] Add some files for OpenStack Wallaby

https://gerrit.wikimedia.org/r/769051

Change 769054 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add more files for OpenStack Wallaby

https://gerrit.wikimedia.org/r/769054

Change 769054 merged by Andrew Bogott:

[operations/puppet@production] Add more files for OpenStack Wallaby

https://gerrit.wikimedia.org/r/769054

Change 768830 merged by Andrew Bogott:

[operations/puppet@production] OpenStack: add manifests for openstack wallaby

https://gerrit.wikimedia.org/r/768830

Change 768852 merged by Andrew Bogott:

[operations/puppet@production] Update hacked nova/api/openstack/compute/servers.py for Wallaby

https://gerrit.wikimedia.org/r/768852

Change 768853 merged by Andrew Bogott:

[operations/puppet@production] Update trove/instance/models.py for wallaby

https://gerrit.wikimedia.org/r/768853

Change 768854 merged by Andrew Bogott:

[operations/puppet@production] Update trove/instance/models.py for wallaby

https://gerrit.wikimedia.org/r/768854

rook updated the task description.

Change 788359 had a related patch set uploaded (by Vivian Rook; author: Vivian Rook):

[operations/puppet@production] upgrade openstack to wallaby

https://gerrit.wikimedia.org/r/788359

Change 788359 merged by Vivian Rook:

[operations/puppet@production] upgrade openstack to wallaby

https://gerrit.wikimedia.org/r/788359

rook updated the task description.