
Upgrade cloud-vps OpenStack to version 'Wallaby'
Status: Closed, Duplicate · Visibility: Public

Description

Start with cloudservices1003/1004. The current Horizon deploy is backwards-compatible with Wallaby (how was this verified?). That leaves the cloudcontrol, cloudnet, and cloudvirt nodes to upgrade.

  • update IRC topic
  • downtime everything in icinga through 14:00 CDT

    aborrero@cumin1001:~ $ sudo cookbook sre.hosts.downtime -r "upgrading openstack" --min 120 lab*

    aborrero@cumin1001:~ $ sudo cookbook sre.hosts.downtime -r "upgrading openstack" --min 120 cloud*
  • dump databases on cloudcontrol1003: nova_eqiad1, nova_api_eqiad1, nova_cell0_eqiad1, neutron, glance, placement, keystone, cinder:
    1. mysqldump -u root nova_eqiad1 > /root/wallabydbbackups/nova_eqiad1.sql
    2. mysqldump -u root nova_api_eqiad1 > /root/wallabydbbackups/nova_api_eqiad1.sql
    3. mysqldump -u root nova_cell0_eqiad1 > /root/wallabydbbackups/nova_cell0_eqiad1.sql
    4. mysqldump -u root neutron > /root/wallabydbbackups/neutron.sql
    5. mysqldump -u root glance > /root/wallabydbbackups/glance.sql
    6. mysqldump -u root placement > /root/wallabydbbackups/placement.sql
    7. mysqldump -u root keystone > /root/wallabydbbackups/keystone.sql
    8. mysqldump -u root cinder > /root/wallabydbbackups/cinder.sql
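The per-database dumps above can be collapsed into one loop. A minimal sketch, written as a dry run so it only prints the commands instead of touching the database:

```shell
# Dry-run sketch: print the dump command for each database rather than running it.
backup_dir=/root/wallabydbbackups
dbs="nova_eqiad1 nova_api_eqiad1 nova_cell0_eqiad1 neutron glance placement keystone cinder"
for db in $dbs; do
    echo "mysqldump -u root $db > $backup_dir/$db.sql"
done
```

Drop the echo (and run mkdir -p "$backup_dir" first) to perform the dumps for real.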

Cloudcontrols:

All open database connections post-upgrade: https://phabricator.wikimedia.org/P10999
Check haproxy status: echo "show stat" | socat /var/run/haproxy/haproxy.sock stdio | grep DOWN
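The haproxy check can be wrapped so it fails loudly instead of relying on eyeballing the grep output. A sketch (the check_haproxy_stats name is ours; it reads the "show stat" CSV on stdin, so in production you would pipe the socat command above into it):

```shell
# Sketch: succeed only if no haproxy backend reports DOWN.
# Reads `show stat` CSV on stdin, e.g.:
#   echo "show stat" | socat /var/run/haproxy/haproxy.sock stdio | check_haproxy_stats
check_haproxy_stats() {
    ! grep -q DOWN
}
```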

Cloudcontrol1003:

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • systemctl unmask keystone
  • DEBIAN_FRONTEND=noninteractive apt-get install glance glance-api glance-common keystone nova-api nova-conductor nova-scheduler nova-common neutron-server python3-requests python3-urllib3 placement-api cinder-volume cinder-scheduler cinder-api python3-oslo.messaging python3-tooz -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • systemctl mask keystone
  • puppet agent -tv
  • nova-manage api_db sync
  • nova-manage db sync
  • placement-manage db sync
  • glance-manage db_sync
  • keystone-manage db_sync
  • cinder-manage db online_data_migrations
  • cinder-manage db sync
  • puppet agent -tv
  • nova-manage db online_data_migrations
  • systemctl list-units --failed (should show nothing failed, or just keystone. If keystone is failed just reset with systemctl reset-failed)
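The final list-units check can be scripted so that "only keystone failed" counts as success. A sketch (the failed_units_ok name is ours; it reads the output of systemctl list-units --failed --plain --no-legend on stdin):

```shell
# Sketch: succeed if nothing failed, or if the only failed unit is keystone
# (keystone is masked on purpose, so a failed keystone unit is expected).
failed_units_ok() {
    ! grep -v '^keystone' | grep -q .
}
```

If it fails, inspect systemctl list-units --failed directly; if only keystone is listed, clear it with systemctl reset-failed.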

Cloudcontrol1004:

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • systemctl unmask keystone
  • DEBIAN_FRONTEND=noninteractive apt-get install glance glance-api glance-common keystone nova-api nova-conductor nova-scheduler nova-common neutron-server python3-requests python3-urllib3 placement-api cinder-volume cinder-scheduler cinder-api python3-oslo.messaging python3-tooz -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • systemctl mask keystone
  • puppet agent -tv
  • puppet agent -tv
  • systemctl list-units --failed (should show nothing failed, or just keystone. If keystone is failed just reset with systemctl reset-failed)

Cloudcontrol1005:

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • systemctl unmask keystone
  • DEBIAN_FRONTEND=noninteractive apt-get install glance glance-api glance-common keystone nova-api nova-conductor nova-scheduler nova-common neutron-server python3-requests python3-urllib3 placement-api cinder-volume cinder-scheduler cinder-api python3-oslo.messaging python3-tooz -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • systemctl mask keystone
  • puppet agent -tv
  • puppet agent -tv
  • systemctl list-units --failed (should show nothing failed, or just keystone. If keystone is failed just reset with systemctl reset-failed)

cloudnets (one at a time please):

Begin with the standby node, as determined with:

$ neutron l3-agent-list-hosting-router cloudinstances2b-gw
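That command prints a table with one row per cloudnet host and an ha_state column. Picking out the standby host can be sketched as follows (assuming the usual pipe-separated table layout with the host in the second data column; verify against the real output before trusting it):

```shell
# Sketch: extract the standby host from the l3-agent table on stdin.
# Assumes rows like: | <agent-id> | <host> | True | standby |
standby_host() {
    awk '/standby/ {print $4}'
}
```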

Standby node:

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • DEBIAN_FRONTEND=noninteractive apt-get install -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" neutron-l3-agent python3-oslo.messaging python3-neutronclient python3-glanceclient
  • DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • puppet agent -tv
  • run neutron-db-manage upgrade heads (on cloudcontrol1003)

Active node:

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • DEBIAN_FRONTEND=noninteractive apt-get install -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" neutron-l3-agent python3-oslo.messaging python3-neutronclient python3-glanceclient
  • DEBIAN_FRONTEND=noninteractive apt-get upgrade -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • puppet agent -tv

Break Time

Cloudvirts (start with a single test host, cloudvirt1039, first. Don't forget about cloudvirtwdqs):

  • puppet agent --enable && puppet agent -tv
  • apt-get update
  • DEBIAN_FRONTEND=noninteractive apt-get install -y python3-libvirt python3-os-vif nova-compute neutron-common nova-compute-kvm neutron-linuxbridge-agent python3-neutron python3-eventlet python3-oslo.messaging python3-taskflow python3-tooz python3-keystoneauth1 python3-positional python3-requests python3-urllib3 -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade -y --allow-downgrades -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold"
  • puppet agent -tv
  • service neutron-linuxbridge-agent restart
  • service libvirtd restart
  • service nova-compute restart
  • update IRC topic
  • enable puppet on all cloud* hosts

    $ sudo cumin 'cloud*' "enable-puppet 'Upgrading to OpenStack Wallaby - T261135 - ${USER}'"
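Once the cloudvirt services are restarted, it is worth confirming that no compute services stayed down. A sketch (the services_up name is ours; in production feed it the output of openstack compute service list -f value -c Host -c State):

```shell
# Sketch: succeed only if no "host state" line on stdin ends in "down".
services_up() {
    ! grep -qi ' down$'
}
```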

Things to check

  • Check 'openstack region list'. There should be exactly one region, eqiad1-r. If there is a second region named 'RegionOne' (this happened in codfw1dev), delete it; otherwise scripts that enumerate regions will be confused.
  • Delete any VMs in the admin-monitoring project that leaked during the upgrade.
  • Create a new VM and confirm that DNS and ssh work properly
  • Logs will be extremely noisy about policy deprecations and value checks; this is expected because OpenStack is poised between two different policy systems; our existing policies are still (noisily) supported in Wallaby.
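The region check above can be made mechanical. A sketch (the one_region name is ours; it reads the output of openstack region list -f value -c Region on stdin and succeeds only when exactly one region is listed):

```shell
# Sketch: succeed only when stdin contains exactly one region name.
one_region() {
    [ "$(wc -l)" -eq 1 ]
}
```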