
cloudvps: neutron: upgrade jessie -> stretch
Closed, Resolved, Public

Description

Tracking task to upgrade cloudnet servers from jessie -> stretch (mitaka).

Steps are:

  • reimage the standby server as stretch
  • check the new interface names and update hiera accordingly
  • set up the neutron l3 agent
  • fail over to the standby (so the stretch server becomes active)
  • repeat with the remaining jessie box

Event Timeline

aborrero triaged this task as Medium priority. Jan 21 2019, 1:06 PM
aborrero created this task.

I'm running through the steps in labtestn first, as a test.

For the record, I needed these neutron commands when enrolling the new l3 agent in labtestn:

root@labtestcontrol2003:~# neutron l3-agent-list-hosting-router cloudinstances2b-gw
+--------------------------------------+--------------------+----------------+-------+----------+
| id                                   | host               | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------+----------------+-------+----------+
| 569a3a0b-824f-4189-b6c7-b067d14a0fa3 | labtestneutron2001 | True           | xxx   | standby  |
| ac4771f4-bf50-4607-9b2b-45a3fb0d574b | labtestneutron2002 | True           | :-)   | active   |
+--------------------------------------+--------------------+----------------+-------+----------+

root@labtestcontrol2003:~# neutron agent-delete 569a3a0b-824f-4189-b6c7-b067d14a0fa3
Deleted agent: 569a3a0b-824f-4189-b6c7-b067d14a0fa3

root@labtestcontrol2003:~# neutron l3-agent-list-hosting-router cloudinstances2b-gw
+--------------------------------------+--------------------+----------------+-------+----------+
| id                                   | host               | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------+----------------+-------+----------+
| ac4771f4-bf50-4607-9b2b-45a3fb0d574b | labtestneutron2002 | True           | :-)   | active   |
+--------------------------------------+--------------------+----------------+-------+----------+

root@labtestcontrol2003:~# neutron agent-list | grep cloudnet
| 312ede1a-e0ea-4556-88f9-3a7ce45a02f8 | Metadata agent     | cloudnet2001-dev   |                   | :-)   | True           | neutron-metadata-agent    |
| 498c6ec0-1e65-4514-9633-be34e5f3a486 | Linux bridge agent | cloudnet2001-dev   |                   | :-)   | True           | neutron-linuxbridge-agent |
| 61d2165e-0269-4964-96df-24bc61b2569d | L3 agent           | cloudnet2001-dev   | nova              | :-)   | True           | neutron-l3-agent          |
| 8bfffc22-509d-4ca0-930f-17e3f78da1d9 | DHCP agent         | cloudnet2001-dev   | nova              | :-)   | True           | neutron-dhcp-agent        |

root@labtestcontrol2003:~# neutron l3-agent-router-add 61d2165e-0269-4964-96df-24bc61b2569d cloudinstances2b-gw
Added router cloudinstances2b-gw to L3 agent

root@labtestcontrol2003:~# neutron l3-agent-list-hosting-router cloudinstances2b-gw
+--------------------------------------+--------------------+----------------+-------+----------+
| id                                   | host               | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------+----------------+-------+----------+
| 61d2165e-0269-4964-96df-24bc61b2569d | cloudnet2001-dev   | True           | :-)   | standby  |
| ac4771f4-bf50-4607-9b2b-45a3fb0d574b | labtestneutron2002 | True           | :-)   | active   |
+--------------------------------------+--------------------+----------------+-------+----------+
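The agent ids used above were read off the tables by eye. As a rough sketch only, the two lookups (which agent of a router is not reporting, and which id belongs to the L3 agent on a given host) could be scripted along these lines. The function names `dead_l3_agents` and `l3_agent_id_for_host` are hypothetical, and the parsing assumes the column layout shown in the transcript above; a different neutron client version may print different columns.

```shell
# Sketch: parse the ASCII tables printed by the neutron CLI (read from stdin).
# Both helpers are hypothetical and assume the column order shown in this task.

# Print the id of every agent whose "alive" column is "xxx" in the output of
# `neutron l3-agent-list-hosting-router <router>`.
dead_l3_agents() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Rows split into 7 fields: "", id, host, admin_state_up, alive, ha_state, "".
        # Border lines contain no "|" and the header row never matches "xxx".
        NF == 7 && trim($5) == "xxx" { print trim($2) }
    '
}

# Print the id of the L3 agent running on host $1, from `neutron agent-list` output.
l3_agent_id_for_host() {
    awk -F'|' -v host="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Rows split into 9 fields: "", id, agent_type, host, availability_zone,
        # alive, admin_state_up, binary, "".
        NF == 9 && trim($3) == "L3 agent" && trim($4) == host { print trim($2) }
    '
}

# Possible usage (not executed here):
#   neutron l3-agent-list-hosting-router cloudinstances2b-gw | dead_l3_agents
#   neutron agent-list | l3_agent_id_for_host cloudnet2001-dev
```

The helpers only print ids. Passing them to `neutron agent-delete` or `neutron l3-agent-router-add` is deliberately left as a manual step, so the result can be reviewed first.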

I was unable to get the new cloudnet2001-dev working with neutron. I just noticed that the interface names are different after the reimage, so we need some hiera overrides.
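To see which interface names a reimaged host actually ended up with, something like the following could be run on the host and compared by hand against the hiera values. This is only a sketch: `list_ifaces` is a hypothetical helper, and it assumes the standard Linux sysfs layout under /sys/class/net.

```shell
# Sketch: list network interface names, one per line, skipping loopback.
# The optional argument overrides the directory scanned (default: /sys/class/net).
list_ifaces() {
    dir="${1:-/sys/class/net}"
    for path in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        if [ ! -e "$path" ]; then
            continue
        fi
        name="${path##*/}"
        if [ "$name" != "lo" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Possible usage on the reimaged host:
#   list_ifaces
```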

Change 485657 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudnet2001-dev: hiera cleanup for stretch/mitaka

https://gerrit.wikimedia.org/r/485657

Change 485657 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudnet2001-dev: hiera cleanup for stretch/mitaka

https://gerrit.wikimedia.org/r/485657

Change 485800 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] labtestn: neutron: refresh hiera settings for interface names

https://gerrit.wikimedia.org/r/485800

Change 485800 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] labtestn: neutron: refresh hiera settings for interface names

https://gerrit.wikimedia.org/r/485800

Change 485869 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudnet1003: reimage to stretch

https://gerrit.wikimedia.org/r/485869

Change 485869 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudnet1003: reimage to stretch

https://gerrit.wikimedia.org/r/485869

Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts:

cloudnet1003.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201901221836_aborrero_53986_cloudnet1003_eqiad_wmnet.log.

Mentioned in SAL (#wikimedia-operations) [2019-01-22T18:36:56Z] <arturo> T214299 reimaging cloudnet1003 as debian stretch

Mentioned in SAL (#wikimedia-cloud) [2019-01-22T18:37:08Z] <arturo> T214299 reimaging cloudnet1003 as debian stretch

Mentioned in SAL (#wikimedia-cloud) [2019-01-22T18:40:31Z] <arturo> T214299 manually delete from neutron agents from cloudnet1003 (must be added again after reimage, with new uuids)

Completed auto-reimage of hosts:

['cloudnet1003.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-cloud) [2019-01-22T19:19:47Z] <arturo> T214299 stretch cloudnet1003 is apparently all set

Change 485878 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudnet1003: hiera: refresh interface names

https://gerrit.wikimedia.org/r/485878

Change 485878 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudnet1003: hiera: refresh interface names

https://gerrit.wikimedia.org/r/485878

Mentioned in SAL (#wikimedia-operations) [2019-01-22T19:30:41Z] <arturo> T214299 additional reboot for cloudnet1003

Change 486047 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: neutron: introduce base interface hiera key

https://gerrit.wikimedia.org/r/486047

Change 486047 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: neutron: introduce base interface hiera key

https://gerrit.wikimedia.org/r/486047

Mentioned in SAL (#wikimedia-operations) [2019-01-23T11:04:18Z] <arturo> T214299 reboot cloudnet2001-dev, cloudnet2002-dev and cloudnet1003 for new interface names

cloudnet1003 - CRITICAL - degraded: The system is operational but one or more units failed.

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cloudnet1003&service=Check+systemd+state
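For an alert like this one, the failed units can be listed on the host itself. A minimal sketch, assuming a systemd version that supports the `--state`, `--no-legend` and `--plain` options of `systemctl list-units`; `failed_units` is a hypothetical helper name.

```shell
# Sketch: print only the unit names from the output of
#   systemctl list-units --state=failed --no-legend --plain
# which is read from stdin (first column is the unit name).
failed_units() {
    awk 'NF > 0 { print $1 }'
}

# Possible usage on the affected host:
#   systemctl list-units --state=failed --no-legend --plain | failed_units
```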

Thanks @Dzahn for the heads-up :-) really appreciated (I hadn't noticed it myself)

Change 486060 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: neutron: l3_agent: merge similar code for eqiad1/labtestn

https://gerrit.wikimedia.org/r/486060

Change 486060 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: neutron: l3_agent: merge similar code for eqiad1/labtestn

https://gerrit.wikimedia.org/r/486060

Mentioned in SAL (#wikimedia-cloud) [2019-01-24T09:51:48Z] <arturo> T214299 failover cloudnet1004 to cloudnet1003

Change 486236 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudnet1004: reimage to Debian Stretch

https://gerrit.wikimedia.org/r/486236

Change 486236 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudnet1004: reimage to Debian Stretch

https://gerrit.wikimedia.org/r/486236

Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts:

cloudnet1004.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201901241003_aborrero_47033_cloudnet1004_eqiad_wmnet.log.

Mentioned in SAL (#wikimedia-operations) [2019-01-24T10:03:46Z] <arturo> T214299 reimage cloudnet1004 to debian stretch

Mentioned in SAL (#wikimedia-cloud) [2019-01-24T10:03:55Z] <arturo> T214299 reimage cloudnet1004 to debian stretch

Completed auto-reimage of hosts:

['cloudnet1004.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-01-24T10:59:07Z] <arturo> T214299 additional reboot for cloudnet1004

Mentioned in SAL (#wikimedia-cloud) [2019-01-24T11:07:09Z] <arturo> T214299 failover cloudnet1003 to cloudnet1004

All seems fine. All cloudnet servers are running mitaka/stretch.