release notes:
https://releases.openstack.org/antelope/
generic upgrade how-to:
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Openstack_upgrade
@dr0ptp4kt I was told by @Andrew that you were interested in following this upgrade procedure. This is the main tracking task; feel free to add questions here or ping me on IRC. This is the first time I've followed this procedure, so I'm likely to hit a few roadblocks. I'll do my best to document them here in Phabricator.
Change 954056 merged by FNegri:
[operations/puppet@production] [openstack] upgrade codfw1dev to Antelope (2023.1)
Running the cookbook upgrade_openstack_node on the first cloudcontrol node failed with:
fnegri@cloudcumin1001:~$ sudo cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node --fqdn-to-upgrade cloudcontrol2001-dev.codfw.wmnet
[...]
E: The repository 'http://mirrors.wikimedia.org/osbpo bullseye-antelope-backports-nochange Release' does not have a Release file.
E: The repository 'http://mirrors.wikimedia.org/osbpo bullseye-antelope-backports Release' does not have a Release file.
It looks like Antelope is not packaged for Bullseye but only for Bookworm: https://mirrors.wikimedia.org/osbpo/pool/
We should probably upgrade all the servers to Bookworm first (keeping OpenStack on version Zed), and then upgrade OpenStack to Antelope.
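A quick way to confirm which suites are actually published before running the cookbook is to probe the mirror for the Release file that apt is complaining about. The sketch below assumes the mirror follows the standard Debian repository layout (`<base>/dists/<suite>/Release`); the helper names `release_url` and `has_release_file` are made up for illustration and are not part of the cookbooks.

```python
import urllib.request


def release_url(base_url: str, suite: str) -> str:
    """Build the Release file URL for a suite, assuming the standard
    Debian repository layout: <base>/dists/<suite>/Release."""
    return f"{base_url.rstrip('/')}/dists/{suite}/Release"


def has_release_file(base_url: str, suite: str, timeout: float = 10.0) -> bool:
    """Return True if the mirror answers 200 for the suite's Release file."""
    request = urllib.request.Request(release_url(base_url, suite), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError/HTTPError (e.g. a 404) and socket timeouts are all OSError subclasses.
        return False
```

For example, `has_release_file("http://mirrors.wikimedia.org/osbpo", "bullseye-antelope-backports")` probes the second suite named in the apt error above.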
Change 963029 had a related patch set uploaded (by FNegri; author: FNegri):
[operations/puppet@production] Revert "Revert "[openstack] upgrade codfw1dev to Antelope (2023.1)""
Change 963029 merged by FNegri:
[operations/puppet@production] Revert "Revert "[openstack] upgrade codfw1dev to Antelope (2023.1)""
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-04T13:49:58Z] <wm-bot2> fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) (T341285)
The cookbook failed with the following error:
Database expansion failed. Database expansion should have brought the database version up to "2023_1_expand01" revision. But, current revisions are: ('wallaby_contract01',)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-04T14:40:50Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-04T14:41:11Z] <wm-bot2> fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) (T341285)
Running the failing command manually worked just fine:
root@cloudcontrol2001-dev:~# glance-manage db sync
2023-10-04 14:26:22.932 3125648 INFO alembic.runtime.migration [-] Context impl MySQLImpl.
2023-10-04 14:26:22.933 3125648 INFO alembic.runtime.migration [-] Will assume non-transactional DDL.
2023-10-04 14:26:22.960 3125648 INFO alembic.runtime.migration [-] Context impl MySQLImpl.
2023-10-04 14:26:22.960 3125648 INFO alembic.runtime.migration [-] Will assume non-transactional DDL.
Database expansion is up to date. No expansion needed.
2023-10-04 14:26:22.982 3125648 INFO alembic.runtime.migration [-] Context impl MySQLImpl.
2023-10-04 14:26:22.983 3125648 INFO alembic.runtime.migration [-] Will assume non-transactional DDL.
Database migration is up to date. No migration needed.
2023-10-04 14:26:23.003 3125648 INFO alembic.runtime.migration [-] Context impl MySQLImpl.
2023-10-04 14:26:23.004 3125648 INFO alembic.runtime.migration [-] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade wallaby_contract01 -> xena_contract01
INFO [alembic.runtime.migration] Running upgrade xena_contract01 -> yoga_contract01
INFO [alembic.runtime.migration] Running upgrade yoga_contract01 -> zed_contract01
INFO [alembic.runtime.migration] Running upgrade zed_contract01 -> 2023_1_contract01
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
Upgraded database to: 2023_1_contract01, current revision(s): 2023_1_contract01
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
Database is synced successfully.
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-04T14:44:41Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-04T14:54:53Z] <wm-bot2> fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-05T09:32:59Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-05T09:40:01Z] <wm-bot2> fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-05T09:47:28Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-05T09:54:42Z] <wm-bot2> fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-05T16:11:32Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-05T16:24:09Z] <wm-bot2> fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-05T16:29:31Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-05T16:41:36Z] <wm-bot2> fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
The cookbook has been run successfully on the following nodes:
There are some puppet errors in cloudcontrols. Once those are fixed, the cookbook can be run on the remaining nodes:
So far all the cloudcontrol issues have been related to obsolete init scripts. I updated many (all?) of our init scripts in puppet to match the packaged ones, and now cloudcontrols seem to be working.
https://gerrit.wikimedia.org/r/c/operations/puppet/+/964041
https://gerrit.wikimedia.org/r/c/operations/puppet/+/964045
etc.
Similar changes may be needed for designate.
Change 964164 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Update cinder-api init.d file to match upstream packaged version
Change 964165 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] heat-api: update init file to match upstream packaged version
Change 964166 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] magnum-api: update init file to match upstream package
Change 964164 merged by Andrew Bogott:
[operations/puppet@production] Update cinder-api init.d file to match upstream packaged version
Change 964165 merged by Andrew Bogott:
[operations/puppet@production] heat-api: update init file to match upstream packaged version
Change 964166 merged by Andrew Bogott:
[operations/puppet@production] magnum-api: update init file to match upstream package
Change 964169 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] heat-api-cfn: update init file to match upstream packaged version
Change 964169 merged by Andrew Bogott:
[operations/puppet@production] heat-api-cfn: update init file to match upstream packaged version
Something is still broken on cloudcontrol2001-dev: the cinder-scheduler service is failing with "Unable to connect to AMQP server on rabbitmq03.codfw1dev.wikimediacloud.org:5671 after inf tries".
The error above was fixed by restarting rabbitmq-server in cloudcontrol2005-dev (which is the host corresponding to rabbitmq03).
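To tell a RabbitMQ outage apart from a client-side problem before restarting anything, a plain TCP probe of the AMQP port is often enough. This is only a sketch: `can_connect` is a hypothetical helper, and it checks TCP reachability only, so it does not validate the TLS handshake or AMQP authentication that port 5671 normally involves.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For the failure above, the call would be `can_connect("rabbitmq03.codfw1dev.wikimediacloud.org", 5671)`, run from the affected cloudcontrol host.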
I am now proceeding with running the upgrade_openstack_node cookbook on cloudservices200[45] hosts.
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-09T13:45:16Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-09T13:55:03Z] <wm-bot2> fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-09T15:26:12Z] <wm-bot2> fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-09T15:41:56Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-09T15:49:56Z] <wm-bot2> fran@wmf3169 END (FAIL) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=99) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-09T16:18:10Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-09T16:26:19Z] <wm-bot2> fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
cloudservices200[45]-dev have been upgraded. Puppet is not reporting errors, but on both hosts it applies a corrective change on every run:
2023-10-09T16:07:55.754305+00:00 cloudservices2004-dev puppet-agent[74462]: (/Stage[main]/Pdns_server::Db_backups/Dbutils::Statement[pdns_server_db_backups_stmt_1]/Exec[db-statement-pdns_server_db_backups_stmt_1]/returns) executed successfully (corrective)
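Corrective changes like the one above are easy to miss because the run still succeeds. A small filter over the puppet-agent syslog lines can list which resources keep being corrected. The regular expression below is written against the single log line quoted above and is an assumption about the general format; `corrective_changes` is a hypothetical helper.

```python
import re

# <timestamp> <host> puppet-agent[<pid>]: (<resource path>) <message> (corrective)
CORRECTIVE_RE = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) puppet-agent\[\d+\]: "
    r"\((?P<resource>/\S+)\) .*\(corrective\)$"
)


def corrective_changes(log_lines):
    """Return (host, resource) pairs for every corrective change in the log."""
    found = []
    for line in log_lines:
        match = CORRECTIVE_RE.match(line.strip())
        if match:
            found.append((match.group("host"), match.group("resource")))
    return found
```

If the same resource shows up on consecutive runs, Puppet's desired state and the state on the host never converge, which is the situation described here.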
Change 964858 had a related patch set uploaded (by FNegri; author: FNegri):
[operations/puppet@production] pdns_server: rename privilege for bookworm
Change 964858 merged by FNegri:
[operations/puppet@production] pdns_server: rename privilege for bookworm
https://gerrit.wikimedia.org/r/964858 fixed the corrective change that Puppet was applying on every run on cloudservices200[45]-dev. I'm proceeding with upgrading the cloudvirt*-dev nodes using the cookbook live_upgrade_openstack.
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T09:43:43Z] <fnegri@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T09:50:22Z] <fnegri@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T09:56:36Z] <fnegri@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T10:03:20Z] <fnegri@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T10:52:45Z] <fnegri@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T10:59:43Z] <fnegri@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T11:00:35Z] <fnegri@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T11:00:42Z] <fnegri@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=99) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T11:33:08Z] <fnegri@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T11:38:38Z] <fnegri@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T12:40:57Z] <fnegri@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-10T12:46:32Z] <fnegri@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-11T10:36:33Z] <fnegri@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-11T10:42:37Z] <fnegri@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.live_upgrade_openstack (exit_code=0) (T341285)
Change 965546 had a related patch set uploaded (by FNegri; author: FNegri):
[operations/puppet@production] [openstack] remove hiera override for 2 hosts
Change 965546 merged by FNegri:
[operations/puppet@production] [openstack] remove hiera override for 2 hosts
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-12T17:16:41Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-12T17:16:46Z] <wm-bot2> fran@wmf3169 END (ERROR) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=97) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-13T08:20:55Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-13T08:30:51Z] <wm-bot2> fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-13T08:31:18Z] <wm-bot2> fran@wmf3169 START - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (T341285)
Mentioned in SAL (#wikimedia-cloud-feed) [2023-10-13T08:41:22Z] <wm-bot2> fran@wmf3169 END (PASS) - Cookbook wmcs.openstack.cloudcontrol.upgrade_openstack_node (exit_code=0) (T341285)
OpenStack .deb packages have now been upgraded to Antelope (using the cookbooks upgrade_openstack_node and live_upgrade_openstack) on all codfw nodes:
These other cloud* nodes did not need an upgrade as they don't include any openstack packages (/etc/apt/sources.list.d/openstack*):
We now want to test that everything works fine in codfw, before proceeding with upgrading eqiad.
I created two sub-tasks for the eqiad work:
Change 965779 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):
[operations/puppet@production] Neutron/antelope: add policy rule for create_port:device_id
Change 965779 merged by Andrew Bogott:
[operations/puppet@production] Neutron/antelope: add policy rule for create_port:device_id