
ceph: Upgrade to latest Nautilus/Octopus to fix CVE-2021-20288
Closed, ResolvedPublic

Description


A security update for Ceph was released today to fix CVE-2021-20288: https://docs.ceph.com/en/latest/security/CVE-2021-20288/

We have to upgrade to one of:

  • v14.2.20 (Nautilus)
  • v15.2.11 (Octopus)
  • v16.2.1 (Pacific)

Recommended upgrade process:

  • Users should upgrade to a patched version of Ceph at their earliest convenience.
  • Users should upgrade any unpatched clients at their earliest convenience. By default, these clients can be easily identified by checking the ceph health detail output for the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert.
  • If all clients cannot be upgraded immediately, the health alerts can be temporarily muted with:
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w  # 1 week
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w  # 1 week
  • After all clients have been updated and the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert is no longer present, the cluster should be set to prevent insecure global_id reclaim with:
ceph config set mon auth_allow_insecure_global_id_reclaim false
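
As a rough illustration of the decision in the last two steps, the sketch below reads the JSON form of the health output (for example from `ceph health detail --format json`) and suggests what to do next. It assumes that this JSON has a top-level "checks" object keyed by alert code; the function name `next_step` and the returned labels are hypothetical and only serve the example.

```python
import json

# Alert codes named in the upgrade process above.
ALERT = "AUTH_INSECURE_GLOBAL_ID_RECLAIM"
ALERT_ALLOWED = "AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED"


def next_step(health_json: str) -> str:
    """Suggest the next step from the JSON health output.

    Assumption: the JSON has a top-level "checks" object keyed by alert code.
    The returned labels are made up for this sketch.
    """
    checks = json.loads(health_json).get("checks", {})
    if ALERT in checks:
        # Unpatched clients are still connected: upgrade them first.
        return "upgrade-clients"
    if ALERT_ALLOWED in checks:
        # No unpatched clients reported, but insecure reclaim is still allowed.
        return "disable-insecure-reclaim"
    # Neither alert present: nothing to decide from the health output alone.
    return "no-alert"


if __name__ == "__main__":
    # Hand-written sample input, not real cluster output.
    sample = json.dumps({"status": "HEALTH_WARN", "checks": {ALERT_ALLOWED: {}}})
    print(next_step(sample))
```

Only when the result is "disable-insecure-reclaim" would it make sense to run the `ceph config set mon auth_allow_insecure_global_id_reclaim false` command above.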

Event Timeline


Mentioned in SAL (#wikimedia-cloud) [2021-04-23T08:17:34Z] <dcaro> testing the upgrade_mons cookbook on codfw1 ceph cluster (T280641)

Change 681694 merged by jenkins-bot:

[operations/software/spicerack@master] icinga: use a bash command wrapper to allow sudo

https://gerrit.wikimedia.org/r/681694

Change 682098 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@master] upgrade-and-reboot: add possibility to use sudo

https://gerrit.wikimedia.org/r/682098

Mentioned in SAL (#wikimedia-cloud) [2021-04-23T09:17:44Z] <dcaro> testing the upgrade_osds cookbook on codfw1 ceph cluster (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-23T09:32:30Z] <dcaro> finished upgrade of ceph cluster on codfw1 using exclusively cookbooks (T280641)

Change 682106 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@master] wmcs.ceph: add cookbook to upgrade all osds

https://gerrit.wikimedia.org/r/682106

Change 681754 merged by jenkins-bot:

[operations/software/spicerack@master] icinga: use a sudo-friendly command to get command_file

https://gerrit.wikimedia.org/r/681754

Mentioned in SAL (#wikimedia-cloud) [2021-04-23T11:12:20Z] <dcaro> testing the drain_cloudvirt cookbook on codfw1 openstack cluster (T280641)

Change 682098 merged by jenkins-bot:

[operations/cookbooks@master] upgrade-and-reboot: add possibility to use sudo

https://gerrit.wikimedia.org/r/682098

Mentioned in SAL (#wikimedia-cloud) [2021-04-23T13:49:25Z] <dcaro> testing the drain_cloudvirt cookbook on codfw1 openstack cluster, draining cloudvirt2001 (T280641)

Change 682169 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@master] wmcs.openstack: add cloudvirt maintenance cookbooks

https://gerrit.wikimedia.org/r/682169

Mentioned in SAL (#wikimedia-cloud) [2021-04-26T09:45:27Z] <dcaro> draining cloudvirt2001-dev with the new cookbooks (T280641)

Change 682169 merged by jenkins-bot:

[operations/cookbooks@master] wmcs.openstack: add cloudvirt maintenance cookbooks

https://gerrit.wikimedia.org/r/682169

Mentioned in SAL (#wikimedia-cloud) [2021-04-27T13:00:13Z] <dcaro> codfw.openstack cloudvirt2001-dev back online, taking cloudvirt2002-dev out to upgrade ceph libraries (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-27T13:07:41Z] <dcaro> codfw.openstack cloudvirt2002-dev done, taking cloudvirt2003-dev out to upgrade ceph libraries (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-27T14:10:40Z] <dcaro> codfw.openstack upgraded ceph libraries to 15.2.11 (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T07:58:34Z] <dcaro> Upgrading ceph services on eqiad, starting with mons/managers (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T08:14:27Z] <dcaro> During the upgrade, ceph detected a clock skew on cloudcephmon1002, looking (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T08:15:30Z] <dcaro> During the upgrade, ceph detected a clock skew on cloudcephmon1002, it went away, I'm guessing systemd-timesyncd fixed it (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T08:18:07Z] <dcaro> During the upgrade, ceph detected a clock skew on cloudcephmon1002, cloudcephmon1001, they are back (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T08:18:24Z] <dcaro> All eqiad ceph mons and mgrs upgraded (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T08:21:22Z] <dcaro> The clock skew seems intermittent; there's another task to follow it, T275860 (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T08:58:25Z] <dcaro> During the upgrade, started getting warning 'slow osd heartbeats in the back', meaning that pings between osds are really slow (up to 190s) (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T08:58:51Z] <dcaro> During the upgrade, started getting warning 'slow osd heartbeats in the back', meaning that pings between osds are really slow (up to 190s) all from osd.58 (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T08:59:23Z] <dcaro> During the upgrade, started getting warning 'slow osd heartbeats in the back', meaning that pings between osds are really slow (up to 190s) all from osd.58, currently on cloudcephosd1002 (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T09:03:52Z] <dcaro> Waiting for slow heartbeats from osd.58(cloudcephosd1002) to recover... (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T10:34:35Z] <dcaro> Slow/blocked ops from cloudcephmon03, "osd_failure(failed timeout osd.32..." (cloudcephosd1005), unset the cluster noout/norebalance and it went away in a few secs, setting it again and continuing... (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T10:57:21Z] <dcaro> Got a PG getting stuck on 'remapping' after the OSD came up, had to unset the norebalance and then set it again to get it unstuck (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T11:06:01Z] <dcaro> All ceph server side upgraded to Octopus! \o/ (T280641)

Change 683370 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@master] wmcs: add cloudvirt drain cookbook

https://gerrit.wikimedia.org/r/683370

Mentioned in SAL (#wikimedia-cloud) [2021-04-30T09:47:59Z] <dcaro> draining cloudvirt1013 for reboot (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-30T10:37:41Z] <dcaro> draining cloudvirt1016 for reboot (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-04-30T11:16:54Z] <dcaro> draining and rebooting cloudvirt1017, last one today (T280641)

Change 683857 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] wmcs.drain_hypervisor: skip all VMs in the canary project

https://gerrit.wikimedia.org/r/683857

Change 683371 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@master] wmcs.openstack: add live_upgrade cloudvirt cookbook

https://gerrit.wikimedia.org/r/683371

Change 683888 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@master] wmcs.openstack: add safe_reboot cloudvirt cookbook

https://gerrit.wikimedia.org/r/683888

Mentioned in SAL (#wikimedia-cloud) [2021-05-03T08:26:11Z] <dcaro> draining and rebooting cloudvirt1018 (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-05-03T09:12:39Z] <dcaro> draining and rebooting cloudvirt1021 (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-05-03T10:31:29Z] <wm-bot> Safe rebooting 'cloudvirt1021.eqiad.wmnet'. (T280641 - cookbook ran by dcaro@vulcanus)

Mentioned in SAL (#wikimedia-cloud) [2021-05-03T14:07:25Z] <dcaro> depooling tools-sgeexec-0908/7 to be able to restart the VMs as they got stuck during migration (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-05-03T15:13:12Z] <wm-bot> Safe rebooting 'cloudvirt1022.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-03T15:41:23Z] <wm-bot> Safe reboot of 'cloudvirt1022.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-03T15:41:50Z] <wm-bot> Safe rebooting 'cloudvirt1023.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-03T16:23:57Z] <dcaro> started tools-sgeexec-0907, was stuck on initramfs due to an unclean fs (/dev/vda3, root), ran fsck manually fixing all the errors and booted up correctly after (T280641)

Mentioned in SAL (#wikimedia-cloud) [2021-05-03T16:29:08Z] <wm-bot> Safe rebooting 'cloudvirt1023.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-03T16:34:03Z] <wm-bot> Safe reboot of 'cloudvirt1023.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T08:03:25Z] <wm-bot> Safe rebooting 'cloudvirt1024.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T08:20:34Z] <wm-bot> Safe reboot of 'cloudvirt1024.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T08:34:31Z] <wm-bot> Safe rebooting 'cloudvirt1025.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T09:10:28Z] <wm-bot> Safe reboot of 'cloudvirt1025.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T09:10:31Z] <wm-bot> Safe rebooting 'cloudvirt1026.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T10:04:05Z] <wm-bot> Safe rebooting 'cloudvirt1026.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T15:15:35Z] <wm-bot> Safe rebooting 'cloudvirt1026.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T15:19:10Z] <wm-bot> Safe reboot of 'cloudvirt1026.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T15:22:50Z] <wm-bot> Safe rebooting 'cloudvirt1027.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T15:44:46Z] <wm-bot> Safe reboot of 'cloudvirt1027.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T15:45:57Z] <wm-bot> Safe rebooting 'cloudvirt1028.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-04T16:05:57Z] <wm-bot> Safe reboot of 'cloudvirt1028.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T08:02:07Z] <wm-bot> Safe rebooting 'cloudvirt1029.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T08:19:02Z] <wm-bot> Safe reboot of 'cloudvirt1029.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T08:19:06Z] <wm-bot> Safe rebooting 'cloudvirt1030.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T08:45:42Z] <wm-bot> Safe reboot of 'cloudvirt1030.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T08:45:45Z] <wm-bot> Safe rebooting 'cloudvirt1031.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T09:21:32Z] <wm-bot> Safe reboot of 'cloudvirt1031.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T09:21:35Z] <wm-bot> Safe rebooting 'cloudvirt1032.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T09:47:25Z] <wm-bot> Safe reboot of 'cloudvirt1032.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T09:47:28Z] <wm-bot> Safe rebooting 'cloudvirt1033.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T10:13:15Z] <wm-bot> Safe reboot of 'cloudvirt1033.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T10:13:19Z] <wm-bot> Safe rebooting 'cloudvirt1034.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Change 683857 merged by David Caro:

[operations/puppet@production] wmcs.drain_hypervisor: use canary project instead of VM name

https://gerrit.wikimedia.org/r/683857

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T10:39:42Z] <wm-bot> Safe reboot of 'cloudvirt1034.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T10:39:46Z] <wm-bot> Safe rebooting 'cloudvirt1035.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T11:08:08Z] <wm-bot> Safe reboot of 'cloudvirt1035.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T11:08:11Z] <wm-bot> Safe rebooting 'cloudvirt1036.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T11:31:17Z] <wm-bot> Safe reboot of 'cloudvirt1036.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T11:31:21Z] <wm-bot> Safe rebooting 'cloudvirt1037.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T11:56:23Z] <wm-bot> Safe reboot of 'cloudvirt1037.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T11:56:26Z] <wm-bot> Safe rebooting 'cloudvirt1038.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T12:35:14Z] <wm-bot> Safe rebooting 'cloudvirt1039.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T14:10:07Z] <wm-bot> Safe rebooting 'cloudvirt1039.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T14:14:42Z] <wm-bot> Safe reboot of 'cloudvirt1039.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T14:14:46Z] <wm-bot> Safe rebooting 'cloudvirt1041.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T14:39:57Z] <wm-bot> Safe reboot of 'cloudvirt1041.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T14:40:00Z] <wm-bot> Safe rebooting 'cloudvirt1042.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T14:59:44Z] <wm-bot> Safe reboot of 'cloudvirt1042.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T14:59:48Z] <wm-bot> Safe rebooting 'cloudvirt1043.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T15:03:39Z] <wm-bot> Safe reboot of 'cloudvirt1043.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T15:03:41Z] <wm-bot> Safe rebooting 'cloudvirt1044.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T15:07:34Z] <wm-bot> Safe reboot of 'cloudvirt1044.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T15:07:37Z] <wm-bot> Safe rebooting 'cloudvirt1045.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T15:11:27Z] <wm-bot> Safe reboot of 'cloudvirt1045.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T15:11:29Z] <wm-bot> Safe rebooting 'cloudvirt1046.eqiad.wmnet'. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T15:15:22Z] <wm-bot> Safe reboot of 'cloudvirt1046.eqiad.wmnet' finished successfully. (T280641) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-05-05T16:07:25Z] <dcaro> disallowing insecure global ids on the eqiad ceph cluster (T280641)