
[ceph] Upgrade to 14.2.16 from 14.2.5
Closed, Resolved (Public)

Description

A new version is available that fixes many bugs and CVEs.
This task tracks upgrading all our clusters to that version.

The release notes:
https://docs.ceph.com/en/latest/releases/nautilus/#v14-2-6-nautilus

Process

  • Add the new packages to the repo (they will not be upgraded automatically): T272296
  • Upgrade and test the upgrade on codfw1 cluster (official docs: https://docs.ceph.com/en/nautilus/install/upgrading-ceph/)
    • Upgrade mons (one by one, upgrade packages, then restart daemons, then wait for rejoin)
    • Upgrade osds (one by one, upgrade packages, then restart daemons, then wait for rejoin)
  • Set up a time and upgrade eqiad1 (same procedure)
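
The one-by-one procedure above (upgrade packages, restart daemons, wait for rejoin) can be sketched as a small rolling loop. This is only an illustration and not the tooling actually used for this task: `rolling_upgrade` and its three callables (`upgrade_packages`, `restart_daemons`, `has_rejoined`) are hypothetical names, and what each callable does on a real host (package manager call, daemon restart, quorum or `up` check) is left to the caller.

```python
import time
from typing import Callable, Iterable, List


def rolling_upgrade(
    hosts: Iterable[str],
    upgrade_packages: Callable[[str], None],
    restart_daemons: Callable[[str], None],
    has_rejoined: Callable[[str], bool],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Upgrade hosts one at a time and return the hosts upgraded, in order.

    Raises TimeoutError if a host does not rejoin in time, so the
    remaining hosts are left untouched for manual inspection.
    """
    done: List[str] = []
    for host in hosts:
        # Step 1 and 2: new packages first, then restart the daemons.
        upgrade_packages(host)
        restart_daemons(host)
        # Step 3: do not move on until this host is back in the cluster.
        waited = 0.0
        while not has_rejoined(host):
            if waited >= timeout_seconds:
                raise TimeoutError(
                    f"{host} did not rejoin within {timeout_seconds}s"
                )
            sleep(poll_seconds)
            waited += poll_seconds
        done.append(host)
    return done
```

Stopping at the first host that fails to rejoin keeps at most one node degraded at a time, which is the point of upgrading one by one. The same loop would be run once for the mons and once for the osds.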

Event Timeline

dcaro triaged this task as High priority. Jan 18 2021, 2:26 PM
dcaro created this task.

Mentioned in SAL (#wikimedia-cloud) [2021-01-18T15:21:42Z] <dcaro> Starting upgrade of ceph mon nodes on codfw (T272303)

Mentioned in SAL (#wikimedia-cloud) [2021-01-18T15:35:00Z] <dcaro> Upgraded mon services on codfw ceph cluster, starting with mgr ones (T272303)

Mentioned in SAL (#wikimedia-cloud) [2021-01-18T15:38:04Z] <dcaro> Upgraded mgr services on codfw ceph cluster, starting with osd ones (T272303)

Mentioned in SAL (#wikimedia-cloud) [2021-01-18T16:00:34Z] <dcaro> Codfw1 ceph cluster upgraded, will wait until tomorrow to see if there's any instability, but everything looks fine (T272303)

Mentioned in SAL (#wikimedia-cloud) [2021-01-20T09:01:13Z] <dcaro> Will start the ceph upgrade in 15 min, no downtime or performance impact is expected (T272303)

Mentioned in SAL (#wikimedia-cloud) [2021-01-20T09:16:02Z] <dcaro> Starting eqiad ceph upgrade, upgrading the mon servers cloudcephmon1* (T272303)

Mentioned in SAL (#wikimedia-cloud) [2021-01-20T09:22:13Z] <dcaro> Mon daemons upgraded and running, upgrading mgr daemons on servers cloudcephmon1* (T272303)

Mentioned in SAL (#wikimedia-cloud) [2021-01-20T09:24:39Z] <dcaro> Mgr daemons upgraded and running, upgrading osd daemons on servers cloudcephosd1*, this may take a bit longer (T272303)

Mentioned in SAL (#wikimedia-cloud) [2021-01-20T09:37:55Z] <dcaro> 25% of the eqiad cluster upgraded... continuing (T272303)

Mentioned in SAL (#wikimedia-cloud) [2021-01-20T09:46:30Z] <dcaro> 75% of the eqiad cluster upgraded... continuing (T272303)

Mentioned in SAL (#wikimedia-cloud) [2021-01-20T09:55:08Z] <dcaro> Eqiad ceph cluster upgraded, doing sanity checks (T272303)

Will also do an all-round reboot, tracked on T272458

Mentioned in SAL (#wikimedia-cloud) [2021-01-20T10:05:10Z] <dcaro> Everything looks ok, created a new vm with a volume in ceph without issues, and no warnings/errors on ceph status, closing (T272303)
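
One sanity check that fits an upgrade like this is confirming that no daemon is still running the old version. The sketch below is not taken from the task and may not match the checks that were actually run. It assumes the JSON printed by `ceph versions` has one section per daemon type (plus an aggregate `overall` section), each mapping a version string such as "ceph version 14.2.16 (<hash>) nautilus (stable)" to a daemon count. The function name `daemons_not_on` is hypothetical.

```python
import json
from typing import Dict


def daemons_not_on(versions_json: str, target: str) -> Dict[str, int]:
    """Return {"<daemon type>: <version string>": count} for every
    group of daemons whose version differs from `target`."""
    report = json.loads(versions_json)
    stragglers: Dict[str, int] = {}
    for daemon_type, by_version in report.items():
        if daemon_type == "overall":
            # Assumed to aggregate the other sections; skip it so
            # daemons are not counted twice.
            continue
        for version_string, count in by_version.items():
            # Assumed layout: "ceph version <x.y.z> (<hash>) <name> (stable)".
            parts = version_string.split()
            version = parts[2] if len(parts) > 2 else version_string
            if version != target:
                stragglers[f"{daemon_type}: {version_string}"] = count
    return stragglers
```

An empty result for target "14.2.16" would mean every mon, mgr and osd reports the new version. A non-empty one lists what is left to upgrade, which is also a way to produce progress figures like the percentages logged above.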