
cloudcephosd1021 is using an old ceph version because it's running debian bullseye instead of buster
Open, Medium, Public

Description

The host has ceph packages from ceph-nautilus (v14) installed instead of ceph-octopus (v15):

aborrero@cumin1001:~ $ sudo cumin A:cloudceph-eqiad1 "dpkg -l ceph-common"
27 hosts will be targeted:
cloudcephmon[1001-1003].eqiad.wmnet,cloudcephosd[1001-1024].eqiad.wmnet
Ok to proceed on 27 hosts? Enter the number of affected hosts to confirm or "q" to quit 27
===== NODE GROUP =====
(1) cloudcephosd1021.eqiad.wmnet
----- OUTPUT of 'dpkg -l ceph-common' -----
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-==================================================================
ii  ceph-common    14.2.21-1    amd64        common utilities to mount and interact with a ceph storage cluster
===== NODE GROUP =====
(26) cloudcephmon[1001-1003].eqiad.wmnet,cloudcephosd[1001-1020,1022-1024].eqiad.wmnet
----- OUTPUT of 'dpkg -l ceph-common' -----
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version           Architecture Description
+++-==============-=================-============-==================================================================
ii  ceph-common    15.2.11-1~bpo10+1 amd64        common utilities to mount and interact with a ceph storage cluster
================

This is because the system was installed with debian bullseye instead of buster.
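To catch outliers like this one without reading every node group by hand, the cumin output can be parsed and each group's ceph-common major version checked against the expected release. This is a minimal sketch, assuming the output layout shown above (a `(N) <hosts>` line after each `===== NODE GROUP =====` header, followed by `dpkg -l` rows); `parse_node_groups`, `find_outliers` and the release table are hypothetical helpers, not part of cumin.

```python
import re
from typing import Dict

# Ceph major version -> release codename.
CEPH_RELEASES = {14: "nautilus", 15: "octopus", 16: "pacific"}


def parse_node_groups(output: str, package: str = "ceph-common") -> Dict[str, str]:
    """Map each cumin node group (host expression) to the installed version of `package`.

    Assumes the layout of `cumin ... "dpkg -l <package>"` output: a '(N) <hosts>'
    line starts a group, and dpkg rows look like 'ii  <name>  <version>  <arch>  ...'.
    """
    groups: Dict[str, str] = {}
    hosts = None
    for raw in output.splitlines():
        line = raw.strip()
        header = re.match(r"^\(\d+\)\s+(\S+)$", line)
        if header:
            hosts = header.group(1)
            continue
        fields = line.split()
        # Only fully installed ("ii") rows for the package we asked about.
        if hosts and len(fields) >= 3 and fields[0] == "ii" and fields[1] == package:
            groups[hosts] = fields[2]
    return groups


def major_version(version: str) -> int:
    """'2:15.2.11-1~bpo10+1' -> 15 (drops an optional epoch, keeps the leading number)."""
    return int(version.split(":", 1)[-1].split(".", 1)[0])


def find_outliers(groups: Dict[str, str], expected: str = "octopus") -> Dict[str, str]:
    """Return the node groups whose ceph major version is not the `expected` release."""
    outliers = {}
    for hosts, version in groups.items():
        major = major_version(version)
        release = CEPH_RELEASES.get(major, f"unknown v{major}")
        if release != expected:
            outliers[hosts] = f"{version} ({release})"
    return outliers


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python3 check_ceph.py < cumin-output.txt
    for hosts, detail in find_outliers(parse_node_groups(sys.stdin.read())).items():
        print(f"{hosts}: {detail}")
```

Applied to the output above, this flags only cloudcephosd1021.eqiad.wmnet, with 14.2.21-1 (nautilus); the 26-host group on 15.2.11-1~bpo10+1 matches octopus and is skipped.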

Event Timeline

aborrero triaged this task as Medium priority. Mon, Nov 22, 10:28 AM
aborrero created this task.
aborrero updated the task description.
aborrero removed a subscriber: nskaggs.

Change 740540 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] ceph: update default version to octopus

https://gerrit.wikimedia.org/r/740540

This does not seem related to the patch (puppet was already setting the repos). Perhaps the host was missed during the upgrade process?

The server was recently installed. Where did the nautilus value come from, if not from puppet?

Change 740540 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] ceph: update default version to octopus

https://gerrit.wikimedia.org/r/740540

That's interesting; maybe some race condition between setting up the repos and installing the packages?
Did the next puppet run change the repos? If not, that means the repos were updated but the package was not, which would point in that direction.
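That check can also be made on the host itself: if the configured repos already offer octopus but ceph-common is still at nautilus, the repos were updated and the package was not. A minimal sketch that reads `apt-cache policy ceph-common` output, assuming the English/C locale where the fields are labelled `Installed:` and `Candidate:`; the function names are hypothetical, and comparing only the major version is a deliberate simplification.

```python
import re
from typing import Dict, Optional, Tuple


def parse_apt_policy(output: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (installed, candidate) versions from `apt-cache policy <pkg>` output."""
    found: Dict[str, Optional[str]] = {"Installed": None, "Candidate": None}
    for line in output.splitlines():
        match = re.match(r"^\s*(Installed|Candidate):\s*(\S+)", line)
        if match:
            # apt prints '(none)' when nothing is installed or no candidate exists.
            found[match.group(1)] = None if match.group(2) == "(none)" else match.group(2)
    return found["Installed"], found["Candidate"]


def major_version(version: str) -> int:
    """'15.2.11-1~bpo10+1' -> 15, ignoring an optional epoch prefix."""
    return int(version.split(":", 1)[-1].split(".", 1)[0])


def diagnose(output: str) -> str:
    installed, candidate = parse_apt_policy(output)
    if installed is None:
        return "package not installed"
    if candidate is None:
        return "no candidate in the configured repos"
    inst, cand = major_version(installed), major_version(candidate)
    if cand > inst:
        return "repos updated, package not upgraded"
    if cand < inst:
        return "installed package is newer than the repo candidate"
    return "installed and candidate share the same major version"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): apt-cache policy ceph-common | python3 diagnose.py
    print(diagnose(sys.stdin.read()))
```

If the first case is reported, the repo side was already correct and only the package upgrade was missed.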

What I suspect happened here is that somehow a hiera lookup failed (perhaps in the initial run? who knows) and the puppet manifest got built with the default value (nautilus at that time), installing the old packages.

Problem: the cloudcephosd1021 server was installed using debian bullseye rather than debian buster.

There are several potential ways to handle this situation, but after a chat with @dcaro via IRC, we decided the following:

  • we will keep the host in bullseye
  • we will add the ceph octopus packages for bullseye
  • the host will be updated to use the ceph octopus packages for bullseye
  • therefore, this will be our first osd running octopus on bullseye, and we will leave it that way until we decide to upgrade the rest of the ceph farm to bullseye.

Change 741113 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] aptrepo: add ceph packages in the octopus/bullseye combo.

https://gerrit.wikimedia.org/r/741113

aborrero renamed this task from cloudcephosd1021 is using an old ceph version to cloudcephosd1021 is using an old ceph version because it's running debian bullseye instead of buster. Wed, Nov 24, 11:55 AM
aborrero updated the task description.

Change 741113 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] aptrepo: add ceph packages in the octopus/bullseye combo

https://gerrit.wikimedia.org/r/741113

Change 741870 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] aptrepo: fix duplicate update name

https://gerrit.wikimedia.org/r/741870

Change 741870 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] aptrepo: fix duplicate update name

https://gerrit.wikimedia.org/r/741870

Mentioned in SAL (#wikimedia-operations) [2021-11-25T12:14:38Z] <arturo> update repo bullseye-wikimedia/thirdparty/ceph-octopus (T296175)

Change 741883 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] ceph: common: support both buster & bullseye

https://gerrit.wikimedia.org/r/741883

Change 741883 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] ceph: common: support both buster & bullseye

https://gerrit.wikimedia.org/r/741883
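The idea behind supporting both releases is that the apt source is derived from the host's Debian codename instead of being fixed to buster. The real change is a puppet manifest; the Python below only illustrates that selection logic, assuming the `<codename>-wikimedia` suite and `thirdparty/ceph-octopus` component seen in the SAL entry above, and assuming `http://apt.wikimedia.org/wikimedia` as the repository base URL. All function names are hypothetical.

```python
# Assumption: base URL of the Wikimedia apt repository.
APT_BASE = "http://apt.wikimedia.org/wikimedia"
# Debian releases for which thirdparty/ceph-<release> packages have been imported.
SUPPORTED_CODENAMES = {"buster", "bullseye"}


def codename_from_os_release(text: str) -> str:
    """Read VERSION_CODENAME from the contents of /etc/os-release."""
    for line in text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "VERSION_CODENAME":
            return value.strip().strip('"')
    raise ValueError("VERSION_CODENAME not found")


def ceph_apt_source(codename: str, release: str = "octopus") -> str:
    """Build a one-line sources.list entry for the ceph packages of `codename`."""
    if codename not in SUPPORTED_CODENAMES:
        # Fail loudly rather than silently falling back to some default release.
        raise ValueError(f"no ceph-{release} packages imported for {codename}")
    return f"deb {APT_BASE} {codename}-wikimedia thirdparty/ceph-{release}"


if __name__ == "__main__":
    with open("/etc/os-release", encoding="utf-8") as fh:
        print(ceph_apt_source(codename_from_os_release(fh.read())))
```

On a bullseye host this prints `deb http://apt.wikimedia.org/wikimedia bullseye-wikimedia thirdparty/ceph-octopus`; an unsupported release raises an error instead of quietly picking up packages from elsewhere.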

There seems to be a problem or bug in the upstream repository: the bullseye suite doesn't contain all the ceph packages, only ceph-deploy:

arturo@endurance:~ $ curl -s https://download.ceph.com/debian-octopus/dists/bullseye/main/binary-amd64/Packages | grep Package
Package: ceph-deploy
arturo@endurance:~ $ curl -s https://download.ceph.com/debian-octopus/dists/buster/main/binary-amd64/Packages | grep Package
Package: ceph
Package: ceph-base
Package: ceph-base-dbg
Package: ceph-common
Package: ceph-common-dbg
Package: ceph-deploy
[..]
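The same comparison can be scripted to list exactly which packages the bullseye suite is missing relative to buster. A minimal sketch using only the Python standard library; it assumes the uncompressed `Packages` index is served at the URLs used in the curl commands above, and the function names are hypothetical.

```python
import urllib.request
from typing import List, Set

BASE = "https://download.ceph.com/debian-octopus/dists"


def package_names(index_text: str) -> Set[str]:
    """Collect the value of every 'Package:' field in a Debian Packages index."""
    return {
        line.split(":", 1)[1].strip()
        for line in index_text.splitlines()
        if line.startswith("Package:")
    }


def fetch_index(suite: str, arch: str = "amd64") -> str:
    """Download the uncompressed Packages index for `suite`/main/binary-`arch`."""
    url = f"{BASE}/{suite}/main/binary-{arch}/Packages"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def missing_packages(reference_index: str, target_index: str) -> List[str]:
    """Packages listed in the reference index but absent from the target index."""
    return sorted(package_names(reference_index) - package_names(target_index))


if __name__ == "__main__":
    for name in missing_packages(fetch_index("buster"), fetch_index("bullseye")):
        print(name)
```

With the indexes as shown above, every buster package except ceph-deploy would be reported. Keeping the set difference in a pure function separate from the download makes it easy to rerun against saved copies of the indexes.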

Got confirmation on IRC #ceph @OFTC:

14:13 <arturo> is there any known problem with ceph .deb repositories for bullseye/octopus? It seems it doesn't contain all relevant packages
14:15 <Tamwyn> I don't think they ported the octopus builds to bullseye

I'm trying to file a bug report at https://tracker.ceph.com, but new accounts there are subject to a 1-day approval delay for spam prevention.

So, it turns out some folks in the community use the Proxmox repos for deb packages, for example: http://download.proxmox.com/debian/ceph-octopus/dists/bullseye/main/binary-amd64/
Other folks use croit repos: https://mirror.croit.io/debian-octopus/dists/ (this one doesn't contain bullseye as of this writing).

@dcaro pointed out that we can't be sure of what we get when using vendor repos, because they often include their own patches and package modifications. We have no straightforward way of double-checking that whatever the Proxmox repos contain is truly ceph upstream.

I talked with some folks at Debian. The official approach in Debian is to skip ceph octopus, so they plan to do a nautilus -> pacific update, which ceph supports anyway.

Given there is no clear alternative to the upstream ceph repo, my original plan remains: open a ticket in the ceph upstream tracker and ask them to build octopus for bullseye. My account hasn't been approved as of this writing.

Looking at the current bugs, it seems that there was some work to do so:
https://tracker.ceph.com/issues/50500

But for some reason they have not yet actually built them. I went ahead and created the bug for you given that the account creation is getting delayed:
https://tracker.ceph.com/issues/53411

Let's see how it goes.