
cloudcephosd1021 has one misconfigured drive
Closed, Resolved · Public

Description

All cloudcephosd* hosts have 10 drives: two smaller drives for the OS (RAID 1) and eight 1.7 TB SSDs for Ceph OSD storage (more info on Wikitech).
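As a quick sanity check, this layout can be verified directly on a host with standard tooling; a minimal sketch (exact output will vary per host):

# Expect two ~447G OS drives and eight 1.7T OSD drives
$ lsblk -dno NAME,SIZE
# Confirm the two OS drives are paired in the md0 RAID 1 array
$ cat /proc/mdstat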

Looking at the output of ceph osd tree, I discovered that on cloudcephosd1021 one of the big drives is not currently in use, as only 7 OSDs are listed:

-33          12.22618      host cloudcephosd1021
160    ssd    1.74660          osd.160                up   1.00000  1.00000
161    ssd    1.74660          osd.161                up   1.00000  1.00000
162    ssd    1.74660          osd.162                up   1.00000  1.00000
163    ssd    1.74660          osd.163                up   1.00000  1.00000
164    ssd    1.74660          osd.164                up   1.00000  1.00000
165    ssd    1.74660          osd.165                up   1.00000  1.00000
166    ssd    1.74660          osd.166                up   1.00000  1.00000

Output of lsblk:

NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                                                                                     8:0    0 447.1G  0 disk
├─sda1                                                                                                  8:1    0   285M  0 part
└─sda2                                                                                                  8:2    0 446.9G  0 part
  └─md0                                                                                                 9:0    0 446.7G  0 raid1
    ├─vg0-swap                                                                                        254:3    0   976M  0 lvm   [SWAP]
    ├─vg0-root                                                                                        254:4    0  74.5G  0 lvm   /
    └─vg0-srv                                                                                         254:5    0 281.9G  0 lvm   /srv
sdb                                                                                                     8:16   0 447.1G  0 disk
├─sdb1                                                                                                  8:17   0   285M  0 part
└─sdb2                                                                                                  8:18   0 446.9G  0 part
  └─md0                                                                                                 9:0    0 446.7G  0 raid1
    ├─vg0-swap                                                                                        254:3    0   976M  0 lvm   [SWAP]
    ├─vg0-root                                                                                        254:4    0  74.5G  0 lvm   /
    └─vg0-srv                                                                                         254:5    0 281.9G  0 lvm   /srv
sdc                                                                                                     8:32   0   1.7T  0 disk
├─sdc1                                                                                                  8:33   0   285M  0 part
└─sdc2                                                                                                  8:34   0   1.7T  0 part
sdd                                                                                                     8:48   0   1.7T  0 disk
└─ceph--4714314c--7a23--45a0--b062--f123ee984300-osd--block--9010c76a--5b2a--4606--9e42--413d227444fb 254:0    0   1.7T  0 lvm
sde                                                                                                     8:64   0   1.7T  0 disk
└─ceph--6b254e2b--2ef7--4c43--ae09--146fe9ba2ad1-osd--block--d02871c5--0e17--472c--a102--04fe79ba753a 254:1    0   1.7T  0 lvm
sdf                                                                                                     8:80   0   1.7T  0 disk
└─ceph--fff48f01--05f7--4964--b49e--bb212e92c2ec-osd--block--f9a1a900--e0c9--48f5--afe5--9a9a719da028 254:9    0   1.7T  0 lvm
sdg                                                                                                     8:96   0   1.7T  0 disk
└─ceph--d21349d5--bfe3--4c21--bd50--de242a237907-osd--block--e8eb7add--22e9--41c7--8c9b--a3d3a8ccd18c 254:6    0   1.7T  0 lvm
sdh                                                                                                     8:112  0   1.7T  0 disk
└─ceph--23a20189--3286--49c9--881b--0a15477e8917-osd--block--3b4be007--2903--401b--896d--d29133c4824d 254:2    0   1.7T  0 lvm
sdi                                                                                                     8:128  0   1.7T  0 disk
└─ceph--1f16a18f--c934--4f29--9ded--5d13a4bffe58-osd--block--935a630c--3a71--4215--b7bd--7a30629ff739 254:8    0   1.7T  0 lvm
sdj                                                                                                     8:144  0   1.7T  0 disk
└─ceph--77b1275c--542e--4276--b5ee--906394044833-osd--block--8cde6234--2843--4da2--848e--6694179ae9cf 254:7    0   1.7T  0 lvm
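On the host itself, ceph-volume shows which block devices back the running OSDs; any drive absent from its output (here /dev/sdc) is unused. A rough cross-check, assuming the usual ceph-volume lvm list output format:

$ sudo ceph-volume lvm list | grep -E '====== osd|devices'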

Output of fdisk -l /dev/sdc:

Disk /dev/sdc: 1.75 TiB, 1920383410176 bytes, 3750748848 sectors
Disk model: HFS1T9G32FEH-BA1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: ED71FEBE-5AFB-4D0D-92CC-0D0BA9707DF7

Device      Start        End    Sectors  Size Type
/dev/sdc1    2048     585727     583680  285M Linux filesystem
/dev/sdc2  585728 3750748159 3750162432  1.7T Linux RAID
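Before wiping anything, it's worth confirming that the stray Linux RAID partition never actually joined the OS array. A hedged check with mdadm (expected: no valid superblock on /dev/sdc2, and md0 listing only sda2 and sdb2):

$ sudo mdadm --examine /dev/sdc2
$ sudo mdadm --detail /dev/md0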

Event Timeline

fnegri changed the task status from Open to In Progress. Oct 5 2022, 2:34 PM
fnegri triaged this task as Medium priority.

From a discussion in IRC #wikimedia-cloud-admin, we think it's likely this was caused by the drives coming up in a different order after a reboot, so that a reimage attempted to use the misconfigured drive as part of the OS RAID array (note how the sdc partition layout above mirrors the OS drives sda and sdb).

[2022-10-05 15:55:59] <dcaro> not really, it looks as if it did create a raid on it, so probably the issue of drives changing names?
[2022-10-05 15:56:26] <andrewbogott> I definitely don't remember... I probably set things up with written-on-the-fly shell scripts so it could just be a copy/paste mistake
[2022-10-05 15:56:48] — dcaro it was setup in 2021, october: https://sal.toolforge.org/production?p=0&q=cloudcephosd1021&d=
[2022-10-05 15:56:54] <dhinus> yep I also think it could have been caused by the drives being in the wrong order
[2022-10-05 15:57:19] <andrewbogott> yeah, could easily be that
[2022-10-05 15:57:25] — dcaro it seems it failed a bunch of times to reimage, that seems correlated to the drives changing order xd
[2022-10-05 15:57:32] <dhinus> :P
[2022-10-05 15:57:35] <taavi> was 1021 the one which accidentally got installed with bullseye and a wrong ceph version? or am I misremembering?
[2022-10-05 15:57:45] <dhinus> ah-ha :)
[2022-10-05 15:58:22] — dcaro look what I found https://phabricator.wikimedia.org/T302982
[2022-10-05 15:58:31] <dcaro> might be yes
[2022-10-05 15:58:58] <taavi> T296175
[2022-10-05 15:58:58] <stashbot> T296175: cloudcephosd1021 is using an old ceph version because its running debian bullseye instead of buster - https://phabricator.wikimedia.org/T296175

I think it should be safe to delete all partitions from the misconfigured drive, then run the cookbook wmcs.ceph.osd.bootstrap_and_add on the host.
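A minimal sketch of that wipe step, assuming /dev/sdc is confirmed as the unused drive (both commands are destructive, so triple-check the device name; sgdisk is in the gdisk package):

$ sudo wipefs -a /dev/sdc1 /dev/sdc2   # clear filesystem/RAID signatures from the stale partitions
$ sudo sgdisk --zap-all /dev/sdc       # destroy the GPT and any protective MBR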

Mentioned in SAL (#wikimedia-cloud-feed) [2022-10-05T14:40:04Z] <wm-bot2> Adding new OSDs ['cloudcephosd1021.eqiad.wmnet'] to the cluster (T319418) - cookbook ran by fran@wmf3169

Mentioned in SAL (#wikimedia-cloud-feed) [2022-10-05T14:40:08Z] <wm-bot2> Adding OSD cloudcephosd1021.eqiad.wmnet... (1/1) (T319418) - cookbook ran by fran@wmf3169

The first cookbook run failed because of a temporary network issue while trying to log to SAL:

Exception raised while executing cookbook wmcs.ceph.osd.bootstrap_and_add:
Traceback (most recent call last):
  File "/Users/fran/.virtualenvs/cookbooks/lib/python3.10/site-packages/spicerack/_menu.py", line 234, in run
    raw_ret = runner.run()
  File "/Users/fran/wmf/cookbooks/cookbooks/wmcs/ceph/osd/bootstrap_and_add.py", line 187, in run
    ).run()
  File "/Users/fran/wmf/cookbooks/cookbooks/wmcs/ceph/reboot_node.py", line 94, in run
    self.sallogger.log(message=f"Rebooting node {self.fqdn_to_reboot}")
  File "/Users/fran/wmf/cookbooks/cookbooks/wmcs/libs/common.py", line 614, in log
    my_socket.connect(sockaddr)
TimeoutError: [Errno 60] Operation timed out
END (FAIL) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=99)

Retrying, this time with --skip-reboot because I realised I don't really need to reboot the host.

The second run worked!

$ cookbook wmcs.ceph.osd.bootstrap_and_add --new-osd-fqdn cloudcephosd1021.eqiad.wmnet --task-id T319418 --skip-reboot

[...]

==> I'm going to destroy and create a new OSD on cloudcephosd1021.eqiad.wmnet:/dev/sdc.
Type "go" to proceed or "abort" to interrupt the execution
> go

[...]

Running command: /usr/bin/systemctl start ceph-osd@271
--> ceph-volume lvm activate successful for osd ID: 271
--> ceph-volume lvm create successful for: /dev/sdc

[...]

The new OSDs are up and running, the cluster will now start rebalancing the data to them, that might take quite a long time, you can follow the progress by running 'ceph status' on a control node.
END (PASS) - Cookbook wmcs.ceph.osd.bootstrap_and_add (exit_code=0)

The only downside is that we no longer have a nice clean ordering of OSD IDs in the tree:

-33          13.97278      host cloudcephosd1021
160    ssd    1.74660          osd.160                up   1.00000  1.00000
161    ssd    1.74660          osd.161                up   1.00000  1.00000
162    ssd    1.74660          osd.162                up   1.00000  1.00000
163    ssd    1.74660          osd.163                up   1.00000  1.00000
164    ssd    1.74660          osd.164                up   1.00000  1.00000
165    ssd    1.74660          osd.165                up   1.00000  1.00000
166    ssd    1.74660          osd.166                up   1.00000  1.00000
271    ssd    1.74660          osd.271                up   1.00000  1.00000
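If contiguous IDs were ever worth restoring, ceph-volume can in principle reuse an OSD ID that has been freed with ceph osd destroy, via its --osd-id flag. Purely an illustrative sketch, not something done here (osd.167 is a hypothetical ID):

$ sudo ceph-volume lvm create --data /dev/sdc --osd-id 167

In practice the numbering is cosmetic and the cluster works the same either way, so leaving osd.271 as-is is the simpler option.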