
Follow-up: Degraded Disk Not Yet Added to RAID (an-worker1175, an-worker1199)
Open, Needs Triage, Public

Description

As a follow-up, the disks on an-worker1175 (T396703) and an-worker1199 (T409060) are still in a degraded state and have not yet been added back to the RAID.

The previous ticket was closed before the remaining steps could be completed. This ticket is intended to track the final RAID work and ensure both systems are fully restored to a healthy state.

Details

Other Assignee
BTullis

Event Timeline

Just took care of an-worker1175:

root@an-worker1175:~# perccli64 /c0 add vd r0 drives=252:6
CLI Version = 007.1910.0000.0000 Oct 08, 2021
Operating system = Linux 5.10.0-37-amd64
Controller = 0
Status = Success
Description = Add VD Succeeded.

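The successful add can be double-checked against the controller's VD list (`perccli64 /c0 /vall show`); the new virtual drive should report `Optl` (optimal). Since that command needs the controller present, the snippet below filters a *sample* VD list line instead of live output (the line format is assumed, not captured from this host):

```shell
# Sketch: a sample perccli64 VD list line; the third column is the VD state,
# which should be Optl once the add has succeeded.
sample='13/13  RAID0  Optl  RW  Yes  NRWBD  -  ON  7.276 TB'
state=$(echo "$sample" | awk '{print $3}')
echo "VD state: $state"
```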

root@an-worker1175:~# lsblk
NAME                           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                              8:0    0   7.3T  0 disk
└─sda1                           8:1    0   7.3T  0 part /var/lib/hadoop/data/l
sdb                              8:16   0   7.3T  0 disk
└─sdb1                           8:17   0   7.3T  0 part /var/lib/hadoop/data/j
sdc                              8:32   0   7.3T  0 disk
└─sdc1                           8:33   0   7.3T  0 part /var/lib/hadoop/data/i
sdd                              8:48   0   7.3T  0 disk
└─sdd1                           8:49   0   7.3T  0 part /var/lib/hadoop/data/h
sde                              8:64   0   7.3T  0 disk
└─sde1                           8:65   0   7.3T  0 part /var/lib/hadoop/data/g
sdf                              8:80   0   7.3T  0 disk
└─sdf1                           8:81   0   7.3T  0 part /var/lib/hadoop/data/e
sdg                              8:96   0   7.3T  0 disk
└─sdg1                           8:97   0   7.3T  0 part /var/lib/hadoop/data/f
sdh                              8:112  0   7.3T  0 disk
└─sdh1                           8:113  0   7.3T  0 part /var/lib/hadoop/data/d
sdi                              8:128  0   7.3T  0 disk
└─sdi1                           8:129  0   7.3T  0 part /var/lib/hadoop/data/c
sdj                              8:144  0   7.3T  0 disk
└─sdj1                           8:145  0   7.3T  0 part /var/lib/hadoop/data/b
sdk                              8:160  0   7.3T  0 disk
└─sdk1                           8:161  0   7.3T  0 part /var/lib/hadoop/data/a
sdl                              8:176  0 446.6G  0 disk
├─sdl1                           8:177  0   953M  0 part /boot
├─sdl2                           8:178  0     1K  0 part
└─sdl5                           8:181  0 445.7G  0 part
  ├─an--worker1175--vg-swap    254:0    0   9.3G  0 lvm  [SWAP]
  ├─an--worker1175--vg-root    254:1    0  55.9G  0 lvm  /
  └─an--worker1175--vg-journalnode
                               254:2    0    10G  0 lvm  /var/lib/hadoop/journal
sdm                              8:192  0   7.3T  0 disk
root@an-worker1175:~# parted /dev/sdm --script mklabel gpt
root@an-worker1175:~# parted /dev/sdm --script mkpart primary ext4 0% 100%
root@an-worker1175:~# mkfs.ext4 -L hadoop-k /dev/sdm1
mke2fs 1.46.2 (28-Feb-2021)
Creating filesystem with 1953365504 4k blocks and 244170752 inodes
Filesystem UUID: a46774af-1b04-4cb4-90c6-a2b898ba3f43
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848, 512000000, 550731776, 644972544, 1934917632

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done

root@an-worker1175:~# tune2fs -m 0 /dev/sdm1
tune2fs 1.46.2 (28-Feb-2021)
Setting reserved blocks percentage to 0% (0 blocks)
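The same mkfs.ext4/tune2fs steps can be dry-run safely on a scratch image file instead of `/dev/sdm1` (no root, no real disk touched); `-F` is only needed here because the target is a regular file rather than a block device, and `hadoop-demo` is a placeholder label:

```shell
# Sketch: exercise the mkfs.ext4 -L / tune2fs -m 0 sequence on a throwaway
# 64 MB image file, then read back the label and reserved-block count.
IMG=$(mktemp /tmp/hadoop-demo-XXXXXX.img)
truncate -s 64M "$IMG"
mkfs.ext4 -q -F -L hadoop-demo "$IMG"   # -F: target is a regular file
tune2fs -m 0 "$IMG" >/dev/null          # reserved blocks to 0%, as on sdm1
INFO=$(tune2fs -l "$IMG")
echo "$INFO" | grep -E 'volume name|Reserved block count'
rm -f "$IMG"
```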
root@an-worker1175:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# systemd generates mount units based on this file, see systemd.mount(5).
# Please run 'systemctl daemon-reload' after making changes here.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/dev/mapper/an--worker1175--vg-root /               ext4    errors=remount-ro 0       1
# /boot was on /dev/sda1 during installation
UUID=65e3f98a-4226-4a8d-9920-b1dee5acc1d3 /boot           ext4    defaults        0       2
/dev/mapper/an--worker1175--vg-swap none            swap    sw              0       0
# Hadoop DataNode partition b
# Hadoop JournalNode partition
/dev/an-worker1175-vg/journalnode       /var/lib/hadoop/journal ext4    defaults,noatime        0       2
# Hadoop DataNode partition a
LABEL=hadoop-a  /var/lib/hadoop/data/a  ext4    defaults,noatime        0       2
# Hadoop DataNode partition b
LABEL=hadoop-b  /var/lib/hadoop/data/b  ext4    defaults,noatime        0       2
# Hadoop DataNode partition c
LABEL=hadoop-c  /var/lib/hadoop/data/c  ext4    defaults,noatime        0       2
# Hadoop DataNode partition d
LABEL=hadoop-d  /var/lib/hadoop/data/d  ext4    defaults,noatime        0       2
# Hadoop DataNode partition e
LABEL=hadoop-e  /var/lib/hadoop/data/e  ext4    defaults,noatime        0       2
# Hadoop DataNode partition f
LABEL=hadoop-f  /var/lib/hadoop/data/f  ext4    defaults,noatime        0       2
# Hadoop DataNode partition g
LABEL=hadoop-g  /var/lib/hadoop/data/g  ext4    defaults,noatime        0       2
# Hadoop DataNode partition h
LABEL=hadoop-h  /var/lib/hadoop/data/h  ext4    defaults,noatime        0       2
# Hadoop DataNode partition i
LABEL=hadoop-i  /var/lib/hadoop/data/i  ext4    defaults,noatime        0       2
# Hadoop DataNode partition j
LABEL=hadoop-j  /var/lib/hadoop/data/j  ext4    defaults,noatime        0       2
# Hadoop DataNode partition k
LABEL=hadoop-k  /var/lib/hadoop/data/k  ext4    defaults,noatime        0       2
# Hadoop DataNode partition l
LABEL=hadoop-l  /var/lib/hadoop/data/l  ext4    defaults,noatime        0       2
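The fstab follows a strict convention: the letter in each `hadoop-<x>` label must match the final path component of its mount point. That can be sanity-checked mechanically; two sample lines are inlined below so the check runs anywhere (on the host itself, the real `/etc/fstab` lines would be piped in instead):

```shell
# Sketch: verify the hadoop-<x> label letter matches the data/<x> directory.
fstab='LABEL=hadoop-a  /var/lib/hadoop/data/a  ext4  defaults,noatime  0  2
LABEL=hadoop-k  /var/lib/hadoop/data/k  ext4  defaults,noatime  0  2'
result=$(echo "$fstab" | awk '{
  n = split($1, l, "-"); m = split($2, p, "/");
  print $1, "->", $2, (l[n] == p[m] ? "ok" : "MISMATCH")
}')
echo "$result"
```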
root@an-worker1175:~# mount -a
root@an-worker1175:~# findmnt | grep hadoop
├─/var/lib/hadoop/journal    /dev/mapper/an--worker1175--vg-journalnode ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/f     /dev/sdg1                                  ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/d     /dev/sdh1                                  ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/i     /dev/sdc1                                  ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/g     /dev/sde1                                  ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/b     /dev/sdj1                                  ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/e     /dev/sdf1                                  ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/c     /dev/sdi1                                  ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/j     /dev/sdb1                                  ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/a     /dev/sdk1                                  ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/l     /dev/sda1                                  ext4       rw,noatime,stripe=64
├─/var/lib/hadoop/data/h     /dev/sdd1                                  ext4       rw,noatime,stripe=64
└─/var/lib/hadoop/data/k     /dev/sdm1                                  ext4       rw,noatime,stripe=64

an-worker1199 still remains to be done; it looks like it's waiting on another drive swap.

@RKemper The drive has been swapped already; just waiting on it to be re-added (T416066).
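Once the replacement drive is visible on an-worker1199, its physical-drive line in `perccli64 /c0 show` should read `UGood` (unconfigured, good) before it can be re-added with `add vd`. The snippet parses a *sample* PD line (column layout and `MODEL` placeholder assumed, not captured from the host):

```shell
# Sketch: a sample perccli64 PD list line; column 3 is the drive state,
# which should be UGood for a freshly swapped, not-yet-added drive.
pd='252:6     6 UGood  -  7.276 TB SATA HDD N   N  512B MODEL U  -'
state=$(echo "$pd" | awk '{print $3}')
echo "drive 252:6 state: $state"
```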