
Some swift disks wrongly mounted on 5 ms-be hosts
Open, Medium, Public

Description

I noticed that some disks in ms-be (HP) hosts are mounted on a directory that doesn't correspond to their Linux device name. On Dell this usually happens when a disk breaks and its LD goes missing: that disk's letter disappears and every disk after it shifts by one letter. This is also why we use FS labels for swift disks, so that this "re-lettering" doesn't cause issues.
On HP the re-lettering isn't supposed to happen because LDs are decoupled from PDs. IOW, when a broken PD is removed the LD stays the same, so in theory there's no re-lettering.
It might also be a timing issue with the hpsa driver or the controller listing the drives when requested, or an issue from when the machines were first installed.

i7:~$ for i in $(seq -w 16 39) ; do ssh ms-be10$i.eqiad.wmnet -- 'for i in a3 b3 c1 d1 e1 f1 g1 h1 i1 j1 k1 l1 m1 n1; do df -h | grep -q "^/dev/sd$i.*/srv/swift-storage/sd$i$" || echo "$HOSTNAME $i mismounted" ; done'; done
ms-be1026 f1 mismounted
ms-be1026 g1 mismounted
ms-be1026 h1 mismounted
ms-be1035 d1 mismounted
ms-be1035 e1 mismounted
ms-be1035 f1 mismounted
ms-be1035 g1 mismounted
ms-be1035 h1 mismounted
ms-be1036 d1 mismounted
ms-be1036 e1 mismounted
ms-be1036 f1 mismounted
ms-be1036 g1 mismounted
i7:~$ for i in $(seq -w 16 39) ; do ssh ms-be20$i.codfw.wmnet -- 'for i in a3 b3 c1 d1 e1 f1 g1 h1 i1 j1 k1 l1 m1 n1; do df -h | grep -q "^/dev/sd$i.*/srv/swift-storage/sd$i$" || echo "$HOSTNAME $i mismounted" ; done'; done
ms-be2026 c1 mismounted
ms-be2026 d1 mismounted
ms-be2026 e1 mismounted
ms-be2026 f1 mismounted
ms-be2026 g1 mismounted
ms-be2026 h1 mismounted
ms-be2026 i1 mismounted
ms-be2026 j1 mismounted
ms-be2026 k1 mismounted
ms-be2026 l1 mismounted
ms-be2037 c1 mismounted
ms-be2037 d1 mismounted
ms-be2037 e1 mismounted
ms-be2037 f1 mismounted
ms-be2037 g1 mismounted
ms-be2037 h1 mismounted
ms-be2037 i1 mismounted
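A variant of the check above that reads /proc/mounts instead of grepping `df -h`, so it covers every mounted swift filesystem rather than a fixed list of partitions. A minimal sketch: like the loop above, it assumes mount points are named /srv/swift-storage/<kernel name>, and find_mismounts is a made-up name for illustration.

```shell
# Report swift filesystems mounted on a directory named after a different
# kernel device. Reads /proc/mounts-style lines ("device mountpoint fstype ...")
# on stdin. Assumes mount points follow /srv/swift-storage/<kernel name>.
find_mismounts() {
  awk '
    $2 ~ "^/srv/swift-storage/" {
      dev = $1; sub("^/dev/", "", dev)                 # e.g. /dev/sdd1 -> sdd1
      dir = $2; sub("^/srv/swift-storage/", "", dir)   # e.g. .../sdc1  -> sdc1
      if (dev != dir) printf "%s mounted on %s\n", $1, $2
    }
  '
}
```

On a host this would be `find_mismounts < /proc/mounts`; from a workstation something like `ssh ms-be2037.codfw.wmnet "$(declare -f find_mismounts); find_mismounts < /proc/mounts"` (bash) ships the function along with the command.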

Details

Related Gerrit Patches:
operations/puppet (production): swift: use implicit /dev/swift prefix for swift devices
operations/puppet (production): swift: ship udev rules for swift disks
operations/puppet (production): profile: fix labs swift storage

Event Timeline

Restricted Application removed a project: Patch-For-Review. Apr 24 2017, 10:35 AM
fgiunchedi moved this task from Backlog to Doing on the User-fgiunchedi board. Apr 24 2017, 11:22 AM

I've tried rebooting ms-be1036, though that didn't change anything. I think the issue is a combination of these factors:

  1. Hardware is first installed and PXE-booted; partman runs and creates partitions as instructed (sda/sdb in RAID, no other disks touched)
  2. HW RAID isn't configured properly, so sda/sdb may turn out to be rotating disks rather than the SSDs
  3. HW RAID is fixed and the machine is reimaged via wmf-reimage
  4. Machine PXE-boots and partman again creates partitions/RAID on sda/sdb (sdc is now already partitioned, though it shouldn't be, and is left alone)
  5. Machine reboots and puppet runs for the first time; partitions on the swift data disks are created and the filesystems labeled

I have a hunch the last step is where things go wrong: once puppet has created and labeled the filesystems they won't be touched again, so when the disk order changes at the next reboot the issue appears.
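If the hunch is right, the damage is visible in the labels themselves, regardless of what is mounted: a partition whose label names a different kernel device was formatted while the disks were in another order. A sketch, assuming labels of the form swift-<kernel name> (I haven't confirmed the exact scheme puppet uses; LABEL_PREFIX and find_relettered are hypothetical names).

```shell
# Flag partitions whose filesystem label (written once, at format time)
# names a different kernel device than the one the partition is on now.
# Assumption: swift labels look like "swift-<kernel name>", e.g. swift-sdc1.
# Input: "NAME LABEL" pairs, e.g. from `lsblk -rno NAME,LABEL`.
LABEL_PREFIX="swift-"

find_relettered() {
  awk -v prefix="$LABEL_PREFIX" '
    index($2, prefix) == 1 {
      want = substr($2, length(prefix) + 1)   # device name at format time
      if (want != $1) printf "%s carries label %s\n", $1, $2
    }
  '
}
```

Usage: `lsblk -rno NAME,LABEL | find_relettered`. Unlabeled devices have an empty second field and are skipped.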

@fgiunchedi Is there anything I can do to help with this?

@Cmjohnson not ATM. Initially I thought it was a HW RAID config issue, but it doesn't look like it. Thanks!

Mentioned in SAL (#wikimedia-operations) [2017-05-02T13:03:34Z] <godog> rebuild mismounted FSes on ms-be1036 - T163673

Mentioned in SAL (#wikimedia-operations) [2017-05-03T09:05:21Z] <godog> rebuild mismounted FSes on ms-be1035 - T163673

The same thing happened on ms-be2037. It looks like the order of SCSI devices at kernel boot is non-deterministic: sometimes the LDs aren't detected in SCSI ID order (below, 0:1:0:8 becomes sdb):

[   12.354398] sd 0:1:0:0: [sda] 937637552 512-byte logical blocks: (480 GB/447 GiB)
[   12.354401] sd 0:1:0:0: [sda] 4096-byte physical blocks
[   12.354589] sd 0:1:0:0: [sda] Write Protect is off
[   12.354592] sd 0:1:0:0: [sda] Mode Sense: 73 00 00 08
[   12.354604] sd 0:1:0:8: [sdb] 7813971632 512-byte logical blocks: (4.00 TB/3.64 TiB)
[   12.354616] sd 0:1:0:1: [sdc] 937637552 512-byte logical blocks: (480 GB/447 GiB)
[   12.354618] sd 0:1:0:1: [sdc] 4096-byte physical blocks
[   12.354669] sd 0:1:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   12.354716] sd 0:1:0:2: [sdd] 7813971632 512-byte logical blocks: (4.00 TB/3.64 TiB)
[   12.354785] sd 0:1:0:8: [sdb] Write Protect is off
[   12.354787] sd 0:1:0:8: [sdb] Mode Sense: 73 00 00 08
[   12.354814] sd 0:1:0:3: [sde] 7813971632 512-byte logical blocks: (4.00 TB/3.64 TiB)
[   12.354815] sd 0:1:0:1: [sdc] Write Protect is off
[   12.354820] sd 0:1:0:1: [sdc] Mode Sense: 73 00 00 08
[   12.354827] sd 0:1:0:4: [sdf] 7813971632 512-byte logical blocks: (4.00 TB/3.64 TiB)
[   12.354887] sd 0:1:0:8: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   12.354909] sd 0:1:0:5: [sdg] 7813971632 512-byte logical blocks: (4.00 TB/3.64 TiB)
[   12.354924] sd 0:1:0:1: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   12.354943] sd 0:1:0:2: [sdd] Write Protect is off
[   12.354946] sd 0:1:0:2: [sdd] Mode Sense: 73 00 00 08
[   12.354968] sd 0:1:0:6: [sdh] 7813971632 512-byte logical blocks: (4.00 TB/3.64 TiB)
[   12.355060] sd 0:1:0:2: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   12.355100] sd 0:1:0:3: [sde] Write Protect is off
[   12.355105] sd 0:1:0:3: [sde] Mode Sense: 73 00 00 08
[   12.355106] sd 0:1:0:4: [sdf] Write Protect is off
[   12.355111] sd 0:1:0:4: [sdf] Mode Sense: 73 00 00 08
[   12.355152] sd 0:1:0:5: [sdg] Write Protect is off
[   12.355154] sd 0:1:0:5: [sdg] Mode Sense: 73 00 00 08
[   12.355192] sd 0:1:0:7: [sdi] 7813971632 512-byte logical blocks: (4.00 TB/3.64 TiB)
[   12.355216] sd 0:1:0:3: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   12.355225] sd 0:1:0:4: [sdf] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   12.355238] sd 0:1:0:5: [sdg] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   12.355399] sd 0:1:0:6: [sdh] Write Protect is off
[   12.355402] sd 0:1:0:6: [sdh] Mode Sense: 73 00 00 08
[   12.355416] sd 0:1:0:7: [sdi] Write Protect is off
[   12.355419] sd 0:1:0:7: [sdi] Mode Sense: 73 00 00 08
[   12.355488] sd 0:1:0:6: [sdh] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   12.355508] sd 0:1:0:7: [sdi] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
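To read the mapping out of a log like the one above without eyeballing it, here is a sketch that extracts LUN and kernel name from the `sd H:C:T:L: [sdX]` lines and sorts by LUN. In the excerpt above LUN 8 comes out as sdb, while LUNs 1 to 7 are shifted by one letter (sdc to sdi). lun_to_name is a hypothetical helper name, and it only looks at the last (LUN) field of the SCSI address.

```shell
# Recover the LUN -> kernel name mapping chosen at boot from dmesg lines
# such as "sd 0:1:0:8: [sdb] 7813971632 512-byte logical blocks ...".
# Prints "LUN NAME" once per device, sorted numerically by LUN.
lun_to_name() {
  awk '
    match($0, /sd [0-9]+:[0-9]+:[0-9]+:[0-9]+: \[sd[a-z]+\]/) {
      split(substr($0, RSTART, RLENGTH), w, " ")   # "sd" "0:1:0:8:" "[sdb]"
      split(w[2], addr, ":")                       # host, channel, target, LUN
      name = substr(w[3], 2, length(w[3]) - 2)     # strip the brackets
      if (!(name in seen)) { seen[name] = 1; print addr[4], name }
    }
  ' | sort -n
}
```

Usage: `dmesg | lun_to_name`. If discovery had followed LUN order, the names would also come out alphabetical.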
ls -la /dev/disk/by-path/ | grep -v -- -part | sort -k11
drwxr-xr-x 2 root root 720 May 25 09:17 .
drwxr-xr-x 8 root root 160 May 25 09:17 ..
total 0
lrwxrwxrwx 1 root root   9 May 25 10:13 pci-0000:08:00.0-scsi-0:1:0:0 -> ../../sda
lrwxrwxrwx 1 root root   9 May 25 10:13 pci-0000:08:00.0-scsi-0:1:0:8 -> ../../sdb
lrwxrwxrwx 1 root root   9 May 25 09:18 pci-0000:08:00.0-scsi-0:1:0:1 -> ../../sdc
lrwxrwxrwx 1 root root   9 May 25 10:13 pci-0000:08:00.0-scsi-0:1:0:2 -> ../../sdd
lrwxrwxrwx 1 root root   9 May 25 09:18 pci-0000:08:00.0-scsi-0:1:0:3 -> ../../sde
lrwxrwxrwx 1 root root   9 May 25 10:13 pci-0000:08:00.0-scsi-0:1:0:4 -> ../../sdf
lrwxrwxrwx 1 root root   9 May 25 10:13 pci-0000:08:00.0-scsi-0:1:0:5 -> ../../sdg
lrwxrwxrwx 1 root root   9 May 25 10:13 pci-0000:08:00.0-scsi-0:1:0:6 -> ../../sdh
lrwxrwxrwx 1 root root   9 May 25 10:13 pci-0000:08:00.0-scsi-0:1:0:7 -> ../../sdi
lrwxrwxrwx 1 root root   9 May 25 09:18 pci-0000:08:00.0-scsi-0:1:0:9 -> ../../sdj
lrwxrwxrwx 1 root root   9 May 25 09:18 pci-0000:08:00.0-scsi-0:1:0:10 -> ../../sdk
lrwxrwxrwx 1 root root   9 May 25 09:18 pci-0000:08:00.0-scsi-0:1:0:11 -> ../../sdl
lrwxrwxrwx 1 root root   9 May 25 09:18 pci-0000:08:00.0-scsi-0:1:0:12 -> ../../sdm
lrwxrwxrwx 1 root root   9 May 25 10:13 pci-0000:08:00.0-scsi-0:1:0:13 -> ../../sdn

The SCSI LUN numbers seem to always be correct, I assume because they don't change as long as the underlying LD isn't changed. afaics hpsa yields the devices in whichever order they are discovered, not in LUN order. udev can't rename block devices, so I think we'll have to resort to additional symlinks.
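Since the LUNs look stable, one quick way to spot affected hosts is to compare each by-path link with the name its LUN would get if discovery followed LUN order. A sketch, assuming LUN n should map to the n-th letter (LUN 0 to sda, LUN 8 to sdi, which only works for up to 26 LUNs); check_lun_order is a hypothetical name, and the directory argument is there so it can be tried against a scratch directory.

```shell
# Compare each SCSI LUN's by-path link with the kernel name it points to.
# Assumption: LUN n is expected to be sd<n-th letter>, e.g. LUN 8 -> sdi.
check_lun_order() {
  dir=${1:-/dev/disk/by-path}
  for link in "$dir"/*-scsi-*; do
    [ -L "$link" ] || continue                 # skip the literal glob if nothing matched
    case $link in *-part*) continue ;; esac    # whole disks only
    lun=${link##*:}                            # last field of H:C:T:L
    actual=$(basename "$(readlink "$link")")
    expected=sd$(awk -v n="$lun" 'BEGIN { print substr("abcdefghijklmnopqrstuvwxyz", n + 1, 1) }')
    [ "$actual" = "$expected" ] || echo "LUN $lun is $actual, expected $expected"
  done
}
```

Against the by-path listing above this would flag LUNs 1 through 8 and nothing else.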

I tried working out some udev rules that derive a stable name from the SCSI LUN and symlink it as swift-<device> or swift/<device>, e.g.:

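# LUN 0 behind the controller at PCI 0000:08:00.0 -> /dev/swift-sda; %n adds the partition number, so its first partition -> /dev/swift-sda1
ENV{ID_PATH_TAG}=="pci-0000_08_00_0-scsi-0_1_0_0", SYMLINK+="swift-sda%n"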
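Writing one such rule per LUN by hand is error-prone, so a generator along these lines could produce them. A sketch: the ID_PATH_TAG prefix defaults to the one in the example rule, while the 14-LUN default, the swift- link prefix and the name gen_swift_udev_rules are assumptions to adjust per host (the PCI address may differ between hosts).

```shell
# Emit one udev rule per LUN, pinning a symlink to the SCSI address instead
# of the discovery-order kernel name. All three arguments are optional.
gen_swift_udev_rules() {
  path_prefix=${1:-pci-0000_08_00_0-scsi-0_1_0_}   # ID_PATH_TAG minus the LUN
  luns=${2:-14}                                    # generates LUN 0 .. luns-1
  link_prefix=${3:-swift-}                         # or "swift/" for /dev/swift/
  awk -v p="$path_prefix" -v n="$luns" -v l="$link_prefix" 'BEGIN {
    letters = "abcdefghijklmnopqrstuvwxyz"
    for (i = 0; i < n && i < 26; i++)              # one letter per LUN, max 26
      printf "ENV{ID_PATH_TAG}==\"%s%d\", SYMLINK+=\"%ssd%s%%n\"\n", p, i, l, substr(letters, i + 1, 1)
  }'
}
```

The output would go into a rules file under /etc/udev/rules.d/ that sorts after the rules setting ID_PATH_TAG (as far as I know that's 60-persistent-storage.rules), followed by `udevadm control --reload-rules` and `udevadm trigger`.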

We'd also need to adjust the puppet code that formats swift partitions (e.g. swift::init_device) to use these symlinks.

Change 361647 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: ship udev rules for swift disks

https://gerrit.wikimedia.org/r/361647

Change 361648 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] swift: use implicit /dev/swift prefix for swift devices

https://gerrit.wikimedia.org/r/361648

Change 361665 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] profile: update labs swift storage symlink

https://gerrit.wikimedia.org/r/361665

Change 361665 merged by Filippo Giunchedi:
[operations/puppet@production] profile: fix labs swift storage

https://gerrit.wikimedia.org/r/361665

Change 361647 merged by Filippo Giunchedi:
[operations/puppet@production] swift: ship udev rules for swift disks

https://gerrit.wikimedia.org/r/361647

fgiunchedi moved this task from Doing to Backlog on the User-fgiunchedi board. Aug 22 2017, 1:06 PM

Change 361648 had a related patch set uploaded (by Alex Monk; owner: Filippo Giunchedi):
[operations/puppet@production] swift: use implicit /dev/swift prefix for swift devices

https://gerrit.wikimedia.org/r/361648

ayounsi removed a subscriber: ayounsi. Sep 25 2019, 6:22 PM