Related to T308644 and T308677 (unstable ordering of drives causing installer and puppet problems), which have been delaying T279637 (bullseye upgrade of ms-* swift clusters): we thought it might be better to use /sys/block/DEVICE/queue/rotational (and the equivalent puppet fact disk_type) to identify the SSDs. Unfortunately, this can't currently be done, because the Dell PowerEdge R7{3,4}0* systems fail to tell the OS which disks are SSDs:
mvernon@cumin2002:~$ sudo cumin 'ms-be*' 'facter dmi.product.name ; grep -l 0 /sys/block/*/queue/rotational || true'
84 hosts will be targeted:
ms-be[2028-2069].codfw.wmnet,ms-be[1028-1033,1035-1058,1060-1071].eqiad.wmnet
Ok to proceed on 84 hosts? Enter the number of affected hosts to confirm or "q" to quit 84
===== NODE GROUP =====
(14) ms-be[2051-2056].codfw.wmnet,ms-be[1051-1058].eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
ProLiant DL380 Gen10
/sys/block/md0/queue/rotational
/sys/block/md1/queue/rotational
/sys/block/sda/queue/rotational
/sys/block/sdb/queue/rotational
===== NODE GROUP =====
(25) ms-be[2057-2069].codfw.wmnet,ms-be[1060-1071].eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
PowerEdge R740xd2
===== NODE GROUP =====
(14) ms-be[2044-2050].codfw.wmnet,ms-be[1044-1050].eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
PowerEdge R740xd
===== NODE GROUP =====
(8) ms-be[2040-2043].codfw.wmnet,ms-be[1040-1043].eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
PowerEdge R730xd
===== NODE GROUP =====
(23) ms-be[2028-2039].codfw.wmnet,ms-be[1028-1033,1035-1039].eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
ProLiant DL380 Gen9
/sys/block/md0/queue/rotational
/sys/block/md1/queue/rotational
/sys/block/sda/queue/rotational
/sys/block/sdb/queue/rotational
================
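For reference, the per-host check done by the grep in the cumin command above could be written as a small shell function. This is only a rough sketch: the function name list_nonrotational and its optional directory argument are made up for illustration (the argument exists so the function can be pointed at a fake sysfs tree), and unlike the grep it skips md and dm devices so that only the underlying disks are listed.

```shell
#!/bin/sh
# Sketch: print the names of block devices whose queue/rotational flag is 0,
# i.e. devices the kernel believes are non-rotational (SSDs).
# The optional first argument is a hypothetical override of /sys/block,
# intended only for testing against a fake directory tree.
list_nonrotational() {
    root="${1:-/sys/block}"
    for flag in "$root"/*/queue/rotational; do
        # An unmatched glob stays literal; skip anything unreadable.
        [ -r "$flag" ] || continue
        dev=${flag%/queue/rotational}
        dev=${dev##*/}
        # Skip md and device-mapper devices: they are stacked devices,
        # not physical disks.
        case "$dev" in
            md*|dm-*) continue ;;
        esac
        if [ "$(cat "$flag")" = "0" ]; then
            echo "$dev"
        fi
    done
}
```

Running list_nonrotational with no argument on a host reads the real /sys/block. On the affected PowerEdge systems it would be expected to print nothing, matching the empty grep results above.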
To check whether this was a swift-specific problem, I looked at all the hardware of these types at WMF:
mvernon@cumin2002:~$ sudo cumin 'F:boardproductname = 072T6D or F:boardproductname = 0C2PJH or F:boardproductname = 01KPX8 or F:boardproductname = 0VNGN1' 'facter dmi.product.name ; grep -l 0 /sys/block/*/queue/rotational || true'
111 hosts will be targeted:
an-presto[1001-1005].eqiad.wmnet,an-worker[1078-1095].eqiad.wmnet,analytics[1058-1077].eqiad.wmnet,backup[2003-2008].codfw.wmnet,backup[1003-1008].eqiad.wmnet,cloudstore[1008-1009].wikimedia.org,dumpsdata[1001-1002].eqiad.wmnet,kafka-jumbo[1001-1006].eqiad.wmnet,ms-be[2040-2046,2048-2050,2057-2069].codfw.wmnet,ms-be[1040-1045,1047-1050,1060-1071].eqiad.wmnet,stat1005.eqiad.wmnet
Ok to proceed on 111 hosts? Enter the number of affected hosts to confirm or "q" to quit 111
===== NODE GROUP =====
(1) stat1005.eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
PowerEdge R730
===== NODE GROUP =====
(12) ms-be[2044-2046,2048-2050].codfw.wmnet,ms-be[1044-1045,1047-1050].eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
PowerEdge R740xd
===== NODE GROUP =====
(27) backup2008.codfw.wmnet,backup1008.eqiad.wmnet,ms-be[2057-2069].codfw.wmnet,ms-be[1060-1071].eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
PowerEdge R740xd2
===== NODE GROUP =====
(1) backup1007.eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
PowerEdge R740xd2
/sys/block/dm-0/queue/rotational
/sys/block/md0/queue/rotational
/sys/block/md1/queue/rotational
/sys/block/md2/queue/rotational
/sys/block/sdb/queue/rotational
/sys/block/sdc/queue/rotational
===== NODE GROUP =====
(9) backup[2003-2007].codfw.wmnet,backup[1003-1006].eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
PowerEdge R740xd2
/sys/block/dm-0/queue/rotational
/sys/block/md0/queue/rotational
/sys/block/md1/queue/rotational
/sys/block/md2/queue/rotational
/sys/block/sda/queue/rotational
/sys/block/sdb/queue/rotational
===== NODE GROUP =====
(61) an-presto[1001-1005].eqiad.wmnet,an-worker[1078-1095].eqiad.wmnet,analytics[1058-1077].eqiad.wmnet,cloudstore[1008-1009].wikimedia.org,dumpsdata[1001-1002].eqiad.wmnet,kafka-jumbo[1001-1006].eqiad.wmnet,ms-be[2040-2043].codfw.wmnet,ms-be[1040-1043].eqiad.wmnet
----- OUTPUT of 'facter dmi.produ...tational || true' -----
PowerEdge R730xd
================
Most backup hosts do have their SSDs correctly described to the OS. Inspecting the iDRAC for some of these systems shows that the SSDs are marked as non-RAID disks (rather than being configured as single-member RAID-0 arrays); they are therefore passed through to the system as JBOD and are fully visible to the OS.
Indeed, the exceptional host backup2008 (where the SSDs are not marked as non-rotational) instead has them configured as single-member RAID-0 arrays. @jcrespo might be able to confirm whether this was an intended change of setup or not.
Some online docs (e.g. this RH article) suggest that for these RAID controllers the switch between RAID-0 and JBOD is lossless, which implies we could switch the SSDs to JBOD on the ms-systems, enabling us to distinguish SSD from not-SSD in installer/puppet. We currently have some pre-production ms nodes of the right hardware (e.g. ms-be2069) on which we could test this theory. It's not clear, though, whether this migration could be automated in any way or whether each host would need its settings updating by hand...
Alternatively, I don't know if udev or the relevant SCSI drivers could be bullied into understanding the drives correctly; I'm not clear enough on the details to know if it's the RAID controller lying to the OS (since it knows the virtual disks are SSD) or the kernel misunderstanding what it's told.