As part of the RAID work (T84050 etc.), I added a fact that encodes all the RAID configured on every server, in a comma-separated list.
Auditing that list with servermon's fact query, for hosts where no RAID is configured and is_virtual equals false, results in this:
- bast4001.wikimedia.org (T133699)
- chromium.wikimedia.org & hydrogen.wikimedia.org (T123727)
- labnet1002.eqiad.wmnet (T136718)
- lvs100.wikimedia.org, being decom'd in favor of new hardware in T184293 (T136737)
- maps-test200.codfw.wmnet (T140440) actually had RAID but was not detected
- mw* (T106381)
- osmium.eqiad.wmnet (T132530)
- rcs100.eqiad.wmnet (T140441)
- rdb100.eqiad.wmnet (T140442)
- snapshot100.eqiad.wmnet (T140439)
This list is quite troubling; it's also troubling that for many of those hosts, we do have second disks, but they were never configured. That's the case for e.g. rdb1005/rdb1006 or chromium:
root@chromium:~# fdisk -l /dev/sdb

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000081

   Device Boot      Start         End      Blocks   Id  System
root@chromium:~#
All of the above should be audited and, where needed, reformatted to use RAID.
The list is by no means exhaustive; there are other hosts that are reported as having a RAID controller, but perhaps they are configured as JBOD (or single-disk RAID0s). Hosts with just the "mpt" controller reported are especially susceptible to that (e.g. labnet1001 was not on the list above, but has no RAID configured).
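For anyone wanting to repeat the audit offline, here is a rough sketch of the filtering logic in Python. It does not use servermon's actual query interface. It assumes the facts have already been exported into simple records, and the `HostFacts` type, the field names, the example hostnames and the fact values other than "mpt" are all hypothetical. It reports physical hosts with an empty RAID fact, and separately flags hosts whose only reported controller is "mpt", since those may well be JBOD.

```python
from typing import Iterable, List, NamedTuple, Tuple


class HostFacts(NamedTuple):
    """Hypothetical export of per-host facts; not servermon's real schema."""
    fqdn: str
    raid: str          # comma-separated RAID fact, empty string if none
    is_virtual: bool


def parse_raid(value: str) -> List[str]:
    """Split the comma-separated RAID fact into a clean list of entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def audit(hosts: Iterable[HostFacts]) -> Tuple[List[str], List[str]]:
    """Return (hosts with no RAID, hosts reporting only the 'mpt' controller).

    Virtual machines are skipped, matching the is_virtual == false filter.
    """
    no_raid: List[str] = []
    mpt_only: List[str] = []
    for host in hosts:
        if host.is_virtual:
            continue
        kinds = parse_raid(host.raid)
        if not kinds:
            no_raid.append(host.fqdn)
        elif kinds == ["mpt"]:
            # A controller is present, but it may be running as JBOD.
            mpt_only.append(host.fqdn)
    return no_raid, mpt_only


# Made-up sample data, purely for illustration.
SAMPLE = [
    HostFacts("phys-a.example", "", False),
    HostFacts("phys-b.example", "mpt", False),
    HostFacts("phys-c.example", "megaraid, md", False),
    HostFacts("vm-a.example", "", True),
]

if __name__ == "__main__":
    missing, suspicious = audit(SAMPLE)
    print("no RAID configured:", ", ".join(missing))
    print("mpt only, check for JBOD:", ", ".join(suspicious))
```

The "mpt only" bucket is just a hint about where to look. Each of those hosts still needs a manual check of the controller configuration.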