
Swap RAID controller on ms-be1091.eqiad.wmnet
Closed, Resolved, Public

Description

Hi!

Supermicro should have sent a test RAID controller, and we'd like to test it on ms-be1091. The host is not serving production traffic, so it can be shut down and upgraded at any time.

Originally the controller was for ms-be2088 (see T384003), but Supermicro sent it to the wrong address :D

We'd need to do the following:

  1. Swap the old controller with the new one on ms-be1091. In theory the controller should be 100% compatible with the existing cabling/PCI/etc.
  2. Once the host is back up, Matthew and I will confirm that the controller is up and running.
  3. After that, we'd like to check that disk hot-swapping works as expected, so we'll need John or Valerie to remove and reinsert one of the disk bays to simulate a broken-disk replacement.

Thanks in advance!

Event Timeline

@elukey would you like to shut it down, or can we shut it down on our own?

@Jclark-ctr I downtimed the host for two days, please feel free to shut it down when it is convenient for you :)

Mentioned in SAL (#wikimedia-operations) [2025-04-15T07:28:06Z] <Emperor> make sure all disks are mounted correctly prior to disk-swap testing T391854

Mentioned in SAL (#wikimedia-operations) [2025-04-15T07:28:14Z] <Emperor> make sure all disks are mounted correctly prior to disk-swap testing T391854 ms-be1091

@elukey thanks for the downtime, the RAID card has been installed. @MatthewVernon all yours to verify.

Thanks a lot!

I see the new controller but also some errors while mounting swift partitions:

[Tue Apr 15 13:41:35 2025] /dev/disk/by-path/pci-0000:98:00.0-scsi-0:2:10:0-part1: Can't open blockdev
98:00.0 Serial Attached SCSI controller: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx

@MatthewVernon we should probably reimage, what do you think?

Currently puppet is failing on this host:

mvernon@ms-be1091:~$ sudo run-puppet-agent
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Resource Statement, Duplicate declaration: Exec[mountpoint-root-/srv/swift-storage/objects98] is already declared at (file: /srv/puppet_code/environments/production/modules/swift/manifests/mount_filesystem.pp, line: 21); cannot redeclare (file: /srv/puppet_code/environments/production/modules/swift/manifests/mount_filesystem.pp, line: 21) (file: /srv/puppet_code/environments/production/modules/swift/manifests/mount_filesystem.pp, line: 21, column: 5) (file: /srv/puppet_code/environments/production/modules/profile/manifests/swift/storage/configure_disks.pp, line: 53) on node ms-be1091.eqiad.wmnet
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

hazarding a guess, there's something wrong/changed about the storage path - you now have e.g. pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy0-lun-0 rather than pci-0000:98:00.0-scsi-0:2:0:0

@elukey that might help, yes; it looks like puppet finds the disks, but their paths have changed:

swift_disks => {
  accounts => [
    "pci-0000:00:11.5-ata-1.0-part4",
    "pci-0000:00:11.5-ata-2.0-part4"
  ],
  container => [
    "pci-0000:00:11.5-ata-1.0-part5",
    "pci-0000:00:11.5-ata-2.0-part5"
  ],
  objects => [
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy0-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy1-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy10-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy11-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy2-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy3-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy4-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy5-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy6-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy7-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy8-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy9-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy0-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy1-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy10-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy11-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy2-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy3-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy4-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy5-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy6-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy7-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy8-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy9-lun-0"
  ]
}

(I don't know whether everything will Just Work with a reimage, or if some awful regexes will need adjusting)

Tried to reimage, but indeed it fails because the swift facts are inconsistent. We'll need to fix them :(

I sorted out an issue with the host itself: Redfish was locked down because the Supermicro BMC detected an "intrusion" when John added the new controller (the sensor status needed to be cleared, etc.).

@MatthewVernon Do you have time to fix the swift_facts? I am available to review the change, but I fear I don't have a lot of context on it.

I think swift_facts is broadly correct, the problem is in configure_disks.pp:

$facts['swift_disks']['objects'].each |$drive| {
    # disk is of the form pci-0000:3b:00.0-scsi-0:0:1:0
    $idx = $drive.split(/:/)[-2]
    $device_path = "/dev/disk/by-path/${drive}"
    $partition_path = "${device_path}-part1"
    $swift_path = "${swift_storage_dir}${drive}-part1"

The problem is that these disks instead look like pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy9-lun-0; applying the $idx operation to them always returns 98.
[I'm not likely to have time to look at this in the next few working days, I'm afraid, but I can add it to my stack of overdue tasks]
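The collision can be reproduced with a quick sketch (Python here, purely for illustration; the real logic is the Puppet snippet above, and the paths are taken from this host):

```python
# Sketch (illustration only, not the production Puppet code): reproduce the
# $drive.split(/:/)[-2] index extraction on both path styles.
old_style = [
    "pci-0000:98:00.0-scsi-0:2:0:0",
    "pci-0000:98:00.0-scsi-0:2:1:0",
]
new_style = [
    "pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy0-lun-0",
    "pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy0-lun-0",
]

def idx(drive):
    # Equivalent of Puppet's $drive.split(/:/)[-2]
    return drive.split(":")[-2]

print([idx(d) for d in old_style])  # ['0', '1'] - unique per disk
print([idx(d) for d in new_style])  # ['98', '98'] - the PCI bus number, same for all
```

Every object disk therefore maps to the same objects98 path, which is why Puppet reports the duplicate Exec[mountpoint-root-/srv/swift-storage/objects98] declaration.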

We solved a similar problem elsewhere in that file (where SM and Dell were a bit different) with a regex, but I'm not sure how doable that is here given the widely different path entries :(

Change #1137243 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::swift::storage: allow non-scsi id matches for object partitions

https://gerrit.wikimedia.org/r/1137243

I started with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1137243 but it doesn't work because we have two "exp" values in the list:

"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy0-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy1-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy10-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy11-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy2-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy3-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy4-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy5-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy6-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy7-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy8-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy9-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy0-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy1-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy10-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy11-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy2-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy3-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy4-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy5-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy6-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy7-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy8-lun-0",
"pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy9-lun-0"

Those are exp0x500304801ffa4e3f and exp0x500304801ff9b73f, each with phy ids from 0 to 11 (so the phy ids alone are not unique).

Thank you so much for looking at this! Happy to do review once you've something working

I started with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1137243 but it doesn't work because we have two "exp" values in the list:
Those are exp0x500304801ffa4e3f and exp0x500304801ff9b73f, each with phy ids from 0 to 11 (so the phy ids alone are not unique).

Ugh, that's pretty unpleasant :(

@MatthewVernon: I am not sure whether we have to keep the current objectX (with X an integer) format for the /srv/swift-storage dirs, but a quick solution would be to set idx to something like 500304801ff9b73f-phy0 or similar (so the full path would be /srv/swift-storage/object_500304801ff9b73f-phy0). This assumes the naming scheme is stable, but IIUC the controller offers two 12x connections to the disks, and the exp0x value just represents which one.

Otherwise we can try to serialize the scheme to integers from 0-23 via puppet (but it will be a little messy), or attempt something via udev rules (though I have to admit my ignorance there). At this point I am not sure what's best, lemme know your preference.
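A quick sketch of the first option (Python, purely for illustration; it assumes the exp0x.../phyN components of the by-path entries are stable across reboots):

```python
import re

# Sketch: derive a unique per-disk id like "500304801ff9b73f-phy0" from the
# new-style by-path entries, combining the expander suffix with the phy index.
NEW_PATH_RE = re.compile(r"sas-exp0x([0-9a-f]+)-(phy\d+)-lun-\d+$")

def unique_idx(drive):
    m = NEW_PATH_RE.search(drive)
    if m is None:
        raise ValueError(f"unexpected by-path format: {drive}")
    return f"{m.group(1)}-{m.group(2)}"

# The two expanders' phy0 disks now get distinct ids:
print(unique_idx("pci-0000:98:00.0-sas-exp0x500304801ff9b73f-phy0-lun-0"))
print(unique_idx("pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy0-lun-0"))
```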

So profile::swift::storage::configure_disks has to be able to make suitable swift::mount_filesystem resources for each drive (which I think setting $idx to something like you suggest would work for), and then the swift ring manager needs to know what directory entries under /srv/swift-storage/ to put into the rings. Those are described in YAML, but as long as they're reasonably stable (i.e. we'd expect each host with one of these controllers to have exp0x500304801ffa4e3f and exp0x500304801ff9b73f, rather than each host being different) I don't see any problem with that; we'd just make a new storage scheme and it should work...

I don't know if you can muck around with /dev/disk/by-path/ entries via udev rules - if you could, and replace the current rather horrible ones with something that just gave us 0-23 again, that would be much easier, but I don't know how doable that is.

@MatthewVernon we could go for something like https://gerrit.wikimedia.org/r/c/operations/puppet/+/1137243, test ms-be1091, and decide what we want to do regarding the controller (buy more, or keep the old one). If we decide to keep it, we can then tune/refine the current patch (so we unblock testing without unnecessary headaches). What do you think?

Change #1137243 merged by Elukey:

[operations/puppet@production] profile::swift::storage: allow non-scsi id matches for object partitions

https://gerrit.wikimedia.org/r/1137243

@Jclark-ctr hi! When you have a moment let's do the fake hot-swap test; in theory it should be sufficient to pull any of the hot-swappable disk bays out and push it back in.

Current status from storcli:

PD LIST :
=======

---------------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model                Sp 
---------------------------------------------------------------------------
12:0      0 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:1      1 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:2      2 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:3      3 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:4      4 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:5      5 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:6      6 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:7      7 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:8      8 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:9      9 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:10    10 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:11    11 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:0     13 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:1     14 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:2     15 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:3     16 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:4     17 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:5     18 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:6     19 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:7     20 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:8     21 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:9     22 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:10    23 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:11    24 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
---------------------------------------------------------------------------
elukey triaged this task as Medium priority. Apr 28 2025, 2:30 PM

Today John helped me test the hot-swap behavior, and everything works much more smoothly.

  1. John swapped one disk with a completely new one. The new disk was recognized correctly and the device was available as /dev/sdg (without any partitions, of course).
PD LIST :
=======

---------------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model                Sp 
---------------------------------------------------------------------------
12:0      0 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:1      1 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:2      2 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:3      3 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:4      4 JBOD  -  7.277 TB SATA HDD -   -  512B ST8000VN004-3CP101   -     <=============================================
12:5      5 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:6      6 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:7      7 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:8      8 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:9      9 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:10    10 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:11    11 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:0     13 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:1     14 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:2     15 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:3     16 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:4     17 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:5     18 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:6     19 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:7     20 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:8     21 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:9     22 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:10    23 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:11    24 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
---------------------------------------------------------------------------


[Tue Apr 29 15:18:20 2025] sd 0:0:4:0: device_block, handle(0x001c)
[Tue Apr 29 15:18:24 2025] sd 0:0:4:0: device_unblock and setting to running, handle(0x001c)
[Tue Apr 29 15:18:24 2025] sd 0:0:4:0: [sdg] Synchronizing SCSI cache
[Tue Apr 29 15:18:24 2025] sd 0:0:4:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[Tue Apr 29 15:18:24 2025] XFS (sdg1): Unmounting Filesystem
[Tue Apr 29 15:18:24 2025] XFS (sdg1): log I/O error -5
[Tue Apr 29 15:18:24 2025] XFS (sdg1): xfs_do_force_shutdown(0x2) called from line 1211 of file fs/xfs/xfs_log.c. Return address = 00000000c149996a
[Tue Apr 29 15:18:24 2025] XFS (sdg1): Log I/O Error Detected. Shutting down filesystem
[Tue Apr 29 15:18:24 2025] XFS (sdg1): Unable to update superblock counters. Freespace may not be correct on next mount.
[Tue Apr 29 15:18:24 2025] mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x500304801ffa4e04)
[Tue Apr 29 15:18:24 2025] mpt3sas_cm0: removing handle(0x001c), sas_addr(0x500304801ffa4e04)
[Tue Apr 29 15:18:24 2025] mpt3sas_cm0: enclosure logical id(0x500304801ffa4e3f), slot(4)
[Tue Apr 29 15:18:24 2025] mpt3sas_cm0: enclosure level(0x0000), connector name( C0.1)
[Tue Apr 29 15:18:24 2025] XFS (sdg1): Please unmount the filesystem and rectify the problem(s)
[Tue Apr 29 15:20:48 2025] mpt3sas_cm0: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)
[Tue Apr 29 15:20:48 2025] scsi 0:0:26:0: Direct-Access     ATA      ST8000VN004-3CP1 SC60 PQ: 0 ANSI: 6
[Tue Apr 29 15:20:48 2025] scsi 0:0:26:0: SATA: handle(0x001c), sas_addr(0x500304801ffa4e04), phy(4), device_name(0x0000000000000000)
[Tue Apr 29 15:20:48 2025] scsi 0:0:26:0: enclosure logical id (0x500304801ffa4e3f), slot(4) 
[Tue Apr 29 15:20:48 2025] scsi 0:0:26:0: enclosure level(0x0000), connector name( C0.1)
[Tue Apr 29 15:20:48 2025] scsi 0:0:26:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[Tue Apr 29 15:20:48 2025] scsi 0:0:26:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
[Tue Apr 29 15:20:48 2025] sd 0:0:26:0: Power-on or device reset occurred
[Tue Apr 29 15:20:48 2025] sd 0:0:26:0: Attached scsi generic sg6 type 0
[Tue Apr 29 15:20:48 2025]  end_device-0:0:14: add: handle(0x001c), sas_addr(0x500304801ffa4e04)
[Tue Apr 29 15:20:48 2025] sd 0:0:26:0: [sdg] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
[Tue Apr 29 15:20:48 2025] sd 0:0:26:0: [sdg] 4096-byte physical blocks
[Tue Apr 29 15:20:48 2025] sd 0:0:26:0: [sdg] Write Protect is off
[Tue Apr 29 15:20:48 2025] sd 0:0:26:0: [sdg] Mode Sense: 6b 00 10 08
[Tue Apr 29 15:20:48 2025] sd 0:0:26:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
[Tue Apr 29 15:20:48 2025] sdg: detected capacity change from 0 to 8001563222016
[Tue Apr 29 15:20:48 2025] sdg: detected capacity change from 0 to 8001563222016
[Tue Apr 29 15:20:48 2025] sdg: detected capacity change from 0 to 8001563222016
[Tue Apr 29 15:20:48 2025] sd 0:0:26:0: [sdg] Attached SCSI disk
  2. John re-inserted the original disk. The device was updated correctly, /dev/sdg1 became visible again, and I was able to mount -a and get back all 24 partitions.
PD LIST :
=======

---------------------------------------------------------------------------
EID:Slt DID State DG     Size Intf Med SED PI SeSz Model                Sp 
---------------------------------------------------------------------------
12:0      0 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:1      1 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:2      2 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:3      3 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:4      4 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -    <==========================================================
12:5      5 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:6      6 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:7      7 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:8      8 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:9      9 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:10    10 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
12:11    11 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:0     13 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:1     14 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:2     15 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:3     16 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:4     17 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:5     18 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:6     19 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:7     20 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:8     21 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:9     22 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:10    23 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
25:11    24 JBOD  -  7.277 TB SATA HDD -   -  512B HGST HUS728T8TALE6L4 -  
---------------------------------------------------------------------------


[Tue Apr 29 15:25:22 2025] /dev/disk/by-path/pci-0000:98:00.0-sas-exp0x500304801ffa4e3f-phy4-lun-0-part1: Can't open blockdev
[Tue Apr 29 15:26:36 2025] sd 0:0:26:0: device_block, handle(0x001c)
[Tue Apr 29 15:26:39 2025] sd 0:0:26:0: device_unblock and setting to running, handle(0x001c)
[Tue Apr 29 15:26:39 2025] sd 0:0:26:0: [sdg] Synchronizing SCSI cache
[Tue Apr 29 15:26:39 2025] sd 0:0:26:0: [sdg] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[Tue Apr 29 15:26:39 2025] mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x500304801ffa4e04)
[Tue Apr 29 15:26:39 2025] mpt3sas_cm0: removing handle(0x001c), sas_addr(0x500304801ffa4e04)
[Tue Apr 29 15:26:39 2025] mpt3sas_cm0: enclosure logical id(0x500304801ffa4e3f), slot(4)
[Tue Apr 29 15:26:39 2025] mpt3sas_cm0: enclosure level(0x0000), connector name( C0.1)
[Tue Apr 29 15:28:02 2025] mpt3sas_cm0: log_info(0x31110e05): originator(PL), code(0x11), sub_code(0x0e05)
[Tue Apr 29 15:28:02 2025] scsi 0:0:27:0: Direct-Access     ATA      HGST HUS728T8TAL W9U0 PQ: 0 ANSI: 6
[Tue Apr 29 15:28:02 2025] scsi 0:0:27:0: SATA: handle(0x001c), sas_addr(0x500304801ffa4e04), phy(4), device_name(0x0000000000000000)
[Tue Apr 29 15:28:02 2025] scsi 0:0:27:0: enclosure logical id (0x500304801ffa4e3f), slot(4) 
[Tue Apr 29 15:28:02 2025] scsi 0:0:27:0: enclosure level(0x0000), connector name( C0.1)
[Tue Apr 29 15:28:02 2025] scsi 0:0:27:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[Tue Apr 29 15:28:02 2025] scsi 0:0:27:0: qdepth(32), tagged(1), scsi_level(7), cmd_que(1)
[Tue Apr 29 15:28:02 2025] sd 0:0:27:0: Power-on or device reset occurred
[Tue Apr 29 15:28:02 2025] sd 0:0:27:0: Attached scsi generic sg6 type 0
[Tue Apr 29 15:28:02 2025]  end_device-0:0:15: add: handle(0x001c), sas_addr(0x500304801ffa4e04)
[Tue Apr 29 15:28:02 2025] sd 0:0:27:0: [sdg] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
[Tue Apr 29 15:28:02 2025] sd 0:0:27:0: [sdg] 4096-byte physical blocks
[Tue Apr 29 15:28:02 2025] sd 0:0:27:0: [sdg] Write Protect is off
[Tue Apr 29 15:28:02 2025] sd 0:0:27:0: [sdg] Mode Sense: 6b 00 10 08
[Tue Apr 29 15:28:02 2025] sd 0:0:27:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
[Tue Apr 29 15:28:02 2025] sdg: detected capacity change from 0 to 8001563222016
[Tue Apr 29 15:28:02 2025] sdg: detected capacity change from 0 to 8001563222016
[Tue Apr 29 15:28:02 2025]  sdg: sdg1
[Tue Apr 29 15:28:02 2025] sdg: detected capacity change from 0 to 8001563222016
[Tue Apr 29 15:28:02 2025] sd 0:0:27:0: [sdg] Attached SCSI disk

@MatthewVernon This controller seems to work exactly as you wanted/preferred; lemme know if you want to do more tests.

Thanks, this all looks good to me (and I had a bit of a poke at ms-be1091 myself).

To summarise:

  • old-style controller can do hot-swap, via a strange start initialization, stop initialization, set jbod procedure (tested on ms-be2088 T384003)
  • new-style controller has hot-swapping that Just Works, albeit with uglier /dev/disk/by-path/ entries (tested on ms-be1091 T391854)
  • @MoritzMuehlenhoff has packaged storcli for deployment, but the relevant facts need updating so it actually gets installed (is there a task for this?)

Many thanks to everyone who has worked on this! I think the next steps are to look at costs: specifically, how much the SM Config-J price changes if we use these newer controllers, and how much it would cost to retro-fit them into the existing nodes (which I think is 2x thanos-be, 2x backup, 17x ms-be)...

  • @MoritzMuehlenhoff has packaged storcli for deployment, but the relevant facts need updating so it actually gets installed (is there a task for this?)

I've just created a task T393146

@MatthewVernon I think the new controller costs about the same as the old one, so the Config-J price shouldn't change much. I'll follow up with @wiki_willy to confirm. If my assumption about the cost is right, what is your preference: buy the new controller only for new hosts, retro-fit all of them, or keep the current setup? We can start from your decision and then work on the next steps :)

It's about $250 for the RAID controllers, so we can definitely order those to replace the existing ones for Config J. To keep things consistent though, should we order this RAID controller to replace the ones in the Config E and backup hosts also?

Thanks,
Willy

We decided to keep going with the new controller and retro-fit all the ms-be Supermicro nodes.