Supermicro's Config J hot swap behavior
Closed, ResolvedPublic

Description

In T382874 we had to replace a broken disk in ms-be1090, one of the new Supermicro Config J nodes. We were unable to hot-swap the disk, and after a chat between DP/Infra-Foundations/Dcops the following issues came up:

  1. The servers are high density and host two rows of hot-swappable disks (12 each). Reaching one of the two rows requires the operator to slide the server forward, which we cannot easily do now because the cables (fiber, power, etc.) are too short. This means a shutdown is needed when the broken disk sits in the inner, hidden row: dcops has to disconnect the cables, slide the server forward, replace the disk, and then do everything in reverse. This is not ideal, so we should figure out whether there are compromises or solutions. I had a chat with Valerie, who mentioned an extra structure to fold the additional cabling required for sliding (so the cables don't hang and risk being cut or hit by other servers), but for the moment it does not seem viable, since it would require longer/different rails for the server to slide on, and those won't fit in our racks.
  2. Assuming that we can hot-swap in the DC, another issue arises: is the new disk recognized straight away by the OS without extra intervention? From T382874 it seems that the new disk was visible after the power-up, but we need to test it. From various chats I gathered that the disk may end up in a "Foreign" state (close to what happens in regular RAID scenarios), and something like megactl would be needed to set it to JBOD properly. I tested megactl and storcli in T377853#10457893 but neither of them worked, so the only alternative seems to be setting the JBOD value via the BIOS (so a reboot would be required). We could try extracting and re-adding a disk in either ms-be2088 or ms-be1091, since we left them out of Swift production, to better understand the procedure. There may also be the option to replace the controllers with something that better supports JBOD (I wrote "may" since this is my assumption and I need to verify it), but this is something that Data Persistence needs to decide.

Event Timeline

FWIW, my perspective is that we do need to be able to hot-swap these drives; if that turns out to mean we need different controllers in these SM Config-J systems, then let's do that. But maybe we don't, and this testing should hopefully answer that :)

@MatthewVernon @elukey I do agree with you all that "we do need to be able to hot-swap these drives", and yes, by design all the drives on the Storage SuperServer SSG-620P-E1CR24H are hot-swap. We asked Supermicro to match our Dell Config J, so I can confirm that the disks are hot-swap. The issue we are facing here, as @VRiley-WMF mentioned, is physical. We cannot get to the 12 top-loading drive bays without powering down the server because the cabling doesn't allow us to do so. So my understanding is that if we can get longer cables and manage those cables well, we should be able to get to the 12 top-loading drive bays without an issue.

Please allow me some time to first examine this situation with @VRiley-WMF in eqiad and @Jhancock.wm in codfw to see what options we have to make it work before we even think about a new controller. If we do not fix the physical issue we are having with the server, a new controller will not resolve the issue.

Thanks

Thanks for the update @Papaul, and of course you can have some time to look at the cable management issues. Do keep us posted, please :)

Supermicro came back with some nice suggestions to clear the state of a new/replaced disk (if it gets into something like the Foreign state). The most promising option, in my opinion, is to do it via Redfish, which could easily be automated in a cookbook (if it works). I am going to open a task for dcops to set up a fake disk replacement on ms-be2088 (currently not serving live traffic).
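To make the Redfish idea a bit more concrete, here is a minimal sketch of the check a cookbook might run after a swap. It relies only on the standard Redfish `Drive` resource fields (`Id`, `Status.State`, `Status.Health`). The drive IDs and the sample payloads are made up for illustration, and how a "Foreign" disk actually shows up on these Supermicro controllers is an assumption that the test on ms-be2088 would have to confirm. The Supermicro-specific call to clear the state is deliberately left out, since it is not described here.

```python
def drive_needs_attention(drive: dict) -> bool:
    """Return True when a Redfish Drive resource is not reported as Enabled/OK.

    Assumption: a freshly swapped disk that is not usable yet shows up with a
    Status.State other than "Enabled" or a Status.Health other than "OK".
    """
    status = drive.get("Status") or {}
    return status.get("State") != "Enabled" or status.get("Health") != "OK"


def flag_drives(drives: list) -> list:
    """Return the Ids of the drives that would need manual or scripted follow-up."""
    return [d.get("Id", "<unknown>") for d in drives if drive_needs_attention(d)]


# Hypothetical payloads, shaped like Redfish Drive resources.
sample_drives = [
    {"Id": "Disk.Bay.0", "Status": {"State": "Enabled", "Health": "OK"}},
    {"Id": "Disk.Bay.13", "Status": {"State": "StandbyOffline", "Health": "OK"}},
]

flagged = flag_drives(sample_drives)
print(flagged)
```

In a real cookbook the payloads would come from the BMC's Redfish Storage endpoints rather than from a hard-coded list; keeping the check as a pure function over the JSON makes it easy to test without touching a BMC.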

elukey triaged this task as Medium priority.Jan 20 2025, 3:24 PM

@Matthew_Clemente
We tested some options yesterday in codfw. When we use 6ft power cables and 3m DAC cables for the network, we are able to pull the server 19'' out of the rack to access the 12 top-loading drive bays without powering down the server. I am waiting on @VRiley-WMF to confirm that this also works for eqiad. If it does work, the next step will be to recable all the other servers with 6ft power cables and 3m DAC cables. Thanks

@Papaul did you mean to tag me? In any case, thanks for the update. I think we'll still need T384003 to be confident that we can actually hot-swap drives in these systems in anger, but this is promising news :)

@Matthew_Clemente yes, the tag was for you. I will leave you guys to work on the software side in T384003.

RobH mentioned this in Unknown Object (Task).Apr 24 2025, 6:05 PM
RobH mentioned this in Unknown Object (Task).Apr 25 2025, 2:47 PM

@MatthewVernon is this ticket closeable, or is there still testing going on here?

elukey claimed this task.

We can definitely close it, thanks!

Summary: we were able to solve the cabling issue, and after a long review (see T384003 and T391854) we decided to move to a different RAID controller (oriented to IT/Passthrough mode, more in line with JBOD).