mira became unresponsive this morning and was power-cycled by Luca. The kernel logs show that the NIC had stopped responding to the kernel. It's likely a hardware failure, since the system is almost five years old. Do we have any hardware diagnostics for the card that could confirm that? The system is out of warranty (OOW), but maybe we have a spare card around? The server can be taken down for hardware checks at any time.
Description
Status | Assigned | Task
---|---|---
Declined | RobH | T162859 Swap NIC on mira
Resolved | faidon | T162897 spare pool allocation of WMF6406 to replace mira
Resolved | RobH | T162900 setup naos/WMF6406 as new codfw deployment server
Event Timeline
The 1Gbit NIC on ALL of our servers is a built-in NIC. On the newest systems, it may be replaceable (not sure), but on these older ones it's part of the mainboard, and when it goes, that is it.
Sometimes the first two onboard ports go bad first while NIC ports 3 and 4 still work. I'd suggest attempting to use one of those to migrate any data off, and then scheduling a replacement server. We don't tend to keep spare third-party PCIe network cards, since a NIC failure typically means the mainboard is failing.
Please let me know if any assistance is needed in determining the specification of the replacement server. Typically someone should file a hardware-requests task listing all the needed items for the system, as well as a breakdown of what it does.
Hope that helps!
My mistake, it seems the R420s have 4 ports, but the R320s only had 2. Either way, we'll leave mira alone and online until after the replacement system naos is online and ready for use.
naos is online and in use. I think we should fix mira's NIC and deprovision / allocate to spare now (or decom altogether).
decom-or-reclaim task: T164588
If it becomes a decom, this can be rejected. If it becomes a reclaim, this should still be done.
The system is out of warranty. Thanks for making the decom task (in future, all reclaim/decom tasks should also have hardware-requests). I'll handle the decom side from here on out.
Thanks!