
Swap NIC on mira
Closed, Declined · Public

Description

mira became unresponsive this morning and was powercycled by Luca. The kernel logs show that the NIC had stopped responding to the kernel. It's likely a hardware failure, since the system is almost five years old. Do we have some hardware diagnostics for the card that could confirm that? The system is OOW, but maybe we have a spare card around? The server can be taken down for hardware checks at any time.

Event Timeline

Restricted Application added a subscriber: Aklapper.

The 1Gbit NIC on ALL of our servers is a built-in NIC. On the newest systems it may be replaceable (not sure), but on these older ones it's part of the mainboard, and when it goes, that is it.

Sometimes the first two ports on the onboard NIC go bad first while ports 3 and 4 still work. I'd suggest attempting to use one of those to migrate any data off, and then scheduling a replacement server. We don't tend to keep spare third-party PCIe network cards, since a NIC failure typically means the mainboard is failing.

Please let me know if any assistance is needed in determining the specification of the replacement server. Typically someone should file a hardware-requests task listing all the needed items for the system, as well as a breakdown of what it does.

Hope that helps!

Papaul triaged this task as Medium priority. Apr 13 2017, 3:17 PM
RobH changed the task status from Open to Stalled. Apr 13 2017, 3:37 PM
RobH claimed this task.
RobH added a subscriber: Papaul.

@Papaul:

Please don't bother to troubleshoot this, as we've progressed to replacing it outright with T162900.

For now, I'll steal this task back until I copy data from the system, and then I'll change this to a decommission task.

@RobH just note that mira has only 2 NICs, not 4

My mistake, it seems the R420s have 4 ports, but the R320s only had 2. Either way, we'll leave mira alone and online until after the replacement system naos is online and ready for use.

naos is online and in use. I think we should fix mira's NIC and deprovision / allocate to spare now (or decom altogether).

Dzahn subscribed.

decom-or-reclaim task: T164588

if it becomes decom, this can be rejected. if it becomes reclaim, this should still be done.

system is out of warranty, thanks for making the decom task (in future, all reclaim/decom tasks should also have hardware-requests). I'll handle the decom side from here on out.

Thanks!