It seems both of these hosts are relying on the health of a single disk, even though each has a second disk available for software RAID10.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Dzahn | T136562 Audit/fix hosts with no RAID configured
Resolved | | Andrew | T136718 labnet100[12].eqiad.wmnet need to be reimaged with RAID
Event Timeline
labnet1001 is reimaged now. The steps are a bit weird due to the unusual networking setup:
The 10G NIC can't PXE-boot, so to install you have to go into the BIOS and re-enable the first internal NIC. Then, after the install (and puppet run and signing and such), I rebooted, re-disabled the internal NIC, and, once the OS had loaded, edited /etc/udev/rules.d/70-persistent-net.rules to reassign things so that the 10G adapters are eth0 and eth1, then rebooted again.
Most likely the same steps will be needed when reimaging labnet1002. But first we have to fail over from 1002 to 1001, which will be messy.
We agreed that next week, after the Liberty upgrade, we will schedule a time for the failover and reimage of labnet1002.
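For reference, the rename step in /etc/udev/rules.d/70-persistent-net.rules could be scripted roughly as below. This is only a sketch, not what was actually run on labnet1001: the MAC addresses, the sample rules, and the `reassign_names` helper are all made up for illustration, and it assumes the usual one-rule-per-line layout with an `ATTR{address}` match and a `NAME="ethN"` assignment.

```python
import re

# Matches the MAC address a rule is keyed on, e.g. ATTR{address}=="aa:bb:cc:00:00:01"
ADDRESS_RE = re.compile(r'ATTR\{address\}=="([0-9a-fA-F:]{17})"')
# Matches the interface name assignment at the end of a rule, e.g. NAME="eth2"
NAME_RE = re.compile(r'NAME="[^"]*"')


def reassign_names(rules_text, mac_to_name):
    """Return rules_text with NAME= rewritten for every MAC in mac_to_name.

    Comment lines and rules for MACs not listed in mac_to_name are left as-is.
    """
    wanted = {mac.lower(): name for mac, name in mac_to_name.items()}
    out = []
    for line in rules_text.splitlines():
        match = ADDRESS_RE.search(line)
        if match and not line.lstrip().startswith("#"):
            name = wanted.get(match.group(1).lower())
            if name is not None:
                line = NAME_RE.sub('NAME="%s"' % name, line)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical rules file: one internal NIC that grabbed eth0 during install,
# and the two 10G adapters that ended up as eth2 and eth3.
SAMPLE_RULES = (
    '# hypothetical MAC addresses, for illustration only\n'
    'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="aa:bb:cc:00:00:0a", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"\n'
    'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="aa:bb:cc:00:00:01", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"\n'
    'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="aa:bb:cc:00:00:02", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"\n'
)

# Desired mapping: the 10G adapters become eth0/eth1 and the internal NIC
# is moved out of the way so that no two rules claim the same name.
NEW_NAMES = {
    "aa:bb:cc:00:00:0a": "eth2",
    "aa:bb:cc:00:00:01": "eth0",
    "aa:bb:cc:00:00:02": "eth1",
}

if __name__ == "__main__":
    print(reassign_names(SAMPLE_RULES, NEW_NAMES), end="")
```

The sketch only transforms text; writing the result back to the rules file and the final reboot are still manual steps. Any rule for a disabled internal NIC needs to be renamed or removed as well, otherwise two rules would claim the same interface name.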
Mentioned in SAL [2016-09-07T15:08:36Z] <andrewbogott> re-imaging labnet1002 for T136718
labnet1001 is now the live network/API host, and I just reimaged labnet1002 with RAID.