Arzhel discovered that the labstore1006/7 boxes are using their internal 1G NICs. They should be using their 10G ones, which they were ordered with (see T161345). They are racked in 1G racks, which means they will need to be moved. The move should be scheduled, and the 10G NICs then enabled in the BIOS for use.
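To confirm which NIC a box is actually using before and after the move, the negotiated link speed can be read from sysfs. This is a minimal sketch run against a mock sysfs tree so it is runnable anywhere; the interface names are assumptions (on the real hosts the ports may show up as `eno1`, `enp4s0f0`, or similar):

```shell
# On a real host: cat /sys/class/net/<iface>/speed   (1000 = 1G, 10000 = 10G)
# Mock sysfs tree so the check itself can be exercised off the servers:
mock=$(mktemp -d)
mkdir -p "$mock/eno1" "$mock/enp4s0f0"
echo 1000  > "$mock/eno1/speed"       # onboard 1G NIC (assumed name)
echo 10000 > "$mock/enp4s0f0/speed"   # add-on 10G NIC (assumed name)
for d in "$mock"/*; do
  printf '%s: %s Mb/s\n' "$(basename "$d")" "$(cat "$d/speed")"
done
```

Run against the real `/sys/class/net`, the same loop would keep showing 1000 on labstore1006/7 until the 10G ports are enabled in the BIOS and cabled into a 10G rack.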
- operations/puppet (production): Updating MAC address labstore1006-7
- Resolved · ArielGlenn · T182540: get dataset1001, ms1001 ready for decommission
- Resolved · Cmjohnson · T186756: Move labstore1006 and 1007 to 10G enabled racks in row A & D
- Mentioned Here
- D7: Testing: DO not merge
- D2: Add .arcconfig for differential/arcanist
- T172459: eqiad row D switch upgrade
- D6: Interactive deployment shell aka iscap
- T118154: determine hardware needs for dumps in eqiad and codfw
- T161311: Eqiad: Hardware request for labstore1006/7, dataset1002/3
- T167984: rack/setup/install labstore100.wikimedia.org
@Cmjohnson When we racked labstore1006 & 7 we approved the proposal for racking them in 1GBE racks (T167984). I did not know that we had specifically ordered 10G NICs on these boxes (hardware request T161311) because the public dumps servers need them enabled (discussed in T118154#3017229).
It looks like we now need to move these to 10GBE racks and enable those NICs. Could we schedule a time to re-rack these servers as soon as possible? Thank you!
Moving rows means the IP address and VLAN change. So that is usually a reimage, but it can also be done manually, I suppose, unless anyone foresees any major complication due to that? (I'm really not sure whether Puppet would care, since it goes by the fully qualified hostname, not the IP, afaik.)
@RobH On reimaging - that is fine - it takes over a week to import all the dumps data, so if we can reimage the OS but keep the data on the shelves and the internal (separate from the OS) disks, that would be great. Reimaging is not an issue otherwise.
Is having them both in the same row a temporary solution, and are those servers redundant (can we lose both without service interruption)?
As we're on a row-redundancy model (we should be able to lose a whole row), I want to make sure we're not going to have another issue like T172459, where too many critical services in the same row would either prevent maintenance or be catastrophic in case of failure.
My understanding is that there is 10G available in other rows, but those racks will go away during the refresh and possibly be replaced with 10G in a different rack in the same row. So it would mean the system and shelf have to move to a 10G rack now, and then in a few months be reshuffled into another 10G rack.