
Move labstore1006 and 1007 to 10G enabled racks in row A & D
Closed, Resolved · Public

Description

Arzhel discovered that the labstore1006 and labstore1007 boxes are using their internal 1G NICs. They should be using 10G ones, as they were ordered with them (see T161345). They are racked in 1G racks, which means they will need to be moved. The move should be scheduled, and the 10G NICs then set up in the BIOS for use.

After discussing with Chris, the current proposal is to move labstore1006 from A1 to A4 and cross-patch it to A5 (a 10G enabled rack), and to move labstore1007 from D6 to D2 or D7, both of which are 10G enabled.

Details

Related Gerrit Patches:
operations/puppet : production · Updating MAC address labstore1006-7

Event Timeline

Restricted Application added a subscriber: Aklapper. Feb 7 2018, 9:14 PM
ArielGlenn triaged this task as Normal priority. Feb 7 2018, 9:15 PM

@Cmjohnson When we racked labstore1006 and 1007, we approved the proposal for racking in 1GBE racks (T167984). I did not know that we had specifically ordered (Hardware request - T161311) 10G NICs on these boxes because the public dumps servers need those enabled (discussed in T118154#3017229).

It looks like we now need to move these to 10GBE racks and enable those NICs. Could we schedule a time to re-rack these servers as soon as possible? Thank you!

@madhuvishy we do not have 10G racks in row B yet. We are doing a network refresh and will be adding 10G racks in the next couple of months. (work is in progress)

@Cmjohnson Can we move them to a row with 10G then? These are in public vlan so don't need labs-support. I believe they are currently in A and D.

RobH added a comment. Feb 7 2018, 10:47 PM

Moving rows means the IP address and vlan change. That usually means a reimage, but it can also be done manually I suppose, unless anyone foresees any major complication due to that? (I'm really not sure if puppet would care, since it goes by the fully qualified hostname, not IP, afaik.)

@madhuvishy I will talk with @faidon. I have no issues moving one to row D in a 10G rack but the 10G racks in A and C are changing. It would be best to wait until the refresh is complete for the other server or we will have to move it again.

@Cmjohnson So to clarify, do both row A and D (or the racks we have these servers in - D6 and A1) not have 10G enabled?

@RobH On reimaging - that is fine. It takes over a week to import all the dumps data, so if we can reimage the OS but keep the data on the shelves and the internal (separate from OS) disk, that would be great. Reimaging is not an issue otherwise.

@madhuvishy They are currently not in 10G racks. I can move one from D6 to D7 or D2 (both 10G racks). The network gear has already been refreshed in row D. Moving within row A would not make any sense at this time, as the servers there will be going through a shuffle.

madhuvishy renamed this task from set up labstore1006,1007 for use of their 10G nics to Move labstore1006 and 1007 to 10G enabled racks in row D. Feb 7 2018, 11:01 PM
madhuvishy assigned this task to Cmjohnson.
madhuvishy updated the task description.

Is having them both in the same row a temporary solution, and are those servers redundant (can we lose both without service interruption)?

As we're on a row-redundancy model (we should be able to lose a whole row), I want to make sure we're not going to have another issue like T172459, where having too many critical services in the same row either prevents maintenance or would be catastrophic in case of failure.

@ayounsi No, we can't lose both without service interruption. I am not sure how we can have row-level redundancy in this case if there is only 10G availability in one row.
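
To illustrate the row-redundancy concern raised above, here is a minimal sketch of the check, assuming the service stays up as long as at least one of the two servers is reachable. The function name, the `min_hosts_needed` parameter and the two placement dictionaries are hypothetical and only for illustration; they do not come from any existing tooling.

```python
from collections import defaultdict


def rows_whose_loss_causes_outage(host_rows, min_hosts_needed=1):
    """Return the rows whose loss leaves fewer than min_hosts_needed hosts.

    host_rows maps a hostname to the row it is racked in.
    min_hosts_needed is an assumption about how many hosts must survive.
    """
    hosts_by_row = defaultdict(set)
    for host, row in host_rows.items():
        hosts_by_row[row].add(host)

    total = len(host_rows)
    return sorted(
        row
        for row, hosts in hosts_by_row.items()
        if total - len(hosts) < min_hosts_needed
    )


# Hypothetical placements: both servers in row D, versus one each in A and D.
both_in_d = {"labstore1006": "D", "labstore1007": "D"}
split_a_d = {"labstore1006": "A", "labstore1007": "D"}

print(rows_whose_loss_causes_outage(both_in_d))
print(rows_whose_loss_causes_outage(split_a_d))
```

Under that assumption, the same-row placement reports row D as a single point of failure, while the A and D split reports no such row.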

RobH added a comment. Edited · Feb 7 2018, 11:22 PM

My understanding is that there is 10G available in other rows, but it will go away during the refresh and possibly be replaced with 10G in a different rack in the same row. So it would mean the system and shelf have to move to a 10G rack now, and then in a few months be reshuffled into another 10G rack.

I can probably make room in C8, but that requires expediting the decom of several MC servers. I am not a fan of having to move this server and its disk array more than once.

+1 on moving only once!

madhuvishy renamed this task from Move labstore1006 and 1007 to 10G enabled racks in row D to Move labstore1006 and 1007 to 10G enabled racks in row A & D. Feb 8 2018, 12:12 AM
madhuvishy updated the task description.
Cmjohnson moved this task from Backlog to Up next on the ops-eqiad board. Feb 13 2018, 4:09 PM

Change 411324 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Updating MAC address labstore1006-7

https://gerrit.wikimedia.org/r/411324

Change 411324 merged by Cmjohnson:
[operations/puppet@production] Updating MAC address labstore1006-7

https://gerrit.wikimedia.org/r/411324

madhuvishy closed this task as Resolved. Feb 20 2018, 4:27 PM

The servers are moved and up and running! Thanks for your work @Cmjohnson.