
codfw old mw app server decommission
Closed, Resolved (Public)

Description

mw20[0-7][1-9] (74 servers in total) are all old Dell PowerEdge R410 servers that are out of warranty. Since we ordered 36 new mw app servers, we need to decide which systems to decommission first to make room for the new ones.
@Joe since all the mw app servers in A3 (40 in total) are old and out of warranty, would you like us to take the first 36 and replace them with the new systems (option 1), or would you prefer that we take half in A3 and half in A4 (A4 currently has 34) (option 2)?
If you go with option 2, please update this task with the systems that we need to decommission.

Event Timeline

@Papaul I need to take some time to think of a transition strategy, I'll let you know as soon as I have time to think.

RobH mentioned this in Unknown Object (Task). May 17 2016, 4:19 PM

@Papaul my proposal would be:

  1. Swap out all mw* servers in row A3
  2. Install 24 servers in A3
  3. Remove mw2041-mw2060 from row A4
  4. Replace them with the remaining 12 servers

What will the names of the new servers be?

@Joe
Rob suggested we start at mw2215, which is the next app server name, but I think we can reuse the same names. What do you think?

Please do not reuse the old names for mw systems. Right now we know that higher-numbered mw systems are newer systems, and it's easier to keep it that way for now. Please name these mw2215 and up.

When we have a large chunk (like mw2001-2100) fully decommissioned, we should then think about reusing hostnames. Until then, mixing old hostnames on new systems just adds to the confusion. We have more mw2XXX hostnames than we can use, and they are free.

@Papaul @RobH let's not reuse hostnames; we didn't do that in eqiad either. We can start thinking about reusing hostnames when 200-300 slots are free, in a veery distant future :)
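
For reference, a minimal sketch of the naming agreed above: the 36 new servers numbered sequentially from mw2215. The helper name `new_hostnames` is made up for illustration and is not part of any existing tooling.

```python
def new_hostnames(first, count, prefix="mw"):
    """Return `count` sequential hostnames starting at prefix + first.

    Hypothetical helper, for illustration only.
    """
    return [f"{prefix}{n}" for n in range(first, first + count)]


# 36 new app servers, first name mw2215 (both figures are from this task).
names = new_hostnames(2215, 36)
print(names[0], names[-1])  # mw2215 mw2250
```

Under that assumption the batch ends at mw2250, so nothing collides with the existing mw2001-mw2214 names.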

Joe claimed this task.

So I will start decommissioning the servers we want to dismiss.

Change 290407 had a related patch set uploaded (by Giuseppe Lavagetto):
mediawiki: decommission old codfw appservers

https://gerrit.wikimedia.org/r/290407

Change 290407 merged by Giuseppe Lavagetto:
mediawiki: decommission old codfw appservers

https://gerrit.wikimedia.org/r/290407

@Papaul mw2001-mw2016 and mw2018-mw2060 are turned off and effectively decommissioned. Please take care not to turn off or unrack mw2017, as it is actively used as a debug host.
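
To make the host list explicit, here is a small sketch that expands the two ranges and checks that the debug host is not in the set. The `expand` helper and variable names are hypothetical; only the ranges and mw2017 come from the comment above.

```python
def expand(prefix, first, last):
    """Expand an inclusive numeric range into hostnames (illustrative helper)."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


DEBUG_HOST = "mw2017"  # must stay powered on and racked

# Ranges as stated in the task: mw2001-mw2016 and mw2018-mw2060.
decommissioned = expand("mw", 2001, 2016) + expand("mw", 2018, 2060)

# Guard against accidentally including the debug host.
assert DEBUG_HOST not in decommissioned

print(len(decommissioned))  # 59
```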

Starting disk wipe on mw2001-mw2016 and mw2018-mw2060.

Papaul triaged this task as Medium priority. May 26 2016, 4:32 PM

Disk wipe complete on mw2001-mw2016 and mw2018-mw2040. Those servers are unracked and stored in the storage area. Disk wipe in progress on mw2041-mw2060.

Disk wipe complete on mw2041-mw2060. Servers are unracked and stored in storage.
@RobH on the switches, ports
ge-3/0/26 to ge-3/0/39 (rack A3)
and
ge-4/0/11 to ge-4/0/19 (rack A4)
are not in use for now. You can remove the switch port descriptions and resolve the task when done. Thanks.

ok, ge-3/0 is done, need to do ge-4/0 interfaces next.

So ge-4/0/11 shows as up, even though the server that should be plugged into it has been decommissioned.

@Papaul: Can you investigate what system is plugged into ge-4/0/11 and update this task via comment? (then assign back to me, thanks!)

ge-4/0/11 up up mw2052
ge-4/0/12 up down mw2053
ge-4/0/13 up down mw2054
ge-4/0/14 up down mw2055
ge-4/0/15 up down mw2056
ge-4/0/16 up down mw2057
ge-4/0/17 up down mw2058
ge-4/0/18 up down mw2059
ge-4/0/19 up down mw2060

IRC update: @Papaul checked and mw2250 is plugged into ge-4/0/11

So it seems that the rack's port descriptions are incorrect. That said, it is clear that ge-4/0/12 through ge-4/0/19 are no longer in use.

I'll disable those and remove the port descriptions shortly.
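
For anyone repeating this check, a rough sketch of how the pasted port listing can be split into in-use and unused ports. It assumes the four columns are interface, admin state, link state and description, which is how the output reads but is not stated in the task; the function and variable names are made up for illustration.

```python
# Port listing as pasted in this task. Column meanings (interface, admin
# state, link state, description) are an assumption.
STATUS = """\
ge-4/0/11 up up mw2052
ge-4/0/12 up down mw2053
ge-4/0/13 up down mw2054
ge-4/0/14 up down mw2055
ge-4/0/15 up down mw2056
ge-4/0/16 up down mw2057
ge-4/0/17 up down mw2058
ge-4/0/18 up down mw2059
ge-4/0/19 up down mw2060
"""


def parse_status(text):
    """Yield (interface, admin, link, description) for each 4-column line."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            yield tuple(fields)


rows = list(parse_status(STATUS))
link_up = [iface for iface, _admin, link, _desc in rows if link == "up"]
link_down = [iface for iface, _admin, link, _desc in rows if link == "down"]

print(link_up)         # ['ge-4/0/11']
print(len(link_down))  # 8
```

Link state alone only says a cable is live; as seen with ge-4/0/11, the description column can be stale, so a port with link up still needs a physical check.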