Decommission es2005-es2010
Closed, ResolvedPublic

Description

After T126006, reset current configuration and remove monitoring. Some of these will be used for disks on other datacenter servers, some will be repurposed.

List of servers:

es2005.codfw.wmnet
es2006.codfw.wmnet
es2007.codfw.wmnet
es2008.codfw.wmnet
es2009.codfw.wmnet
es2010.codfw.wmnet
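
For anyone scripting against this host range, the list above can be generated rather than typed out. This is only a minimal sketch: the helper name `expand_hosts` is hypothetical and not part of any existing tooling; only the host names and domain come from this task.

```python
def expand_hosts(prefix: str, first: int, last: int, domain: str) -> list[str]:
    """Return FQDNs for every host number from first to last, inclusive."""
    if first > last:
        raise ValueError("first must not exceed last")
    return [f"{prefix}{n}.{domain}" for n in range(first, last + 1)]


# Hosts covered by this task, es2005 through es2010 in codfw.
decom_hosts = expand_hosts("es", 2005, 2010, "codfw.wmnet")
for host in decom_hosts:
    print(host)
```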

es2001-es2004 have already been replaced, but they will be temporarily repurposed as ES offline copies (disaster recovery) until they are properly integrated into bacula.

  • wipe ALL disks on es2005-es2010
  • unrack post wipe and store for later sale with other decommissioned servers
  • remove mgmt dns entries
  • update racktables
  • add decommissioned servers to decom tracking tab
  • assign task back to @RobH to remove the switch configuration description/vlan assignments.

Event Timeline

Restricted Application added subscribers: Zppix, Southparkfan, Aklapper.
jcrespo moved this task from Triage to In progress on the DBA board.

Change 287612 had a related patch set uploaded (by Jcrespo):
[WIP]Remove es2001-es2010 from production puppet

https://gerrit.wikimedia.org/r/287612

1.8 TB drives

es2001: all drives ok
es2002: all drives ok
es2003: all drives ok
es2004: all drives ok

559 GB drives

es2005: 1 critical disk: SMART alert on Slot Number: 4
es2006: all drives ok
es2007: 2 critical disks: SMART alert and media errors on Slots 7 and 8
es2008: 4 critical disks: SMART alert on Slots 4, 6, 8 and 9
es2009: 2 failed disks: 1 and 4, 2 critical disks: SMART and errors on Slots 3, 6, 10 and 11
es2010: down, disk state unknown

Change 287612 merged by Jcrespo:
Remove es2001-es2010 from production puppet

https://gerrit.wikimedia.org/r/287612

Change 287645 had a related patch set uploaded (by Jcrespo):
Remove dns entries for es2001-es2010

https://gerrit.wikimedia.org/r/287645

When these are really ready for total shutdown, please assign to me so we can figure out how many we're going to reclaim for parts, and how many will be decommissioned and sold off for the space.

Thanks!

@RobH I would reuse 5-7 for parts, decom 8-10 (maybe salvaging some drives), and repurpose 1-4 (which seem to be in a pristine state). In any case, let's try to fix the pending issues on T128057#2095309

Only the DNS commit (see above) is pending on the software side of the lifecycle; the servers are all powered off. But I would appreciate a second review.

I have a need for storage for backups/db archiving, but I have yet to make a formal proposal. I *understand* that I do not have the last word on this; you and mark do (and depending on the needs, either new servers should be bought or these ones do not fit), but my wish is to delay wiping those servers until a decision has been taken.

Understood, we'll want to wipe any servers and disks we reclaim for any purpose though, so we won't touch anything until a decision has been made on db archiving.

FYI: I don't have much decision-making for that, I just provide the hardware details! =]

jcrespo mentioned this in Unknown Object (Task). May 16 2016, 12:40 PM

@jcrespo wants to keep es2001-2004 in their current state (racked, powered on, with their current data) for another year as a safety measure. That's fine with me. Let's plan to replace these servers with new hardware approximately one year from now.

@robh: my suggested follow-up:

  • I will power on es2001-es2004 and keep their names and network for that year, so there is no overhead for DC ops
  • Following Mark's recommendation: create a new entry on the renewal plan for these servers (remember that their original function has already been taken over by the batch es2011-es2019). These should not be "es" or "db" servers; they will probably have to be disks or storage servers for bacula, and we will need Alex's feedback for that
  • es2005-es2010 can be decommed/used for parts/unracked as usual (@Papaul)

I've updated https://gerrit.wikimedia.org/r/#/c/287645 to not remove es200[1234], ready to apply when needed.

jcrespo renamed this task from Decommission es2001-es2010 to Decommission es2005-es2010. May 24 2016, 2:38 PM
jcrespo updated the task description.

Actually, I do not know if this should be RobH's or Papaul's; you can negotiate that. Keep T134755#2276334 and T128057#2095309 in mind.

First I'll take it for general review and update, then re-task to papaul for the onsite wipes.

So I'm stealing this back.

The switch ports for es2005-es2010 have been disabled, but still have the ES labels/descriptions.

Once these machines are fully wiped and removed from the rack, a follow-up step is needed to remove them from the switch config. Since they are still in the rack, this will not be done until they are unracked. @jcrespo already removed them from most of the software/repos/config/monitoring, and I removed the last entries in the install_server module & the production dns entries.

Since @Papaul doesn't yet know how to do this in the switch software, this will have to be tasked back to me after they are removed from the racks.

The mgmt IPs have been intentionally left in place until these are unracked. At that time, they can be removed.

So assigning to @Papaul for the following:

  • wipe ALL disks on es2005-es2010
  • unrack post wipe and store for later sale with other decommissioned servers
  • remove mgmt dns entries
  • update racktables
  • add decommissioned servers to decom tracking tab
  • assign task back to @RobH to remove the switch configuration description/vlan assignments.
RobH edited projects, added hardware-requests; removed Patch-For-Review.
RobH moved this task from Backlog to Reclaim (Spares/Decommission) on the hardware-requests board.

Change 290665 had a related patch set uploaded (by Jcrespo):
Reintroduce es200[1234] in puppet, without specific roles

https://gerrit.wikimedia.org/r/290665

Change 290665 merged by Jcrespo:
Reintroduce es200[1234] in puppet, without specific roles

https://gerrit.wikimedia.org/r/290665

When making an unrelated DNS change, "authdns-update" told me there are pending changes that remove these hosts from DNS. I have not touched it yet, though.

But I have not yet committed gerrit:287645. Maybe robh did?

There was a pending change by me to remove es2005-es2010, as they are ready for wipe. We fixed it via IRC chat.

Change 287645 abandoned by Jcrespo:
Remove dns entries for es2005-es2010

Reason:
Done elsewhere.

https://gerrit.wikimedia.org/r/287645

Papaul triaged this task as Medium priority. May 26 2016, 4:33 PM
Papaul updated the task description.

@RobH all steps are complete; you are good to remove the entries on the switches.

Switch port descriptions removed.