
Decommission elastic1001-1016
Closed, Resolved · Public

Description

Now that the new elasticsearch servers are up and running, we need to decommission the old ones.

Decommission steps:

  • ensure all services are no longer in use - done by Discovery
  • production DNS removed by @Gehel
  • all puppet entries removed by @Gehel
  • switch ports disabled & descriptions removed (since these will be removed from rack)
  • SSDs removed and/or HDDs wiped
  • system unracked and placed in storage
  • system mgmt DNS entries removed

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.

Mentioned in SAL [2016-07-13T13:20:07Z] <gehel> scheduling icinga downtime on elastic1001-1016 prior to decommissioning (T139758)

Mentioned in SAL [2016-07-13T13:34:26Z] <gehel> disabling puppet and stopping elasticsearch on elastic1001-1016 (T139758)

Change 295649 had a related patch set uploaded (by Gehel):
Decommission old elasticsearch servers

https://gerrit.wikimedia.org/r/295649

Mentioned in SAL [2016-07-13T13:58:43Z] <gehel> cleanup puppet / salt from old elasticsearch servers elastic1001-1016 (T139758)

Mentioned in SAL [2016-07-13T14:16:34Z] <gehel> shutting down elastic1001-1016 (T139758)

Gehel added a subscriber: RobH.

elastic1001-1016 have been shut down. I followed the documented procedure, stopping after removing the salt keys. The cleanup of DNS entries, switch ports, BIOS settings, etc. still needs to be done.

@RobH Could you have a look and let me know whether we should reclaim or decommission those servers?

Change 298790 had a related patch set uploaded (by Gehel):
Remove old elasticsearch servers from DNS

https://gerrit.wikimedia.org/r/298790

I've synced up with @Gehel about this via IRC, and I'll go ahead and complete the decommissioning steps from the switch ports onward.

I'm assuming these will be decommissioned, since their warranties all expired in April 2014. However, before we pull them from the racks, we'll need @mark's approval to fully decommission them. I won't escalate this for approval until the other steps (short of removing SSDs and pulling from racks) are complete.

Still 2 patches waiting to be merged:

Waiting for Jenkins to process its queue...

Change 295649 merged by Gehel:
Decommission old elasticsearch servers

https://gerrit.wikimedia.org/r/295649

Actually, before I go into the switches and disable all the ports, I'll get @mark's feedback on whether to decommission or keep these servers.

That way I can avoid having to go back into the switches again in a few days just to remove all the entries.

@mark: Can you please advise whether we can decommission and unrack elastic1001-1016? Their warranties all expired in April 2014, over two years ago. Since these are over 5 years old, I'd like to pull the SSDs for destruction and have the systems placed in our decommission pile.

Please advise and assign task back to me, thanks!

Change 298790 merged by Gehel:
Remove old elasticsearch servers from DNS

https://gerrit.wikimedia.org/r/298790

The DNS entries were removed but not propagated. I pushed the change today.

@mark confirmed during the weekly ops meeting that we'll decommission those servers, as they are old enough.

RobH reopened this task as Open. (Edited Jul 21 2016, 7:16 PM)

@debt: this is not done, as I have not finished the decommission process. Please don't resolve this task.

Since it's in #hw-requests and decommissioning, it isn't complete until the dc-ops team finishes all steps.

Whoops, sorry! Please continue doing what needs to be done and thanks for removing the search tags. :)

No worries, I figured removing the tags would clear it from your workboards/radar =]

RobH updated the task description.

Assigned to @Cmjohnson for SSD removal/disk wipe before unracking. Once they are unracked (and added to the decom tracking sheet), their mgmt DNS entries can be pulled.