
decommission: cloudnet2001-dev.codfw.wmnet
Closed, Resolved · Public

Description

This task tracks the decommissioning of server cloudnet2001-dev.codfw.wmnet.

The first 5 steps should be completed by the service owner who is returning the server to DC-ops (for reclaim to spare or decommissioning, depending on server configuration and age).

cloudnet2001-dev.codfw.wmnet is over 5 years old; decom and disposal are planned. Do not reclaim to spares.

Steps for service owner:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry, replace with role(spare::system)
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:

The following steps cannot be interrupted, as it will leave the system in an unfinished state.

Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.
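
For reference, the DebMonitor removal step in the list above can be wrapped in a small dry-run helper. This is only a sketch: the function name debmonitor_delete_cmd is hypothetical, and it only prints the command for review rather than running it. The URL and the certificate and key paths are copied from the step itself; per the checklist, wmf-decommission-host normally handles this.

```shell
#!/bin/sh
# Sketch only: print the DebMonitor DELETE request for one host so it can be
# reviewed before running. The function name is hypothetical; the URL and the
# cert/key paths are taken verbatim from the checklist step.
debmonitor_delete_cmd() {
    host_fqdn="$1"
    if [ -z "$host_fqdn" ]; then
        echo "usage: debmonitor_delete_cmd HOST_FQDN" >&2
        return 1
    fi
    printf 'sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/%s --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key\n' "$host_fqdn"
}

# Dry run for the host tracked by this task.
debmonitor_delete_cmd cloudnet2001-dev.codfw.wmnet
```

Printing instead of executing keeps the helper safe to run anywhere; the printed line can then be pasted on the host that holds the DebMonitor certificate.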

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - IF RECLAIM: system added back to spares tracking (by onsite)

Event Timeline

Restricted Application added a project: Operations. · Mar 11 2019, 12:38 PM
aborrero updated the task description. · Mar 11 2019, 1:07 PM
aborrero renamed this task from Hardware decommission: cloudnet2001-dev.codfw.wmnet to decommission: cloudnet2001-dev.codfw.wmnet. · Mar 18 2019, 12:52 PM
aborrero updated the task description.
aborrero updated the task description.
aborrero added subscribers: RobH, Papaul.

Mentioned in SAL (#wikimedia-operations) [2019-03-18T12:54:32Z] <arturo> T218025 disable icinga checks for cloudnet2001-dev.codfw.wmnet

Change 497293 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] wmcs: decommision several codfw servers

https://gerrit.wikimedia.org/r/497293

aborrero updated the task description. · Mar 18 2019, 1:01 PM

Change 497293 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] wmcs: decommision several codfw servers

https://gerrit.wikimedia.org/r/497293

Change 497293 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] wmcs: decommision several codfw servers

https://gerrit.wikimedia.org/r/497293

aborrero reassigned this task from aborrero to RobH. · Mar 21 2019, 5:12 PM
aborrero updated the task description.

wmf-decommission-host was executed by robh for cloudnet2001-dev.codfw.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor
RobH updated the task description. · Mar 27 2019, 12:00 AM
RobH reassigned this task from RobH to Papaul.
RobH moved this task from Backlog to pending onsite steps (codfw) on the decommission board.

@Papaul,

Please note I cannot see any server on the switch stack with this label, so I was unable to disable its network port. Please complete the unchecked steps above.

RobH moved this task from Backlog to Decommission on the ops-codfw board. · Mar 27 2019, 12:01 AM
RobH updated the task description.
Papaul triaged this task as Normal priority. · Mar 28 2019, 2:50 PM

Switch information

ge-8/0/10
ge-8/0/11

Papaul updated the task description. · Apr 1 2019, 10:56 PM
papaul@asw-b-codfw> show interfaces ge-8/0/11 descriptions 
Interface       Admin Link Description
ge-8/0/11       down  down DISABLED

{master:2}
papaul@asw-b-codfw> show interfaces ge-8/0/10 descriptions    
Interface       Admin Link Description
ge-8/0/10       down  down DISABLED
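
The check done by eye above (both ports show Admin "down") can also be scripted. The following is a minimal sketch, not part of the actual procedure: the function name ports_admin_down is hypothetical, and it assumes the column layout of the pasted output (Interface, Admin, Link, Description). Prompt lines and the header are ignored because their first field does not look like an interface name.

```shell
#!/bin/sh
# Sketch only: read "show interfaces ... descriptions" output on stdin and
# fail if any listed interface is not administratively down, or if no
# interface line was found at all. The function name is hypothetical.
ports_admin_down() {
    awk '
        # Match interface names such as ge-8/0/10 in the first column.
        $1 ~ /^[a-z]+-[0-9]+\/[0-9]+\/[0-9]+$/ {
            seen++
            if ($2 != "down") { bad++; print $1 " is admin " $2 }
        }
        END { exit (seen == 0 || bad > 0) ? 1 : 0 }
    '
}

# Feed it the two interface lines recorded on this task.
ports_admin_down <<'EOF'
Interface       Admin Link Description
ge-8/0/11       down  down DISABLED
ge-8/0/10       down  down DISABLED
EOF
echo "exit status: $?"
```

Treating "no interface lines seen" as a failure avoids a false pass when the paste is empty or truncated.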
Papaul updated the task description. · Apr 1 2019, 11:27 PM
Papaul updated the task description.

Change 500634 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt and production DNS for cloudnet2001-dev

https://gerrit.wikimedia.org/r/500634

Papaul updated the task description. · Apr 1 2019, 11:46 PM

Change 500634 merged by Arturo Borrero Gonzalez:
[operations/dns@master] DNS: Remove mgmt and production DNS for cloudnet2001-dev

https://gerrit.wikimedia.org/r/500634

Papaul reassigned this task from Papaul to RobH. · Apr 2 2019, 2:36 PM

@RobH there is 1 check box left for this. You can take a look and resolve the task once done.

Thanks.

Change 503152 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] decom cloudnet2001-dev.codfw.wmnet

https://gerrit.wikimedia.org/r/503152

Dzahn added a subscriber: Dzahn. · Apr 12 2019, 12:53 AM

per chat with Papaul:

  • switch port is done
  • server is > 5 years old and should not go back to spare
  • removing from site.pp and install_server to finish

Change 503152 merged by Dzahn:
[operations/puppet@production] decom cloudnet2001-dev.codfw.wmnet

https://gerrit.wikimedia.org/r/503152

Dzahn updated the task description. · Apr 12 2019, 12:56 AM

Mentioned in SAL (#wikimedia-operations) [2019-04-12T01:00:05Z] <mutante> puppet cert clean, puppet node clean, puppet node deactivate on cloudnet2001-dev.codfw.wmnet (T218025)
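
As a rough illustration of the SAL entry above, the three Puppet cleanup commands can be listed in order for a given host. This is a sketch under assumptions: the function name puppet_cleanup_plan is hypothetical, nothing is executed, and "puppet cert clean" presumes a Puppet version that still ships the "cert" subcommand (it was removed in Puppet 6).

```shell
#!/bin/sh
# Sketch only: print, in order, the three Puppet cleanup commands named in the
# SAL entry for one host. Nothing is executed. The function name is
# hypothetical.
puppet_cleanup_plan() {
    fqdn="$1"
    [ -n "$fqdn" ] || { echo "usage: puppet_cleanup_plan FQDN" >&2; return 1; }
    for action in "cert clean" "node clean" "node deactivate"; do
        echo "puppet $action $fqdn"
    done
}

puppet_cleanup_plan cloudnet2001-dev.codfw.wmnet
```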

Dzahn reassigned this task from RobH to Papaul. · Apr 12 2019, 7:45 PM

In the future I will use https://wikitech.wikimedia.org/wiki/Decom_script for everything.

Confirmed this one is also out of Debmonitor (and Icinga)

Should be all done. Back to Papaul for physical de-racking.

Dzahn added a comment. · Apr 12 2019, 7:54 PM
Papaul closed this task as Resolved. · Apr 15 2019, 2:59 PM

This is done.