
decom radium
Closed, Resolved, Public

Description

radium has been replaced by torrelay1001 in the parent task T196701.

This task is to decommission radium.

Waiting a few days before starting on this. The data is currently still available in /var/lib/to in case we want to revert.


Checklist copied from https://phabricator.wikimedia.org/P7432:

  • all system services confirmed offline from production use
  • set all Icinga checks to maint mode/disabled while reclaim/decommission takes place
  • remove system from all LVS/PyBal active configuration
  • any service group puppet/hiera/dsh config removed
  • remove from site.pp (replace with role(spare::system) if the system isn't shut down immediately during this process)
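The site.pp step can be spot-checked from a local puppet checkout. The sketch below assumes the node is declared as node 'FQDN' { ... } with role(spare::system) either on the same line or on its own line inside the block; the function name is_spare is hypothetical and not an existing Wikimedia tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the node entry for FQDN in the given
# site.pp declares role(spare::system), fail (exit 1) otherwise.
# Assumed entry shape:
#   node 'radium.wikimedia.org' {
#       role(spare::system)
#   }
# The first line containing '}' after the node line is treated as the end of
# the block, which is only good enough for simple entries.
is_spare() {
  site_pp=$1
  fqdn=$2
  awk -v fqdn="$fqdn" -v q="'" '
    index($0, "node " q fqdn q) == 1 { in_node = 1 }
    in_node && index($0, "role(spare::system)") { found = 1 }
    in_node && index($0, "}") { in_node = 0 }
    END { exit (found ? 0 : 1) }
  ' "$site_pp"
}
```

For example, is_spare manifests/site.pp radium.wikimedia.org && echo spare (the manifests/site.pp path is assumed). Matching with index() rather than a regex keeps the dots in the FQDN literal.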

START NON-INTERRUPTIBLE STEPS

  • disable puppet on host
  • power down host
  • disable switch port
  • note the switch port assignment on this task (for later removal)
  • remove all remaining puppet references (including role::spare)
  • remove production DNS entries
  • puppet node clean, puppet node deactivate
  • remove Debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS
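The software side of the non-interruptible steps can be written out as a dry run that only prints commands for review. The puppet node and Debmonitor command forms come from the checklist, plus the standard puppet agent --disable flag; which host each command runs on is an assumption noted in the comments, and the power-down and switch-port work is left out. In this task, several of these steps were later handled by the wmf-decommission-host script. The name decom_dry_run is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print, in checklist order, the commands for the
# software-side non-interruptible steps. Nothing is executed.
# decom_dry_run is a hypothetical name, not an existing tool.
decom_dry_run() {
  host_fqdn=$1
  # On the host itself (assumed): stop puppet from re-applying its config.
  echo "sudo puppet agent --disable 'decommissioning ${host_fqdn}'"
  # On the puppetmaster (assumed): drop the certificate and PuppetDB data.
  echo "sudo puppet node clean ${host_fqdn}"
  echo "sudo puppet node deactivate ${host_fqdn}"
  # On neodymium/sarin, as in the checklist: remove the Debmonitor entry.
  echo "sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${host_fqdn} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key"
}
```

Running decom_dry_run radium.wikimedia.org prints four lines that can be reviewed before anything is run on the corresponding hosts.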

  • system disks wiped (by onsite)
  • IF DECOM: system unracked and decommissioned (by onsite), update Racktables with the result
  • IF DECOM: switch port configuration removed from the switch once the system is unracked
  • IF DECOM: add system to the decommission tracking Google sheet
  • IF DECOM: mgmt DNS entries removed

Event Timeline

Dzahn triaged this task as Medium priority (Sep 8 2018, 12:57 AM).
Dzahn created this task.
Restricted Application added a subscriber: Aklapper.

Change 458946 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: turn radium into a spare system

https://gerrit.wikimedia.org/r/458946

Turning it into a spare::system already, to remove the unused Icinga monitoring, stop the rsync service via puppet, etc.

Dzahn changed the task status from Open to Stalled (Sep 8 2018, 1:00 AM).

Change 458946 merged by Dzahn:
[operations/puppet@production] site: turn radium into a spare system

https://gerrit.wikimedia.org/r/458946

Change 459878 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] remove hosts/radium.yaml from Hiera

https://gerrit.wikimedia.org/r/459878

Change 459878 merged by Dzahn:
[operations/puppet@production] remove hosts/radium.yaml from Hiera

https://gerrit.wikimedia.org/r/459878

Dzahn changed the task status from Stalled to Open (Sep 13 2018, 12:59 AM).
Dzahn removed projects: Patch-For-Review, Tor.
Dzahn changed Risk Rating from N/A to default.
Dzahn removed Dzahn as the assignee of this task (Sep 13 2018, 8:32 PM).

wmf-decommission-host was executed by robh for radon.wikimedia.org and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

radon network port is asw2-c-eqiad:ge-4/0/25

Change 461226 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom radon

https://gerrit.wikimedia.org/r/461226

Change 461226 merged by RobH:
[operations/puppet@production] decom radon

https://gerrit.wikimedia.org/r/461226

RobH edited projects, added ops-eqiad; removed Patch-For-Review.
RobH updated the task description.

@RobH This ticket is about radium, but there is also a decom ticket for radon at the same time (T202040). I think they got mixed up above. Just wanted to let you know.

RobH added a subscriber: Cmjohnson.

Please note that I did indeed swap the references around: all the entries for radon above should have gone to T202040. Stealing this back for its radium decom.
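With hostnames as similar as radon and radium, a cheap safeguard is to have any decom wrapper refuse to continue unless the host it was given matches the host the task is about. A minimal sketch, using a hypothetical confirm_target function that is not part of any existing tooling:

```shell
#!/bin/sh
# Hypothetical guard: return 0 only if the host passed in matches the
# host named in the task; otherwise print a warning to stderr and return 1.
confirm_target() {
  task_host=$1
  given_host=$2
  if [ "$given_host" != "$task_host" ]; then
    echo "refusing: task is for ${task_host}, got ${given_host}" >&2
    return 1
  fi
  return 0
}
```

Placing confirm_target radium.wikimedia.org "$1" || exit 1 at the top of such a wrapper would stop a run that was accidentally pointed at radon.wikimedia.org.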

wmf-decommission-host was executed by robh for radium.wikimedia.org and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

radium network port is asw-a-eqiad:ge-3/0/0
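The port notes in this task use a switch:port form. The sketch below splits such a note for the later switch-port steps. It assumes the switch runs Junos, where set interfaces <port> disable administratively disables an interface; the function name port_note is hypothetical, and the printed line should be checked against the local switch procedure before use.

```shell
#!/bin/sh
# Hypothetical helper: split a "switch:port" note and print the switch name
# plus a Junos-style line that would disable the port (assumes Junos CLI).
port_note() {
  note=$1
  switch=${note%%:*}  # everything before the first colon
  port=${note#*:}     # everything after the first colon
  echo "switch: ${switch}"
  echo "set interfaces ${port} disable"
}
```

For radium, port_note asw-a-eqiad:ge-3/0/0 prints "switch: asw-a-eqiad" followed by "set interfaces ge-3/0/0 disable".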

Change 461234 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom radium prod dns

https://gerrit.wikimedia.org/r/461234

Change 461235 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom radium puppet repo entries

https://gerrit.wikimedia.org/r/461235

Change 461234 merged by RobH:
[operations/dns@master] decom radium prod dns

https://gerrit.wikimedia.org/r/461234

Change 461235 merged by RobH:
[operations/puppet@production] decom radium puppet repo entries

https://gerrit.wikimedia.org/r/461235

OK, this is now all set; radium is ready for the onsite decom steps.
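Before handing off to onsite, one way to double-check the "remove all remaining puppet references" and DNS steps is to search local checkouts of operations/puppet and operations/dns for the hostname. A minimal sketch, assuming GNU grep (for -r and --exclude-dir) and a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: list files under DIR that still mention HOST as a
# whole word, skipping .git so commit messages such as "decom radium" do not
# show up. Prints nothing when no references remain.
# Assumes GNU grep for -r and --exclude-dir.
remaining_refs() {
  dir=$1
  host=$2
  grep -rlw --exclude-dir=.git -- "$host" "$dir"
  # grep exits 1 when nothing matches; treat only real errors (2) as failure.
  [ $? -le 1 ]
}
```

For example, remaining_refs ~/puppet radium (checkout path assumed). The -w flag keeps radium from matching longer names such as radium1001, while still matching radium.wikimedia.org.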

Change 555542 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for radium,db1069,db1072 and db1073

https://gerrit.wikimedia.org/r/555542

Change 555542 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for radium,db1069,db1072 and db1073

https://gerrit.wikimedia.org/r/555542

Papaul updated the task description.
Papaul subscribed.

Complete.