Decommission db1073.eqiad.wmnet
Closed, Resolved (Public)

Description

This task will track the decommission-hardware of server db1073.eqiad.wmnet

The first 5 steps should be completed by the service owner who is returning the server to DC-Ops (for reclaim to spare or decommissioning, depending on server configuration and age).

db1073
Steps for service owner:

Steps for DC-Ops:

The following steps cannot be interrupted, as doing so would leave the system in an unfinished state.

Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.
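
For illustration only, the debmonitor removal in the last non-interrupt step can be sketched in Python. This is a rough equivalent of the curl command quoted above, not what wmf-decommission-host actually runs. The URL and certificate paths are copied from that command; the function names `build_delete_request` and `delete_host` are hypothetical.

```python
import ssl
import urllib.request

# Values copied from the curl command in the checklist above.
DEBMONITOR_BASE = "https://debmonitor.discovery.wmnet/hosts/"
CERT_PATH = "/etc/debmonitor/ssl/cert.pem"
KEY_PATH = "/etc/debmonitor/ssl/server.key"


def build_delete_request(host_fqdn: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one host (hypothetical helper)."""
    if not host_fqdn or "/" in host_fqdn:
        raise ValueError(f"unexpected host name: {host_fqdn!r}")
    return urllib.request.Request(DEBMONITOR_BASE + host_fqdn, method="DELETE")


def delete_host(host_fqdn: str) -> int:
    """Send the request using the client certificate and return the HTTP status.

    Assumes it runs on a host that has the certificate and key at the paths above.
    """
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=CERT_PATH, keyfile=KEY_PATH)
    request = build_delete_request(host_fqdn)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status


if __name__ == "__main__":
    # Only builds the request; nothing is sent over the network here.
    req = build_delete_request("db1073.eqiad.wmnet")
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it possible to check the target URL before anything is deleted.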

  • - Label disk #3 as broken so it doesn't get re-used again
  • - Label disk #11 as broken so it doesn't get re-used again
  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

Event Timeline

Marostegui triaged this task as Medium priority.
Marostegui moved this task from Triage to Pending comment on the DBA board.

This host was just removed from being a master [T229657]; let's give it a few more days before actually starting its decommissioning process.

Change 534269 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1073: Disable notifications

https://gerrit.wikimedia.org/r/534269

Change 534269 merged by Marostegui:
[operations/puppet@production] db1073: Disable notifications

https://gerrit.wikimedia.org/r/534269

Change 535348 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1073 from config

https://gerrit.wikimedia.org/r/535348

Change 535348 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1073 from config

https://gerrit.wikimedia.org/r/535348

Mentioned in SAL (#wikimedia-operations) [2019-09-10T05:45:52Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Remove db1073 from config T231892 (duration: 00m 55s)

Mentioned in SAL (#wikimedia-operations) [2019-09-10T05:46:55Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove db1073 from config T231892 (duration: 00m 54s)

Change 535994 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Decommission db1073

https://gerrit.wikimedia.org/r/535994

Change 535994 merged by Marostegui:
[operations/puppet@production] mariadb: Decommission db1073

https://gerrit.wikimedia.org/r/535994

Mentioned in SAL (#wikimedia-operations) [2019-09-12T05:59:28Z] <marostegui> Remove db1073 from tendril and zarcillo T231892

Mentioned in SAL (#wikimedia-operations) [2019-09-12T06:00:57Z] <marostegui> Stop MySQL on db1073 for decommission T231892

Marostegui updated the task description.
Marostegui moved this task from Backlog to Ready for Decommission on the decommission-hardware board.

This host is now ready for DC-Ops to finish its decommission steps.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db1073.eqiad.wmnet

  • db1073.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change 540256 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] site.pp: Remove db1073 references.

https://gerrit.wikimedia.org/r/540256

Change 540257 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Remove production DNS entries from db1073

https://gerrit.wikimedia.org/r/540257

Change 540256 merged by Marostegui:
[operations/puppet@production] site.pp: Remove db1073 references.

https://gerrit.wikimedia.org/r/540256

Change 540257 merged by Marostegui:
[operations/dns@master] wmnet: Remove production DNS entries from db1073

https://gerrit.wikimedia.org/r/540257

Marostegui removed a project: Patch-For-Review.
Marostegui updated the task description.

Host ready for on-site steps + switch port disablement

papaul@asw2-b-eqiad# show | compare 
[edit interfaces interface-range disabled]
     member ge-2/0/19 { ... }
+    member ge-3/0/26;
[edit interfaces]
-   ge-3/0/26 {
-       description db1073;
-   }
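
The checklist asks for the switch port assignment to be noted on the task. As a rough sketch, the port could be pulled out of a `show | compare` diff like the one above with a few lines of Python. The function name `summarize_compare` and the returned keys are hypothetical, and the parsing assumes the diff has exactly the shape shown here.

```python
import re

# The diff pasted above, without the prompt line.
SAMPLE = """\
[edit interfaces interface-range disabled]
     member ge-2/0/19 { ... }
+    member ge-3/0/26;
[edit interfaces]
-   ge-3/0/26 {
-       description db1073;
-   }
"""


def summarize_compare(diff_text: str) -> dict:
    """Collect ports added to the 'disabled' range and interface stanzas removed."""
    added_to_disabled = []
    removed_interfaces = []
    section = None
    for line in diff_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[edit"):
            # Remember which hierarchy level the following lines belong to.
            section = stripped
            continue
        added = re.match(r"^\+\s*member\s+(\S+?);$", stripped)
        if added and section == "[edit interfaces interface-range disabled]":
            added_to_disabled.append(added.group(1))
            continue
        removed = re.match(r"^-\s*(\S+)\s*\{$", stripped)
        if removed and section == "[edit interfaces]":
            removed_interfaces.append(removed.group(1))
    return {
        "added_to_disabled": added_to_disabled,
        "removed_interfaces": removed_interfaces,
    }


if __name__ == "__main__":
    print(summarize_compare(SAMPLE))
```

For the diff above, both lists contain the same port, which matches the intent of the change: the per-host stanza is dropped and the port joins the disabled range.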

Change 555542 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for radium,db1069,db1072 and db1073

https://gerrit.wikimedia.org/r/555542

Change 555542 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for radium,db1069,db1072 and db1073

https://gerrit.wikimedia.org/r/555542

Papaul updated the task description.

complete