
Decommission db1063.eqiad.wmnet
Closed, Resolved, Public

Description

This task will track the hardware decommissioning of server db1063.eqiad.wmnet.

The first 5 steps should be completed by the service owner who is returning the server to DC-Ops (for reclaim to spare or for decommissioning, depending on server configuration and age).

db1063
Steps for service owner:

Steps for DC-Ops:

The following steps cannot be interrupted, as it will leave the system in an unfinished state.

Start non-interrupt steps:

  • disable puppet on host
  • power down host
  • update Netbox status to Inventory (if decom) or Planned (if spare)
  • disable switch port
  • switch port assignment noted on this task (for later removal)
  • remove all remaining puppet references (including role::spare)
  • remove production dns entries
  • puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.
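For illustration only: the debmonitor curl command in the checklist issues an HTTP DELETE against the host's entry and authenticates with a TLS client certificate. Below is a minimal Python sketch of the same request, assuming the endpoint and certificate paths from the checklist are still current and that the internal CA is in the system trust store. build_debmonitor_delete and send_with_client_cert are hypothetical helper names, not part of Wikimedia tooling; in practice this step is handled by wmf-decommission-host.

```python
import ssl
import urllib.request

# Endpoint and certificate paths as given in the checklist (assumed still current).
DEBMONITOR_HOSTS_URL = "https://debmonitor.discovery.wmnet/hosts/"
CERT_FILE = "/etc/debmonitor/ssl/cert.pem"
KEY_FILE = "/etc/debmonitor/ssl/server.key"


def build_debmonitor_delete(host_fqdn: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that removes a host from debmonitor."""
    if not host_fqdn or "/" in host_fqdn:
        raise ValueError(f"invalid host FQDN: {host_fqdn!r}")
    return urllib.request.Request(DEBMONITOR_HOSTS_URL + host_fqdn, method="DELETE")


def send_with_client_cert(request: urllib.request.Request) -> int:
    """Send the request with TLS client-certificate auth, like curl --cert/--key.

    Needs read access to the key file (the checklist runs curl via sudo).
    Assumes the internal CA is in the system trust store used by
    ssl.create_default_context().
    """
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=CERT_FILE, keyfile=KEY_FILE)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status


if __name__ == "__main__":
    req = build_debmonitor_delete("db1063.eqiad.wmnet")
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to check the exact URL before anything is deleted.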

  • Label disk #2 as broken so it doesn't get re-used
  • Label disk #6 as broken so it doesn't get re-used
  • Label disk #7 as broken so it doesn't get re-used
  • system disks wiped (by onsite)
  • IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • IF DECOM: switch port configuration removed from switch once system is unracked
  • IF DECOM: add system to decommission tracking google sheet
  • IF DECOM: mgmt dns entries removed
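The checklist branches on whether the server is reclaimed to spare or decommissioned: the Netbox status differs (Inventory vs Planned), and the "IF DECOM" steps apply only in the decommission case. A hypothetical sketch of that branching, assuming the outcome is decided before the on-site work starts; Outcome, onsite_plan and the step strings are illustrative names, not an existing tool.

```python
from __future__ import annotations

from enum import Enum


class Outcome(Enum):
    DECOM = "decom"
    SPARE = "spare"


# Netbox status set during the non-interrupt steps, per the checklist.
NETBOX_STATUS = {Outcome.DECOM: "Inventory", Outcome.SPARE: "Planned"}

# On-site step that applies regardless of outcome.
COMMON_ONSITE = ["system disks wiped (by onsite)"]

# Steps marked "IF DECOM" in the checklist.
DECOM_ONLY = [
    "system unracked and decommissioned (by onsite), update racktables with result",
    "switch port configuration removed from switch once system is unracked",
    "add system to decommission tracking google sheet",
    "mgmt dns entries removed",
]


def onsite_plan(outcome: Outcome, broken_disks: list[int]) -> tuple[str, list[str]]:
    """Return (Netbox status, ordered on-site steps) for a returned server."""
    steps = [
        f"label disk #{n} as broken so it doesn't get re-used"
        for n in sorted(broken_disks)
    ]
    steps += COMMON_ONSITE
    if outcome is Outcome.DECOM:
        steps += DECOM_ONLY
    return NETBOX_STATUS[outcome], steps


status, steps = onsite_plan(Outcome.DECOM, [2, 6, 7])
print(status)
for step in steps:
    print(" -", step)
```

For db1063 this yields the Inventory status and the eight on-site steps listed above; a spare return would skip the four "IF DECOM" items.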

Event Timeline

Marostegui triaged this task as Medium priority.
Marostegui moved this task from Triage to In progress on the DBA board.

This host is no longer the m1 master (T231403: Switchover m1 primary master: db1063 to db1135: Tuesday 10th September at 16:00 UTC), but let's wait a few days before decommissioning it.

Change 535767 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1063: Disable notifications

https://gerrit.wikimedia.org/r/535767

Change 535767 merged by Marostegui:
[operations/puppet@production] db1063: Disable notifications

https://gerrit.wikimedia.org/r/535767

Change 537316 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Decommission db1063

https://gerrit.wikimedia.org/r/537316

Change 537316 merged by Marostegui:
[operations/puppet@production] mariadb: Decommission db1063

https://gerrit.wikimedia.org/r/537316

Mentioned in SAL (#wikimedia-operations) [2019-09-17T07:40:15Z] <marostegui> Remove db1063 from puppet and zarcillo T232564

Mentioned in SAL (#wikimedia-operations) [2019-09-17T07:41:11Z] <marostegui> Stop mysql on db1063 for decommissioning T232564

Marostegui updated the task description.
Marostegui moved this task from Backlog to Ready for Decommission on the decommission-hardware board.

Host ready for DC-Ops to decommission

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db1063.eqiad.wmnet

  • db1063.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Wiped bootloaders
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change 539266 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] site.pp: Remove references to db1063

https://gerrit.wikimedia.org/r/539266

Change 539267 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Remove production entries for db1063

https://gerrit.wikimedia.org/r/539267

Change 539266 merged by Marostegui:
[operations/puppet@production] site.pp: Remove references to db1063

https://gerrit.wikimedia.org/r/539266

Change 539267 merged by Marostegui:
[operations/dns@master] wmnet: Remove production entries for db1063

https://gerrit.wikimedia.org/r/539267

Marostegui removed a project: Patch-For-Review.
Marostegui updated the task description.

Ready for on-site steps + switch disablement

papaul@asw2-c-eqiad# show | compare 
[edit interfaces interface-range vlan-private1-c-eqiad]
-    member ge-5/0/39;
[edit interfaces interface-range disabled]
     member ge-3/0/12 { ... }
+    member ge-5/0/39;
[edit interfaces]
-   ge-5/0/39 {
-       description db1063;
-   }
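The show | compare output shows the net configuration change, not the commands that were typed. One way to reach the same diff in Junos configuration mode is three set/delete statements: drop the port from its VLAN interface-range, add it to the disabled range, and delete its interface stanza. Below is a small Python generator for those statements, assuming the interface-range names shown in the diff (vlan-private1-c-eqiad, disabled) match the target switch; disable_port_commands is a hypothetical helper. The output should still be reviewed with show | compare before commit.

```python
from __future__ import annotations


def disable_port_commands(
    port: str, vlan_range: str, disabled_range: str = "disabled"
) -> list[str]:
    """Junos configuration-mode statements matching the compare output above.

    Range names are assumptions taken from the asw2-c-eqiad diff.
    """
    return [
        # Remove the port from its VLAN interface-range.
        f"delete interfaces interface-range {vlan_range} member {port}",
        # Add it to the range that holds disabled ports.
        f"set interfaces interface-range {disabled_range} member {port}",
        # Drop the per-interface stanza (its description named the host).
        f"delete interfaces {port}",
    ]


for line in disable_port_commands("ge-5/0/39", "vlan-private1-c-eqiad"):
    print(line)
```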

Change 548907 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for db1063,db1064,db1065,db1068,db1070 and db1071

https://gerrit.wikimedia.org/r/548907

Change 548907 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for db1063,db1064,db1065,db1068,db1070 and db1071

https://gerrit.wikimedia.org/r/548907

Papaul updated the task description.

Complete

DannyS712 subscribed.

[batch] remove patch for review tag from resolved tasks