
Decommission db2068.codfw.wmnet
Closed, Resolved (Public)

Description

This task will track the decommission-hardware of server db2068.codfw.wmnet.

With the launch of updates to the decom cookbook, the majority of these steps can be handled by the service owners directly. The DC Ops team only gets involved once the system has been fully removed from service and powered down by the decommission cookbook.

db2068

Steps for service owner:

End service owner steps / Begin DC-Ops team steps:

  • - disable switch port / set to asset tag if host isn't being unracked / remove from switch if being unracked.
  • - Label RAID controller, mainboard and disks as broken so they don't get re-used T235366 T180927
  • - system disks wiped (by onsite)
  • - determine system age: systems under 5 years old are reclaimed to spare, systems over 5 years old are decommissioned. If uncertain, ask @wiki_willy.
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
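The age rule in the checklist above can be sketched as a small function. This is only an illustration: the name `disposition`, the `in_service` input and the returned strings are hypothetical, and the checklist does not say which date defines a system's age or what happens at exactly five years (this sketch treats that case as "over").

```python
from datetime import date

RECLAIM_AGE_LIMIT_YEARS = 5  # threshold taken from the checklist


def disposition(in_service: date, today: date) -> str:
    """Return 'reclaim-to-spare' or 'decommission' for a host.

    `in_service` is a hypothetical input: the checklist does not state
    which date (purchase, racking, ...) defines the system's age.
    """
    # Whole years elapsed; subtract one if this year's anniversary
    # has not been reached yet.
    years = today.year - in_service.year - (
        (today.month, today.day) < (in_service.month, in_service.day)
    )
    # Assumption: exactly five years counts as "over" and is decommissioned.
    if years < RECLAIM_AGE_LIMIT_YEARS:
        return "reclaim-to-spare"
    return "decommission"
```

For example, with made-up dates, `disposition(date(2016, 1, 1), date(2019, 10, 14))` gives `'reclaim-to-spare'`. Anything ambiguous would still go to @wiki_willy, as the checklist says.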

Event Timeline

Change 542782 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2068: Disable notifications

https://gerrit.wikimedia.org/r/542782

Marostegui moved this task from Triage to In progress on the DBA board.

Change 542782 merged by Marostegui:
[operations/puppet@production] db2068: Disable notifications

https://gerrit.wikimedia.org/r/542782

Mentioned in SAL (#wikimedia-operations) [2019-10-14T05:47:45Z] <marostegui> Remove db2068 from tendril and zarcillo T235399

Change 542783 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2068 from config

https://gerrit.wikimedia.org/r/542783

Change 542783 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2068 from config

https://gerrit.wikimedia.org/r/542783

Mentioned in SAL (#wikimedia-operations) [2019-10-14T06:02:13Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove db2068 from config T235399 (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2019-10-14T06:03:11Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Remove db2068 from config T235399 (duration: 00m 51s)

Change 542790 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Remove db2068 from config

https://gerrit.wikimedia.org/r/542790

Change 542791 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Remove db2068 DNS production entries

https://gerrit.wikimedia.org/r/542791

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db2068.codfw.wmnet

  • db2068.codfw.wmnet (FAIL)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db2068.codfw.wmnet

  • db2068.codfw.wmnet (FAIL)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Talked to @MoritzMuehlenhoff about this. As has been done in other cases, and given that the host is unusable anyway, we'll proceed as usual and just label the storage as broken so it doesn't get reused (which is already done).
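The cookbook reports above mark the failing step in bold, which is lost when the report is pasted as plain text. A minimal sketch for recovering it from the pasted text follows. The function names are hypothetical, and treating any step that starts with "Unable to" as the failed one is an assumption based only on the wording of this report, not on the cookbook's own logic.

```python
import re

# A host line looks like "db2068.codfw.wmnet (FAIL)": one token plus a status.
HOST_LINE = re.compile(r"(\S+) \((\w+)\)")


def parse_cookbook_report(text: str) -> dict:
    """Parse a pasted report into {host: {"status": str, "steps": [str]}}."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("•"):
            continue  # skip headers, blank lines and the trailing ERROR line
        item = line.lstrip("•").strip()
        match = HOST_LINE.fullmatch(item)
        if match:
            current = match.group(1)
            hosts[current] = {"status": match.group(2), "steps": []}
        elif current is not None:
            hosts[current]["steps"].append(item)
    return hosts


def skipped_steps(steps: list) -> list:
    """Heuristic (assumption): steps beginning with 'Unable to' did not run."""
    return [step for step in steps if step.lower().startswith("unable to")]
```

Run against either report above, `skipped_steps` would return only the bootloader-wipe line, which is the step the comment above refers to.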

Change 542790 merged by Marostegui:
[operations/puppet@production] mariadb: Remove db2068 from config

https://gerrit.wikimedia.org/r/542790

Change 542791 merged by Marostegui:
[operations/dns@master] wmnet: Remove db2068 DNS production entries

https://gerrit.wikimedia.org/r/542791

Marostegui updated the task description.
Marostegui added a project: DC-Ops.
Marostegui added a subscriber: jcrespo.

Host ready for DC-Ops steps

Mentioned in SAL (#wikimedia-operations) [2019-10-14T07:21:01Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Remove db2068 from config - T235399', diff saved to https://phabricator.wikimedia.org/P9319 and previous config saved to /var/cache/conftool/dbconfig/20191014-072100-marostegui.json

papaul@asw-d-codfw# show | compare 
[edit interfaces interface-range vlan-private1-d-codfw]
-    member ge-6/0/16;
[edit interfaces interface-range disabled]
     member ge-6/0/4 { ... }
+    member ge-6/0/16;
[edit interfaces]
-   ge-6/0/16 {
-       description db2068;
-       enable;
-   }
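The diff above moves the port out of the private VLAN interface-range, adds it to the `disabled` range, and deletes the per-port stanza. As a rough sketch, the configuration-mode commands that would produce such a diff can be generated as below. The helper name is hypothetical, and the commands are reconstructed from the diff, not copied from what was actually typed on the switch.

```python
def decommission_port_commands(
    port: str, vlan_range: str, disabled_range: str = "disabled"
) -> list[str]:
    """Build Junos configuration-mode commands to park a switch port.

    Reconstructed from the `show | compare` output; the exact commands
    used on asw-d-codfw are not recorded in the task.
    """
    return [
        # Remove the port from its production VLAN interface-range.
        f"delete interfaces interface-range {vlan_range} member {port}",
        # Add it to the range that holds disabled ports.
        f"set interfaces interface-range {disabled_range} member {port}",
        # Drop the per-port stanza (description and enable).
        f"delete interfaces {port}",
    ]
```

Calling `decommission_port_commands("ge-6/0/16", "vlan-private1-d-codfw")` yields the three lines for this host's port; they would still need a `show | compare` review before `commit`.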

Change 543484 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for db2051,db2056 and db2068

https://gerrit.wikimedia.org/r/543484

Change 543484 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for db2051,db2056 and db2068

https://gerrit.wikimedia.org/r/543484

Papaul updated the task description.

Complete