Decommission db2068.codfw.wmnet
Closed, Resolved, Public

Description

This task will track the decommission of server db2068.codfw.wmnet

With the launch of updates to the decom cookbook, the majority of these steps can be handled by the service owners directly. The DC Ops team only gets involved once the system has been fully removed from service and powered down by the decommission cookbook.

db2068

Steps for service owner:

End service owner steps / Begin DC-Ops team steps:

  • disable switch port / set to asset tag if host isn't being unracked / remove from switch if being unracked.
  • label RAID controller, mainboard and disks as broken so they don't get re-used (T235366, T180927)
  • system disks wiped (by onsite)
  • determine system age: systems under 5 years old are reclaimed to spares, systems over 5 years old are decommissioned. If uncertain, ask @wiki_willy.
  • IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • IF DECOM: switch port configuration removed from switch once system is unracked.
  • IF DECOM: add system to decommission tracking google sheet
  • IF DECOM: mgmt dns entries removed.
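
The age rule in the checklist above can be sketched in a few lines of Python. This is only an illustration, not part of the decommission cookbook: the function names and the purchase dates are hypothetical, and treating a system that is exactly 5 years old as a decommission candidate is an assumption, since the checklist only says "under" and "over" 5 years.

```python
from datetime import date

RECLAIM_AGE_LIMIT_YEARS = 5  # threshold taken from the checklist above


def age_in_years(purchase_date: date, today: date) -> int:
    """Whole years elapsed between purchase_date and today."""
    years = today.year - purchase_date.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (purchase_date.month, purchase_date.day):
        years -= 1
    return years


def disposition(purchase_date: date, today: date) -> str:
    """Return 'reclaim' for systems under 5 years old, else 'decommission'.

    Assumption: a system that is exactly 5 years old counts as 'decommission'.
    """
    if age_in_years(purchase_date, today) < RECLAIM_AGE_LIMIT_YEARS:
        return "reclaim"
    return "decommission"


if __name__ == "__main__":
    # Hypothetical purchase dates; the real purchase date is not in this task.
    print(disposition(date(2016, 1, 1), date(2019, 10, 14)))
    print(disposition(date(2013, 1, 1), date(2019, 10, 14)))
```

In practice the purchase date would come from the asset inventory, and uncertain cases still go to @wiki_willy as the checklist says.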

Details

Related Gerrit Patches:
[operations/dns@master] DNS: Remove mgmt DNS for db2051,db2056 and db2068
[operations/dns@master] wmnet: Remove db2068 DNS production entries
[operations/puppet@production] mariadb: Remove db2068 from config
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2068 from config
[operations/puppet@production] db2068: Disable notifications

Event Timeline

Marostegui updated the task description.

Change 542782 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db2068: Disable notifications

https://gerrit.wikimedia.org/r/542782

Marostegui triaged this task as Normal priority. Oct 14 2019, 5:30 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Change 542782 merged by Marostegui:
[operations/puppet@production] db2068: Disable notifications

https://gerrit.wikimedia.org/r/542782

Mentioned in SAL (#wikimedia-operations) [2019-10-14T05:47:45Z] <marostegui> Remove db2068 from tendril and zarcillo T235399

Change 542783 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2068 from config

https://gerrit.wikimedia.org/r/542783

Change 542783 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2068 from config

https://gerrit.wikimedia.org/r/542783

Mentioned in SAL (#wikimedia-operations) [2019-10-14T06:02:13Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove db2068 from config T235399 (duration: 00m 53s)

Marostegui updated the task description. Oct 14 2019, 6:03 AM

Mentioned in SAL (#wikimedia-operations) [2019-10-14T06:03:11Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Remove db2068 from config T235399 (duration: 00m 51s)

Change 542790 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Remove db2068 from config

https://gerrit.wikimedia.org/r/542790

Change 542791 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/dns@master] wmnet: Remove db2068 DNS production entries

https://gerrit.wikimedia.org/r/542791

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db2068.codfw.wmnet

  • db2068.codfw.wmnet (FAIL)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db2068.codfw.wmnet

  • db2068.codfw.wmnet (FAIL)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Talked to @MoritzMuehlenhoff about this. As has been done in other cases, and given that the host is unusable anyway, we'll proceed as usual and just label the storage as broken so it doesn't get reused (which is already done).

Change 542790 merged by Marostegui:
[operations/puppet@production] mariadb: Remove db2068 from config

https://gerrit.wikimedia.org/r/542790

Change 542791 merged by Marostegui:
[operations/dns@master] wmnet: Remove db2068 DNS production entries

https://gerrit.wikimedia.org/r/542791

Marostegui reassigned this task from Marostegui to Papaul. Oct 14 2019, 7:08 AM
Marostegui edited projects, added decommission, ops-codfw; removed Patch-For-Review, DBA.
Marostegui updated the task description.
Marostegui added a project: DC-Ops.
Marostegui added a subscriber: jcrespo.

Host ready for DC-Ops steps

Mentioned in SAL (#wikimedia-operations) [2019-10-14T07:21:01Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Remove db2068 from config - T235399', diff saved to https://phabricator.wikimedia.org/P9319 and previous config saved to /var/cache/conftool/dbconfig/20191014-072100-marostegui.json

papaul@asw-d-codfw# show | compare 
[edit interfaces interface-range vlan-private1-d-codfw]
-    member ge-6/0/16;
[edit interfaces interface-range disabled]
     member ge-6/0/4 { ... }
+    member ge-6/0/16;
[edit interfaces]
-   ge-6/0/16 {
-       description db2068;
-       enable;
-   }
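
For readers less familiar with the Junos diff above, here is a minimal Python sketch that groups its added and removed lines by `[edit ...]` section. It is an illustration only, not a tool used in this task: the function name is hypothetical, and it assumes the plain `+`/`-` prefixed layout shown above.

```python
# Sample input copied from the `show | compare` output in this task.
SAMPLE_DIFF = """\
[edit interfaces interface-range vlan-private1-d-codfw]
-    member ge-6/0/16;
[edit interfaces interface-range disabled]
     member ge-6/0/4 { ... }
+    member ge-6/0/16;
[edit interfaces]
-   ge-6/0/16 {
-       description db2068;
-       enable;
-   }
"""


def summarize_compare(diff_text: str) -> dict:
    """Group added/removed lines of a diff by their [edit ...] section."""
    summary = {}
    section = None
    for raw in diff_text.splitlines():
        line = raw.strip()
        if line.startswith("[edit") and line.endswith("]"):
            # A new hierarchy level starts; drop the surrounding brackets.
            section = line[1:-1]
            summary.setdefault(section, {"added": [], "removed": []})
        elif section and line.startswith("+"):
            summary[section]["added"].append(line[1:].strip())
        elif section and line.startswith("-"):
            summary[section]["removed"].append(line[1:].strip())
        # Unmarked context lines are ignored.
    return summary


if __name__ == "__main__":
    for name, changes in summarize_compare(SAMPLE_DIFF).items():
        print(name, changes)
```

Read this way, the change moves ge-6/0/16 out of the vlan-private1-d-codfw interface-range into the disabled range and drops the db2068 interface stanza.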

Papaul updated the task description. Oct 15 2019, 4:08 PM
Papaul moved this task from Backlog to Decommission on the ops-codfw board. Oct 16 2019, 12:20 AM
Papaul updated the task description. Oct 16 2019, 3:34 PM

Change 543484 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for db2051,db2056 and db2068

https://gerrit.wikimedia.org/r/543484

Change 543484 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for db2051,db2056 and db2068

https://gerrit.wikimedia.org/r/543484

Papaul closed this task as Resolved. Oct 16 2019, 11:19 PM
Papaul updated the task description.

Complete