Decommission dbstore1002
Closed, Resolved, Public

Description

dbstore1002 has been migrated to the new hosts (T210478) and thus can be decommissioned

Wait for the green light - dbstore1002 is still being used (for reads)

  • Analytics to confirm this host can be decommissioned

dbstore1002

Decommission Checklist

  • - all system services confirmed offline from production use - should be done by Analytics team
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration - should be done by Analytics team: Removed from config:
  • - any service group puppet/hiera/dsh config removed - should be done by Analytics team
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.) - should be done by Analytics team: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/494649/

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interrupt steps

  • - disable puppet on host
  • - power down host
  • - update status in netbox (inventory for decom, planned for spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) asw2-d-eqiad:ge-1/0/7
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS
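
For reference, the manual DebMonitor removal in the checklist could be sketched in Python roughly as follows. This is only an illustration of the curl command quoted above, not how wmf-decommission-host does it; the function names build_delete_request and delete_host are hypothetical, and only the URL prefix and the certificate/key paths come from the checklist.

```python
import ssl
import urllib.request
from urllib.parse import quote

# Values copied from the curl command in the checklist.
DEBMONITOR_BASE = "https://debmonitor.discovery.wmnet/hosts/"
CERT_PATH = "/etc/debmonitor/ssl/cert.pem"
KEY_PATH = "/etc/debmonitor/ssl/server.key"


def build_delete_request(host_fqdn: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one host (hypothetical helper)."""
    if not host_fqdn or "/" in host_fqdn:
        raise ValueError(f"not a plausible FQDN: {host_fqdn!r}")
    return urllib.request.Request(DEBMONITOR_BASE + quote(host_fqdn), method="DELETE")


def delete_host(host_fqdn: str) -> int:
    """Send the request with the client certificate and return the HTTP status."""
    context = ssl.create_default_context()
    # Equivalent of curl's --cert / --key options.
    context.load_cert_chain(certfile=CERT_PATH, keyfile=KEY_PATH)
    request = build_delete_request(host_fqdn)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status
```

Building the request separately from sending it makes it possible to inspect the target URL before anything is deleted, e.g. build_delete_request("dbstore1002.eqiad.wmnet").full_url.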

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

Event Timeline

Marostegui changed the task status from Open to Stalled. Feb 19 2019, 10:43 AM
Marostegui triaged this task as Medium priority.
Marostegui created this task.

Change 492275 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] analytics-grants.sql: Remove file

https://gerrit.wikimedia.org/r/492275

Change 492275 merged by Marostegui:
[operations/puppet@production] analytics-grants.sql: Remove file

https://gerrit.wikimedia.org/r/492275

Change 493647 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Remove dbstore1002

https://gerrit.wikimedia.org/r/493647

Change 493647 merged by Marostegui:
[operations/puppet@production] install_server: Remove dbstore1002

https://gerrit.wikimedia.org/r/493647

Change 494164 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Delete analytics-store CNAME due to dbstore1002 decom

https://gerrit.wikimedia.org/r/494164

Change 494164 merged by Elukey:
[operations/dns@master] Delete analytics-store CNAME due to dbstore1002 decom

https://gerrit.wikimedia.org/r/494164

MySQL has been stopped on dbstore1002 and won't be started again, as this host will be decommissioned

Marostegui changed the task status from Stalled to Open. Mar 4 2019, 6:53 AM
Marostegui updated the task description.

Mentioned in SAL (#wikimedia-operations) [2019-03-04T07:13:05Z] <marostegui> Remove dbstore1002 from tendril and zarcillo - T216491

No complaints or outages after the shutdown of dbstore1002; I think that we are good to keep going with the decom.

"@Marostegui this is control tower, you are clear to engage"

Change 494649 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbstore1002: Set it to spare

https://gerrit.wikimedia.org/r/494649

Change 494649 merged by Marostegui:
[operations/puppet@production] dbstore1002: Set it to spare

https://gerrit.wikimedia.org/r/494649

Mentioned in SAL (#wikimedia-operations) [2019-03-06T07:34:07Z] <marostegui> Remove dbstore1002 from tendril and zarcillo T216491

Marostegui updated the task description.

Ready for @RobH to do the next steps.
@RobH, can you prioritize the steps that include powering down this host? It is a trusty host and will soon stop receiving updates, so it would be good to power it down to avoid leaving a host without security updates running (the onsite steps are not that urgent).
Thanks!

wmf-decommission-host was executed by robh for dbstore1002.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

Change 494794 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom dbstore1002

https://gerrit.wikimedia.org/r/494794

Change 494795 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom dbstore1002

https://gerrit.wikimedia.org/r/494795

Change 494795 merged by RobH:
[operations/dns@master] decom dbstore1002

https://gerrit.wikimedia.org/r/494795

Change 494794 merged by RobH:
[operations/puppet@production] decom dbstore1002

https://gerrit.wikimedia.org/r/494794

RobH removed a project: Patch-For-Review.
RobH updated the task description.

Change 494880 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Remove mariadb::dbstore

https://gerrit.wikimedia.org/r/494880

Change 494880 merged by Marostegui:
[operations/puppet@production] mariadb: Remove mariadb::dbstore

https://gerrit.wikimedia.org/r/494880

Change 494886 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] aliases.yaml.erb: Remove dbstore

https://gerrit.wikimedia.org/r/494886

Change 494886 merged by Marostegui:
[operations/puppet@production] aliases.yaml.erb: Remove dbstore

https://gerrit.wikimedia.org/r/494886

papaul@asw2-d-eqiad# show | compare 
[edit interfaces]
-   ge-1/0/7 {
-       description dbstore1002;
-       enable;
-   }
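
As a sanity check, a diff like the one above can be compared against the port recorded in the checklist (asw2-d-eqiad:ge-1/0/7). The sketch below is a rough illustration only: removed_interfaces is a hypothetical helper, and it assumes the simple one-stanza-per-interface layout shown in this task's `show | compare` output.

```python
import re

# Output pasted from the task; only lines starting with "-" are removals.
DIFF = """\
[edit interfaces]
-   ge-1/0/7 {
-       description dbstore1002;
-       enable;
-   }
"""


def removed_interfaces(diff_text: str) -> dict:
    """Map each removed interface stanza to its description (None if absent)."""
    removed = {}
    current = None
    for line in diff_text.splitlines():
        if not line.startswith("-"):
            continue  # skip prompt lines, "[edit ...]" headers, and blanks
        body = line[1:].strip()
        opened = re.fullmatch(r"(\S+)\s*\{", body)
        if opened:
            current = opened.group(1)
            removed[current] = None
        elif body == "}":
            current = None
        elif current and body.startswith("description "):
            removed[current] = body[len("description "):].rstrip(";")
    return removed


# The port noted on the task should be the only one removed,
# and it should still carry the decommissioned host's name.
result = removed_interfaces(DIFF)
assert result == {"ge-1/0/7": "dbstore1002"}
```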

Change 549912 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Remove mgmt DNS for analytic1003, dbstore1002 abd ms-be1027

https://gerrit.wikimedia.org/r/549912

Change 549912 merged by Papaul:
[operations/dns@master] DNS: Remove mgmt DNS for analytic1003, dbstore1002 abd ms-be1027

https://gerrit.wikimedia.org/r/549912

Papaul updated the task description.

Complete