
Decommission dbstore1002
Open, Normal, Public

Description

dbstore1002 has been migrated to the new hosts (T210478), so it can be decommissioned.

Wait for the green light: dbstore1002 is still being used (for reads).

  • Analytics needs to confirm that this host can be decommissioned.

dbstore1002

Decommission Checklist

  • - all system services confirmed offline from production use (should be done by the Analytics team)
  • - set all Icinga checks to maintenance mode/disabled while the reclaim/decommission takes place
  • - remove system from all lvs/pybal active configuration (should be done by the Analytics team). Removed from config:
  • - any service group puppet/hiera/dsh config removed (should be done by the Analytics team)
  • - remove the site.pp entry (replace it with role(spare::system) if the system isn't shut down immediately during this process). Should be done by the Analytics team: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/494649/

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interruptible steps

  • - disable puppet on host
  • - power down host
  • - update status in Netbox (inventory for decom, planned for spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) asw2-d-eqiad:ge-1/0/7
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update Racktables with the result
  • - IF DECOM: switch port configuration removed from the switch once the system is unracked
  • - IF DECOM: add system to the decommission tracking Google Sheet
  • - IF DECOM: mgmt dns entries removed.
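The DebMonitor removal step in the checklist above takes the host's FQDN as `${HOST_FQDN}`. A minimal dry-run sketch, assuming the endpoint and certificate paths exactly as listed in the checklist, fills in the FQDN for this host (dbstore1002.eqiad.wmnet) and only prints the resulting command for review. It does not contact DebMonitor, and the dry-run approach is an illustration rather than part of the documented procedure; for dbstore1002 this step was in any case handled by wmf-decommission-host.

```shell
#!/bin/sh
# Dry-run sketch: build the DebMonitor DELETE command for one host and print it.
# Nothing is sent over the network; the paths below are copied from the checklist.
set -eu

HOST_FQDN="dbstore1002.eqiad.wmnet"
url="https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN}"
cert="/etc/debmonitor/ssl/cert.pem"
key="/etc/debmonitor/ssl/server.key"

# Print the command; per the checklist it would be run on neodymium/sarin.
echo "sudo curl -X DELETE ${url} --cert ${cert} --key ${key}"
```

Printing instead of executing keeps the step reviewable, since a DELETE against the wrong FQDN would remove another host's inventory entry.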


Event Timeline

Marostegui changed the task status from Open to Stalled.
Marostegui triaged this task as Normal priority.
Milimetric moved this task from Incoming to Radar on the Analytics board. (Feb 21 2019, 5:46 PM)
Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. (Feb 21 2019, 6:42 PM)

Change 492275 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] analytics-grants.sql: Remove file

https://gerrit.wikimedia.org/r/492275

Change 492275 merged by Marostegui:
[operations/puppet@production] analytics-grants.sql: Remove file

https://gerrit.wikimedia.org/r/492275

Change 493647 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Remove dbstore1002

https://gerrit.wikimedia.org/r/493647

Change 493647 merged by Marostegui:
[operations/puppet@production] install_server: Remove dbstore1002

https://gerrit.wikimedia.org/r/493647

Change 494164 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/dns@master] Delete analytics-store CNAME due to dbstore1002 decom

https://gerrit.wikimedia.org/r/494164

Change 494164 merged by Elukey:
[operations/dns@master] Delete analytics-store CNAME due to dbstore1002 decom

https://gerrit.wikimedia.org/r/494164

MySQL has been stopped on dbstore1002 and won't be started again, as this host will be decommissioned.

Marostegui updated the task description. (Mar 4 2019, 6:53 AM)
Marostegui changed the task status from Stalled to Open.

Mentioned in SAL (#wikimedia-operations) [2019-03-04T07:13:05Z] <marostegui> Remove dbstore1002 from tendril and zarcillo - T216491

elukey updated the task description. (Mar 6 2019, 7:15 AM)

No complaints or outages after the shutdown of dbstore1002. I think we are good to keep going with the decom.

"@Marostegui this is control tower, you are clear to engage"

\o/
Clear to decommission! :)

Change 494649 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbstore1002: Set it to spare

https://gerrit.wikimedia.org/r/494649

Change 494649 merged by Marostegui:
[operations/puppet@production] dbstore1002: Set it to spare

https://gerrit.wikimedia.org/r/494649

Mentioned in SAL (#wikimedia-operations) [2019-03-06T07:34:07Z] <marostegui> Remove dbstore1002 from tendril and zarcillo T216491

Marostegui updated the task description. (Mar 6 2019, 7:37 AM)
Marostegui assigned this task to RobH.

Ready for @RobH to do the next steps.
@RobH, can you prioritize the steps that include powering down this host? It is a Trusty host and will soon stop receiving updates, so it would be good to have it powered off (the onsite steps are not as urgent) to avoid leaving a host running without security updates.
Thanks!

RobH edited projects, added decommission; removed Patch-For-Review. (Mar 6 2019, 4:08 PM)
RobH moved this task from Backlog to Ready for Decommission on the decommission board.

wmf-decommission-host was executed by robh for dbstore1002.eqiad.wmnet and performed the following actions:

  • Revoked Puppet certificate
  • Removed from PuppetDB
  • Downtimed host on Icinga
  • Downtimed mgmt interface on Icinga
  • Removed from DebMonitor

RobH updated the task description. (Mar 6 2019, 5:41 PM)

Change 494794 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom dbstore1002

https://gerrit.wikimedia.org/r/494794

Change 494795 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom dbstore1002

https://gerrit.wikimedia.org/r/494795

Change 494795 merged by RobH:
[operations/dns@master] decom dbstore1002

https://gerrit.wikimedia.org/r/494795

Change 494794 merged by RobH:
[operations/puppet@production] decom dbstore1002

https://gerrit.wikimedia.org/r/494794

RobH reassigned this task from RobH to Cmjohnson. (Mar 6 2019, 5:46 PM)
RobH removed a project: Patch-For-Review.
RobH updated the task description.

Change 494880 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Remove mariadb::dbstore

https://gerrit.wikimedia.org/r/494880

Change 494880 merged by Marostegui:
[operations/puppet@production] mariadb: Remove mariadb::dbstore

https://gerrit.wikimedia.org/r/494880

Change 494886 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] aliases.yaml.erb: Remove dbstore

https://gerrit.wikimedia.org/r/494886

Change 494886 merged by Marostegui:
[operations/puppet@production] aliases.yaml.erb: Remove dbstore

https://gerrit.wikimedia.org/r/494886