Decommission db2057.codfw.wmnet
Open, Normal, Public

Description

This task will track the decommissioning of server db2057.codfw.wmnet.

The first 5 steps should be completed by the service owner who is returning the server to DC-Ops (for reclaim to spares or for decommissioning, depending on server configuration and age).

db2057
Steps for service owner:

  • - all system services confirmed offline from production use
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration https://gerrit.wikimedia.org/r/#/c/529912/
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry, replace with role(spare::system) https://gerrit.wikimedia.org/r/#/c/529915/
  • - unassign service owner from this task, check off completed steps, and assign to @RobH for followup on below steps.

Steps for DC-Ops:

The following steps must not be interrupted, as doing so would leave the system in an unfinished state.

Start non-interrupt steps:

  • - disable puppet on host
  • - power down host
  • - update netbox status to Inventory (if decom) or Planned (if spare)
  • - disable switch port
  • - switch port assignment noted on this task (for later removal)
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate (handled by wmf-decommission-host)
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key (handled by wmf-decommission-host)

End non-interrupt steps.
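
For reference, the debmonitor cleanup command from the checklist above can be expanded for this host with a small helper. This is only a sketch: the function name debmonitor_delete_cmd and its hostname check are hypothetical and not part of wmf-decommission-host, which the checklist says already handles this step. The helper only prints the command; it does not run curl.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not part of wmf-decommission-host): prints, but does
# not execute, the debmonitor DELETE command from the checklist for one host.
debmonitor_delete_cmd() {
    host_fqdn="${1:-}"
    # Reject an empty value or anything that is not a plain hostname character,
    # so the printed command cannot contain stray spaces or shell metacharacters.
    case "$host_fqdn" in
        ''|*[!A-Za-z0-9.-]*)
            echo "usage: debmonitor_delete_cmd <host-fqdn>" >&2
            return 1
            ;;
    esac
    # URL and certificate paths are copied verbatim from the task description.
    printf 'sudo curl -X DELETE %s --cert %s --key %s\n' \
        "https://debmonitor.discovery.wmnet/hosts/${host_fqdn}" \
        "/etc/debmonitor/ssl/cert.pem" \
        "/etc/debmonitor/ssl/server.key"
}

# Print the command for the host tracked by this task.
debmonitor_delete_cmd "db2057.codfw.wmnet"
```

Printing rather than executing keeps the sketch safe to run anywhere; the resulting line would still need to be run on neodymium or sarin, as the checklist states.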

  • - Label the controller as possibly broken so it doesn't get re-used (see T212275: db2057 storage crashed)
  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - IF RECLAIM: system added back to spares tracking (by onsite)

Event Timeline

Marostegui moved this task from Triage to In progress on the DBA board.
Marostegui triaged this task as Normal priority. Tue, Aug 13, 7:35 AM
Marostegui updated the task description.

Change 529912 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2057 from config

https://gerrit.wikimedia.org/r/529912

Change 529912 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2057 from config

https://gerrit.wikimedia.org/r/529912

Mentioned in SAL (#wikimedia-operations) [2019-08-13T07:54:08Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Remove db2057 from config T230394 (duration: 00m 48s)

Marostegui updated the task description. Tue, Aug 13, 7:54 AM

Mentioned in SAL (#wikimedia-operations) [2019-08-13T07:55:03Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove db2057 from config T230394 (duration: 00m 47s)

Change 529915 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Decommission db2057

https://gerrit.wikimedia.org/r/529915

Marostegui updated the task description. Tue, Aug 13, 8:46 AM

Change 529915 merged by Marostegui:
[operations/puppet@production] mariadb: Decommission db2057

https://gerrit.wikimedia.org/r/529915

Mentioned in SAL (#wikimedia-operations) [2019-08-13T08:48:45Z] <marostegui> Remove db2057 from tendril and zarcillo T230394

Mentioned in SAL (#wikimedia-operations) [2019-08-13T08:49:40Z] <marostegui> Stop MySQL on db2057 - T230394

Marostegui reassigned this task from Marostegui to RobH. Tue, Aug 13, 8:50 AM
Marostegui edited projects, added ops-codfw; removed Patch-For-Review, DBA.
Marostegui updated the task description.
Marostegui added a project: DC-Ops.

This host is ready for DC-Ops to decommission.

Papaul moved this task from Backlog to Decommission on the ops-codfw board. Fri, Aug 16, 3:42 PM