
Decommission db1038
Closed, Resolved (Public)

Description

db1038's data has been copied over to db1072.

  • - all system services confirmed offline from production use: Removed from mediawiki-config: https://gerrit.wikimedia.org/r/#/c/389670/
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process):
  • Set to spare: https://gerrit.wikimedia.org/r/#/c/389672/

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port & change switch port label to asset tag
  • - remove production dns entries & remove hostname entries in mgmt dns
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - remove hostname label, remove hostname from visible label field in racktables (by onsite)
  • - system added back to decom rack (by onsite)
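
The checklist above has one ordering constraint worth making explicit: the steps between the START and END markers should not be left half-finished. The following is a minimal sketch of that idea in Python, not tooling used on this task. The names `Step`, `build_checklist`, `next_step` and `safe_to_pause` are hypothetical, and the step descriptions are shortened from the list above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    atomic: bool = False  # True for steps inside the non-interruptible block
    done: bool = False


def build_checklist():
    """Return the decommission steps in the order the task lists them."""
    before = [
        "confirm all system services offline from production use",
        "set icinga checks to maint mode/disabled",
        "remove system from lvs/pybal active configuration",
        "remove service group puppet/hiera/dsh config",
        "remove from site.pp or replace with role::spare",
    ]
    atomic = [
        "disable puppet on host",
        "remove all remaining puppet references",
        "power down host",
        "disable switch port and relabel it with the asset tag",
        "remove production and mgmt dns entries",
        "puppet node clean, puppet node deactivate, salt key removed",
    ]
    after = [
        "system disks wiped (by onsite)",
        "remove hostname label and racktables label field (by onsite)",
        "system added back to decom rack (by onsite)",
    ]
    return (
        [Step(d) for d in before]
        + [Step(d, atomic=True) for d in atomic]
        + [Step(d) for d in after]
    )


def next_step(steps):
    """Return the first step that is not done yet, or None when finished."""
    for step in steps:
        if not step.done:
            return step
    return None


def safe_to_pause(steps):
    """True unless the non-interruptible block is only partially complete."""
    atomic_done = [s.done for s in steps if s.atomic]
    return all(atomic_done) or not any(atomic_done)
```

With this model, work can stop at any point before "disable puppet on host" or after the puppet/salt cleanup, and `safe_to_pause` returns False anywhere in between.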

Event Timeline

Marostegui moved this task from Triage to In progress on the DBA board.

db1038 can now be decommissioned.
db1072 has been pooled as vslow and dump service on s3 with db1038's data.
Let's not touch it until Monday though, just to make sure db1072 works fine.

Next week I will remove db1038 from all the config and add ops-eqiad to this task.

Change 389670 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw,db-eqiad.php: Remove db1038 from config

https://gerrit.wikimedia.org/r/389670

Change 389671 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s3.hosts: Remove db1038

https://gerrit.wikimedia.org/r/389671

Change 389670 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw,db-eqiad.php: Remove db1038 from config

https://gerrit.wikimedia.org/r/389670

Change 389671 merged by jenkins-bot:
[operations/software@master] s3.hosts: Remove db1038

https://gerrit.wikimedia.org/r/389671

Mentioned in SAL (#wikimedia-operations) [2017-11-07T09:02:03Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Remove db1038 from config - T177911 (duration: 00m 47s)

Mentioned in SAL (#wikimedia-operations) [2017-11-07T09:02:56Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Remove db1038 from config - T177911 (duration: 00m 45s)

Change 389672 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Get ready to decommission db1038

https://gerrit.wikimedia.org/r/389672

Change 389672 merged by Marostegui:
[operations/puppet@production] mariadb: Get ready to decommission db1038

https://gerrit.wikimedia.org/r/389672

Mentioned in SAL (#wikimedia-operations) [2017-11-07T09:13:59Z] <marostegui> Stop MySQL on db1038 - host to be decommissioned - T177911

Marostegui added a subscriber: Cmjohnson.

This host is fully ready to be decommissioned by @Cmjohnson

Wiped, racktables updated