
Decommission db1050
Closed, Resolved (Public)

Description

db1050 can be decommissioned

  • - all system services confirmed offline from production use. Removed from mediawiki-config: https://gerrit.wikimedia.org/r/#/c/384453/
  • - set all icinga checks to maintenance mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if the system isn't shut down immediately during this process):
  • Set to spare: https://gerrit.wikimedia.org/r/#/c/384455/

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port & change switch port label to asset tag
  • - remove production dns entries & remove hostname entries in mgmt dns
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS
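
As a rough illustration only, the command-line portion of the steps above (disable puppet, puppet node clean, puppet node deactivate, salt key removal) could be assembled as a dry-run list like the sketch below. The helper name `decommission_commands`, the placeholder hostname `db1050.example.org`, and the disable message are assumptions made for this example and are not taken from the task. The power-down, switch-port and DNS steps are left out because the task does not say how they are performed.

```python
import shlex
from typing import List


def decommission_commands(fqdn: str, reason: str) -> List[List[str]]:
    """Return the host-level commands from the checklist, in checklist order.

    Hypothetical helper: it only builds argv lists and never executes them.
    """
    return [
        # Run on the host being decommissioned.
        ["puppet", "agent", "--disable", reason],
        # Run on the puppet master once the host is powered down.
        ["puppet", "node", "clean", fqdn],
        ["puppet", "node", "deactivate", fqdn],
        # Run on the salt master to delete the minion key.
        ["salt-key", "-d", fqdn],
    ]


if __name__ == "__main__":
    # Placeholder hostname; substitute the real FQDN of the host.
    for argv in decommission_commands("db1050.example.org", "Decommissioning - T178162"):
        print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try; each line would still need to be executed by hand on the appropriate machine.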

  • - system disks wiped (by onsite)
  • - remove hostname label, remove hostname from visible label field in racktables (by onsite)
  • - system added back to spares tracking (by onsite)

Event Timeline

Change 384453 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1050 from config

https://gerrit.wikimedia.org/r/384453

Change 384453 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1050 from config

https://gerrit.wikimedia.org/r/384453

Mentioned in SAL (#wikimedia-operations) [2017-10-16T08:30:07Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Remove db1050 from config - T178162 (duration: 00m 46s)

Change 384454 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s6.hosts: Remove db1050

https://gerrit.wikimedia.org/r/384454

Mentioned in SAL (#wikimedia-operations) [2017-10-16T08:31:02Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Remove db1050 from config - T178162 (duration: 00m 46s)

Change 384455 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Remove db1050 from config

https://gerrit.wikimedia.org/r/384455

Change 384454 merged by jenkins-bot:
[operations/software@master] s6.hosts: Remove db1050

https://gerrit.wikimedia.org/r/384454

Change 384455 merged by Marostegui:
[operations/puppet@production] mariadb: Remove db1050 from config

https://gerrit.wikimedia.org/r/384455

Mentioned in SAL (#wikimedia-operations) [2017-10-16T08:46:23Z] <marostegui> Remove db1050 from tendril - T178162

Mentioned in SAL (#wikimedia-operations) [2017-10-16T08:53:52Z] <marostegui> Stop MySQL on db1050 as it will be decommissioned - T178162

Marostegui updated the task description.
Marostegui moved this task from Pending comment to Done on the DBA board.
Marostegui added a project: ops-eqiad.
Marostegui added a subscriber: Cmjohnson.

db1050 can now be decommissioned by @Cmjohnson
@Cmjohnson remember that one of the disks has failed. It would be good to identify it before decommissioning, so that we do not re-use the broken disk in the future.
Thanks!

@Cmjohnson This would be at the top of the db decommissioning stack (which, obviously, is not that urgent) so we can get rid of the non-useful alert about the bad disk.

And let's make sure we mark that bad disk as broken so it is not re-used somewhere else :-)

Cmjohnson updated the task description.