Decommission db1020
Closed, Resolved (Public)

Description

db1020 was the old m2 master; it has been failed over to db1051.
Wait a few days, then proceed to decommission it.

Decommission Checklist

  • all system services confirmed offline from production use - should be done by DBA team: https://gerrit.wikimedia.org/r/#/c/420958/
  • set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • remove system from all lvs/pybal active configuration - should be done by DBA team
  • any service group puppet/hiera/dsh config removed - should be done by DBA team
  • remove site.pp entry (replace with role(spare::system) if the system isn't shut down immediately during this process) - should be done by DBA team: https://gerrit.wikimedia.org/r/#/c/420956/
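Before the host is handed to DC Ops, one way to double-check the "config removed" items is to search the configuration for any leftover mention of the hostname. The following is a minimal Python sketch, not a tool used on this task: the file names and contents in CONFIG_FILES are hypothetical stand-ins for the puppet, hiera and dsh files, and find_host_references is an illustrative helper. It matches the hostname on word boundaries so that db1020 does not also flag a host such as db10201.

```python
import re

# Hypothetical stand-ins for puppet/hiera/dsh files; the real paths and
# contents in the operations repositories are not shown in this task.
CONFIG_FILES = {
    "site.pp": "node 'db1051.eqiad.wmnet' {\n    # role assignment omitted\n}\n",
    "hiera/example.yaml": "standby: db1020.eqiad.wmnet\n",
    "dsh/mysql-m2": "db1051.eqiad.wmnet\ndb10201.eqiad.wmnet\n",
}


def find_host_references(hostname, files):
    """Return (path, line_number, line) for each line that names hostname.

    Word boundaries stop "db1020" from matching inside "db10201".
    """
    pattern = re.compile(rf"\b{re.escape(hostname)}\b")
    hits = []
    for path, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, number, line.strip()))
    return hits


if __name__ == "__main__":
    for path, number, line in find_host_references("db1020", CONFIG_FILES):
        print(f"{path}:{number}: {line}")
```

With this sample data, searching for db1020 reports only the hypothetical hiera line, while the db10201 entry in the dsh list is ignored.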

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interruptible steps

  • disable puppet on host
  • power down host
  • disable switch port
  • switch port assignment noted on this task (ge-1/0/4)
  • remove all remaining puppet references (including role::spare)
  • remove production dns entries
  • puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS
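The steps between the START and END markers are meant to be done by one person in one pass, in the listed order. A minimal Python sketch of that contract, assuming a hypothetical do_step callback that performs one step and returns True on success: run_in_order stops at the first failure and reports where the sequence stopped. The sketch also parses the noted switch port, assuming the Junos <type>-<fpc>/<pic>/<port> interface naming that ge-1/0/4 appears to follow; all names here are illustrative, not part of any existing tooling.

```python
import re
from dataclasses import dataclass

# Step names taken from the non-interruptible section of the checklist.
NON_INTERRUPTIBLE_STEPS = [
    "disable puppet on host",
    "power down host",
    "disable switch port",
    "switch port assignment noted on this task",
    "remove all remaining puppet references",
    "remove production dns entries",
    "puppet node clean, puppet node deactivate",
]

# Assumed Junos interface naming: <type>-<fpc>/<pic>/<port>, e.g. ge-1/0/4.
PORT_RE = re.compile(r"(?P<kind>[a-z]+)-(?P<fpc>\d+)/(?P<pic>\d+)/(?P<port>\d+)")


@dataclass(frozen=True)
class SwitchPort:
    kind: str  # interface type, e.g. "ge" for Gigabit Ethernet
    fpc: int   # FPC slot; on an EX virtual chassis, usually the member ID
    pic: int   # PIC slot within the FPC
    port: int  # port number on the PIC


def parse_switch_port(name):
    """Parse an interface name such as "ge-1/0/4" into its parts."""
    match = PORT_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a <type>-<fpc>/<pic>/<port> name: {name!r}")
    return SwitchPort(
        kind=match["kind"],
        fpc=int(match["fpc"]),
        pic=int(match["pic"]),
        port=int(match["port"]),
    )


def run_in_order(steps, do_step):
    """Run steps strictly in order and stop at the first failure.

    do_step is a hypothetical callback that performs one step and returns
    True on success. Returns (completed, failed), where failed is None if
    every step succeeded, so whoever resumes knows where the sequence stopped.
    """
    completed = []
    for step in steps:
        if not do_step(step):
            return completed, step
        completed.append(step)
    return completed, None


if __name__ == "__main__":
    print(parse_switch_port("ge-1/0/4"))
    done, failed = run_in_order(NON_INTERRUPTIBLE_STEPS, lambda step: True)
    print(f"completed {len(done)} steps, failed at: {failed}")
```

For ge-1/0/4 this yields type ge, FPC 1, PIC 0, port 4. On a Juniper EX virtual chassis the FPC number normally identifies the member switch, which is the detail onsite staff need to find the port again when the switch configuration is removed.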

  • system disks wiped (by onsite)
  • IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • IF DECOM: switch port configuration removed from switch once system is unracked.
  • IF DECOM: add system to decommission tracking google sheet
  • IF DECOM: mgmt dns entries removed.
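This last block mixes a step that always applies with steps prefixed "IF DECOM:", which only apply when the host is retired rather than reclaimed as a spare (the reclaim/decommission distinction from the first checklist). A small Python sketch of that filtering; Outcome, POST_STEPS and applicable_steps are hypothetical names used only for illustration.

```python
from enum import Enum


class Outcome(Enum):
    RECLAIM = "reclaim"            # host returns to the spare pool
    DECOMMISSION = "decommission"  # host is unracked and retired


# Steps after the non-interruptible section, from the checklist above.
# "IF DECOM: " marks steps that only apply to a decommission.
POST_STEPS = [
    "system disks wiped (by onsite)",
    "IF DECOM: system unracked and decommissioned (by onsite), update racktables with result",
    "IF DECOM: switch port configuration removed from switch once system is unracked.",
    "IF DECOM: add system to decommission tracking google sheet",
    "IF DECOM: mgmt dns entries removed.",
]

DECOM_PREFIX = "IF DECOM: "


def applicable_steps(steps, outcome):
    """Return the steps that apply to outcome, with the prefix stripped."""
    selected = []
    for step in steps:
        if step.startswith(DECOM_PREFIX):
            # Conditional step: keep it only for a real decommission.
            if outcome is Outcome.DECOMMISSION:
                selected.append(step[len(DECOM_PREFIX):])
        else:
            # Unconditional step: applies to reclaim and decommission alike.
            selected.append(step)
    return selected


if __name__ == "__main__":
    for outcome in Outcome:
        print(outcome.value, applicable_steps(POST_STEPS, outcome))
```

For a reclaim only the disk wipe remains; for a decommission all five steps apply.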


Event Timeline

Marostegui created this task.
jcrespo renamed this task from "Decommission 1020" to "Decommission db1020" (Mar 15 2018, 12:38 PM).

I have checksummed m2 and it is fine. We can proceed to decommission this server once the weekend has passed and we are sure the new master (db1051) is fine.

Change 420061 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbproxy100[2,7]: Change sby host

https://gerrit.wikimedia.org/r/420061

There is a logical backup of db1020 at es2001:/srv/backups/older/m2/db1020/dump.m2.2018-03-16--16-20-10, so this host can now be decommissioned.
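Before decommissioning, it is worth confirming that the backup is the expected one. The dump directory name appears to encode the section and the time the dump was taken; the Python sketch below assumes the layout dump.<section>.<YYYY-MM-DD>--<HH-MM-SS>, inferred only from this one path, and assumes no timezone, so it returns a naive datetime. parse_dump_name is a hypothetical helper, not part of the backup tooling.

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_dump_name(location):
    """Return (section, taken_at) for a dump such as dump.m2.2018-03-16--16-20-10.

    Assumes the name layout dump.<section>.<YYYY-MM-DD>--<HH-MM-SS>, inferred
    from the single example in this task. The name carries no timezone, so a
    naive datetime is returned.
    """
    # An optional "host:" prefix only affects the first path component,
    # so taking the last component still yields the dump name.
    name = PurePosixPath(location).name
    parts = name.split(".", 2)
    if len(parts) != 3 or parts[0] != "dump":
        raise ValueError(f"unexpected dump name: {name!r}")
    _, section, stamp = parts
    taken_at = datetime.strptime(stamp, "%Y-%m-%d--%H-%M-%S")
    return section, taken_at


if __name__ == "__main__":
    section, taken_at = parse_dump_name(
        "es2001:/srv/backups/older/m2/db1020/dump.m2.2018-03-16--16-20-10"
    )
    print(section, taken_at.isoformat())
```

For the path above this gives section m2 and 2018-03-16 16:20:10, which could then be compared with the time of the failover to db1051.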

Change 420061 merged by Marostegui:
[operations/puppet@production] dbproxy100[2,7]: Change standby host

https://gerrit.wikimedia.org/r/420061

Mentioned in SAL (#wikimedia-operations) [2018-03-19T07:27:35Z] <marostegui> Reload dbproxy1002 and dbproxy1007 to get the new config - T189773

Change 420956 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Get ready to decommission db1020

https://gerrit.wikimedia.org/r/420956

Change 420958 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1020

https://gerrit.wikimedia.org/r/420958

Change 420958 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1020

https://gerrit.wikimedia.org/r/420958

Mentioned in SAL (#wikimedia-operations) [2018-03-21T07:01:13Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Remove db1020 from config - T189773 (duration: 01m 13s)

Mentioned in SAL (#wikimedia-operations) [2018-03-21T07:02:55Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Remove db1020 from config - T189773 (duration: 01m 15s)

Change 420956 merged by Marostegui:
[operations/puppet@production] mariadb: Get ready to decommission db1020

https://gerrit.wikimedia.org/r/420956

Mentioned in SAL (#wikimedia-operations) [2018-03-21T07:07:48Z] <marostegui> Remove db1020 from tendril - T189773

Marostegui moved this task from In progress to Done on the DBA board.

This host is now ready for DC Ops steps. Assigning to @RobH.

Change 420960 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] m2.hosts: Remove db1020

https://gerrit.wikimedia.org/r/420960

Change 420960 merged by jenkins-bot:
[operations/software@master] m2.hosts: Remove db1020

https://gerrit.wikimedia.org/r/420960

Change 423477 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Removing db1020 site.pp entry

https://gerrit.wikimedia.org/r/423477

Change 423477 merged by Cmjohnson:
[operations/puppet@production] Removing db1020 site.pp entry

https://gerrit.wikimedia.org/r/423477

Change 423480 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing db1020 dns entries

https://gerrit.wikimedia.org/r/423480

Change 423480 merged by Cmjohnson:
[operations/dns@master] Removing db1020 dns entries

https://gerrit.wikimedia.org/r/423480

Cmjohnson moved this task from Decommission to Up next on the ops-eqiad board.

Assigning to Chris to reflect the latest work that was done for this host.

Cmjohnson updated the task description.