
Decommission db1020
Closed, ResolvedPublic

Description

db1020 was the old m2 master; it has been failed over to db1051.
Wait a few days, then proceed to decommission it.

Decommission Checklist

  • - all system services confirmed offline from production use - should be done by DBA team : https://gerrit.wikimedia.org/r/#/c/420958/
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration - should be done by DBA team
  • - any service group puppet/hiera/dsh config removed - should be done by DBA team
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.) - should be done by DBA team: https://gerrit.wikimedia.org/r/#/c/420956/

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interruptible steps

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (ge-1/0/4)
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
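
The non-interruptible part of the checklist above is order-sensitive, so it can help to write it down as an ordered plan before running anything. The following is only a minimal sketch, not the tooling actually used on this task: the `Step` class, the function names, and the FQDN `db1020.eqiad.wmnet` are assumptions made for illustration (the task only gives the short hostname and the switch port). The three `puppet` subcommands shown are standard Puppet CLI commands; where exactly each one is run (on the host or on the puppet master) is not stated in this task.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One checklist item; command is None when the step is done by hand."""
    description: str
    command: Optional[str] = None


def non_interruptible_steps(fqdn: str, switch_port: str) -> List[Step]:
    """Return the non-interruptible steps in the order the checklist lists them.

    The fqdn value is a hypothetical input; the task does not spell it out.
    """
    return [
        Step("disable puppet on host",
             f'puppet agent --disable "decommission {fqdn}"'),
        Step("power down host"),
        Step(f"disable switch port and note it on the task ({switch_port})"),
        Step("remove all remaining puppet references"),
        Step("remove production dns entries"),
        Step("puppet node clean", f"puppet node clean {fqdn}"),
        Step("puppet node deactivate", f"puppet node deactivate {fqdn}"),
    ]


def render_plan(steps: List[Step]) -> List[str]:
    """Format the steps as numbered lines; nothing is executed."""
    lines = []
    for number, step in enumerate(steps, start=1):
        suffix = f" -> {step.command}" if step.command else " (manual)"
        lines.append(f"{number}. {step.description}{suffix}")
    return lines


if __name__ == "__main__":
    # Assumed FQDN; the switch port ge-1/0/4 is the one noted in the checklist.
    for line in render_plan(non_interruptible_steps("db1020.eqiad.wmnet", "ge-1/0/4")):
        print(line)
```

The sketch only prints the plan. Keeping it as a dry run is deliberate, since these steps should not be interrupted once started and are better reviewed as a whole first.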

Event Timeline

Marostegui triaged this task as Normal priority. Mar 15 2018, 12:31 PM
Marostegui created this task.
Restricted Application added a subscriber: Aklapper. Mar 15 2018, 12:31 PM
jcrespo renamed this task from Decommission 1020 to Decommission db1020. Mar 15 2018, 12:38 PM

I have checksummed m2 and it is fine. We can proceed to decommission this server once the weekend has passed and we are sure the master is fine.

Marostegui updated the task description. Mar 16 2018, 3:29 PM

Mentioned in SAL (#wikimedia-operations) [2018-03-16T16:09:45Z] <marostegui> Stop MySQL on db1020 - T189773

Change 420061 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] dbproxy100[2,7]: Change sby host

https://gerrit.wikimedia.org/r/420061

There is a logical backup of db1020 at: es2001:/srv/backups/older/m2/db1020/dump.m2.2018-03-16--16-20-10
So this host can now be decommissioned.

Change 420061 merged by Marostegui:
[operations/puppet@production] dbproxy100[2,7]: Change standby host

https://gerrit.wikimedia.org/r/420061

Mentioned in SAL (#wikimedia-operations) [2018-03-19T07:27:35Z] <marostegui> Reload dbproxy1002 and dbproxy1007 to get the new config - T189773

Mentioned in SAL (#wikimedia-operations) [2018-03-21T06:50:51Z] <marostegui> Stop MySQL on db1020 - T189773

Change 420956 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Get ready to decommission db1020

https://gerrit.wikimedia.org/r/420956

Change 420958 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1020

https://gerrit.wikimedia.org/r/420958

Change 420958 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1020

https://gerrit.wikimedia.org/r/420958

Mentioned in SAL (#wikimedia-operations) [2018-03-21T07:01:13Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Remove db1020 from config - T189773 (duration: 01m 13s)

Mentioned in SAL (#wikimedia-operations) [2018-03-21T07:02:55Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Remove db1020 from config - T189773 (duration: 01m 15s)

Marostegui updated the task description. Mar 21 2018, 7:03 AM

Change 420956 merged by Marostegui:
[operations/puppet@production] mariadb: Get ready to decommission db1020

https://gerrit.wikimedia.org/r/420956

Marostegui updated the task description. Mar 21 2018, 7:07 AM

Mentioned in SAL (#wikimedia-operations) [2018-03-21T07:07:48Z] <marostegui> Remove db1020 from tendril - T189773

Marostegui reassigned this task from Marostegui to RobH. Mar 21 2018, 7:09 AM
Marostegui moved this task from In progress to Done on the DBA board.

This host is now ready for DC Ops steps. Assigning to @RobH

Restricted Application added a project: Operations. Mar 21 2018, 7:09 AM

Change 420960 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] m2.hosts: Remove db1020

https://gerrit.wikimedia.org/r/420960

Change 420960 merged by jenkins-bot:
[operations/software@master] m2.hosts: Remove db1020

https://gerrit.wikimedia.org/r/420960

Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. Mar 28 2018, 5:53 PM

Change 423477 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Removing db1020 site.pp entry

https://gerrit.wikimedia.org/r/423477

Change 423477 merged by Cmjohnson:
[operations/puppet@production] Removing db1020 site.pp entry

https://gerrit.wikimedia.org/r/423477

Change 423480 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removing db1020 dns entries

https://gerrit.wikimedia.org/r/423480

Change 423480 merged by Cmjohnson:
[operations/dns@master] Removing db1020 dns entries

https://gerrit.wikimedia.org/r/423480

Cmjohnson updated the task description. Apr 2 2018, 3:19 PM
Cmjohnson moved this task from Decommission to Up next on the ops-eqiad board.
Marostegui reassigned this task from RobH to Cmjohnson. Apr 4 2018, 10:20 AM

Assigning to Chris to reflect the latest work done on this host.

Cmjohnson closed this task as Resolved. Apr 4 2018, 7:04 PM
Cmjohnson updated the task description.