
Decommission db1054
Closed, Resolved · Public

Description

db1054 was the s2 primary master and was failed over to db1066 (T194870).

Let's wait a couple of days before decommissioning it.

  • Set up a new candidate master for s2 - db1076
  • Compare data between db1054 and db1076
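The task does not record how the data comparison was carried out. As a minimal sketch of one possible approach, the snippet below computes an order-independent digest per table and lists the tables that differ between two hosts. The function names (`table_digest`, `diff_hosts`) and the in-memory sample rows are hypothetical stand-ins for rows that would be fetched from each host; they are not taken from the task.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple[object, ...]


def table_digest(rows: Iterable[Row]) -> str:
    """Return an order-independent SHA-256 digest over a table's rows."""
    # Hash each row separately, then sort the row hashes so the result
    # does not depend on the order in which rows were fetched.
    row_hashes = sorted(
        hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows
    )
    return hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()


def diff_hosts(a: Dict[str, List[Row]], b: Dict[str, List[Row]]) -> List[str]:
    """List tables that are missing on one host or whose digests differ."""
    differing = []
    for table in sorted(set(a) | set(b)):
        if table not in a or table not in b:
            differing.append(table)
        elif table_digest(a[table]) != table_digest(b[table]):
            differing.append(table)
    return differing


# Hypothetical sample data standing in for rows read from each host.
db1054 = {"page": [(1, "Main_Page"), (2, "Help")], "user": [(1, "admin")]}
db1076 = {"page": [(2, "Help"), (1, "Main_Page")], "user": [(1, "root")]}

print(diff_hosts(db1054, db1076))  # prints ['user']
```

Sorting the per-row hashes keeps the digest stable when two hosts return the same rows in a different order, which is why `page` is not reported above. An empty result corresponds to "no differences".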

Decommission Checklist

  • - all system services confirmed offline from production use - should be done by DBA team
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration - should be done by DBA team
  • - any service group puppet/hiera/dsh config removed - should be done by DBA team
  • - remove site.pp entry (replace with role(spare::system) if the system isn't shut down immediately during this process) - should be done by DBA team: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/442014/

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) asw-a-eqiad:ge-3/0/32
  • - remove all remaining puppet references (including role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS
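Because these steps must run back to back, it can help to print the whole sequence for review before starting. The following is only a dry-run sketch: it builds the ordered list and prints it, and executes nothing. The `Step` type, the `non_interruptible_plan` helper, the disable message, and the example FQDN are assumptions for illustration; only `puppet node clean` and `puppet node deactivate` are named in the checklist itself, and steps without a known command are marked manual.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    description: str
    command: Optional[str]  # None marks a manual step with no command here


def non_interruptible_plan(fqdn: str, task: str) -> List[Step]:
    """Return the non-interruptible checklist steps in order, for review only."""
    return [
        Step("disable puppet on host", f'puppet agent --disable "decom {task}"'),
        Step("power down host", None),
        Step("disable switch port", None),
        Step("note switch port assignment on the task", None),
        Step("remove all remaining puppet references", None),
        Step("remove production dns entries", None),
        Step("puppet node clean", f"puppet node clean {fqdn}"),
        Step("puppet node deactivate", f"puppet node deactivate {fqdn}"),
    ]


# The FQDN is an assumed example; the task only names the host as db1054.
plan = non_interruptible_plan("db1054.eqiad.wmnet", "T197063")
for number, step in enumerate(plan, 1):
    print(f"{number}. {step.description}: {step.command or '(manual)'}")
```

Keeping the plan as data rather than a script makes it easy to paste into the task for sign-off before anyone touches the host.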

  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.
  • - IF RECLAIM: system added back to spares tracking (by onsite)

Details

Related Gerrit Patches:
[operations/puppet@production] decom db1054 from repo
[operations/dns@master] decom db1054
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1054
[operations/puppet@production] mariadb: Set db1054 as spare
[operations/mediawiki-config@master] db-eqiad.php: Depool db1076

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jun 13 2018, 6:37 AM
Marostegui triaged this task as Medium priority. · Jun 13 2018, 6:38 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Change 440068 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1076

https://gerrit.wikimedia.org/r/440068

Change 440068 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1076

https://gerrit.wikimedia.org/r/440068

Mentioned in SAL (#wikimedia-operations) [2018-06-13T08:04:17Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Depool db1076 for binlog change - T197063 (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2018-06-13T08:04:36Z] <marostegui> Stop MySQL and reboot db1076 - T197063

Marostegui updated the task description. · Jun 13 2018, 8:11 AM
Marostegui updated the task description. · Jun 13 2018, 1:49 PM

The main tables on db1054 and db1076 have been compared; no differences were found.

Marostegui updated the task description. · Jun 13 2018, 1:50 PM

Change 442014 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Set db1054 as spare

https://gerrit.wikimedia.org/r/442014

Change 442014 merged by Marostegui:
[operations/puppet@production] mariadb: Set db1054 as spare

https://gerrit.wikimedia.org/r/442014

Mentioned in SAL (#wikimedia-operations) [2018-06-26T05:52:12Z] <marostegui> Stop MySQL on db1054 as it is going to be decommissioned - T197063

Change 442015 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1054

https://gerrit.wikimedia.org/r/442015

Change 442015 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1054

https://gerrit.wikimedia.org/r/442015

Mentioned in SAL (#wikimedia-operations) [2018-06-26T05:57:16Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Remove db1054, it is going to be decommissioned T197063 (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2018-06-26T05:58:17Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Remove db1054, it is going to be decommissioned T197063 (duration: 00m 55s)

Marostegui updated the task description.
Marostegui moved this task from In progress to Done on the DBA board.

db1054 is now ready to be handed over to DCOps for decommissioning.

Restricted Application added a project: Operations. · Jun 26 2018, 6:00 AM
Vvjjkkii renamed this task from Decommission db1054 to c5aaaaaaaa. · Jul 1 2018, 1:04 AM
Vvjjkkii removed Cmjohnson as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii edited subscribers, added: Cmjohnson; removed: gerritbot, Aklapper.
Marostegui renamed this task from c5aaaaaaaa to Decommission db1054. · Jul 1 2018, 8:11 PM
Marostegui assigned this task to Cmjohnson.
Marostegui lowered the priority of this task from High to Medium.
Marostegui updated the task description.
Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. · Jul 2 2018, 4:16 PM
RobH removed Cmjohnson as the assignee of this task. · Jul 13 2018, 8:28 PM
RobH updated the task description. · Jul 13 2018, 8:37 PM

Change 445722 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom db1054

https://gerrit.wikimedia.org/r/445722

Change 445723 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom db1054 from repo

https://gerrit.wikimedia.org/r/445723

Change 445722 merged by RobH:
[operations/dns@master] decom db1054

https://gerrit.wikimedia.org/r/445722

Change 445723 merged by RobH:
[operations/puppet@production] decom db1054 from repo

https://gerrit.wikimedia.org/r/445723

RobH assigned this task to Cmjohnson. · Jul 13 2018, 8:41 PM
RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH moved this task from Backlog to pending onsite steps (eqiad) on the decommission board.
Cmjohnson closed this task as Resolved. · Aug 7 2018, 5:05 PM
Cmjohnson updated the task description.