
Decommission db1053
Closed, Resolved (Public)

Description

db1053 will be replaced by db1072, after which it can be fully decommissioned.

We still need to:

  • Add pending grants/data to db1072
  • Change m3-slave CNAME to db1072
  • Move backups to db1072
  • Change failover candidate on proxies
  • Failover m3 master to db1072
  • Also decommission db1059 (T196606)

Decommission Checklist

  • - all system services confirmed offline from production use - should be done by DBA team set as spare https://gerrit.wikimedia.org/r/440140
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place. (disabled alerts)
  • - remove system from all lvs/pybal active configuration - should be done by DBA team not in dblists
  • - any service group puppet/hiera/dsh config removed - should be done by DBA team not in hiera
  • - remove site.pp (replace with role(spare::system) if system isn't shut down immediately during this process.) - should be done by DBA team: https://gerrit.wikimedia.org/r/440140

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interrupt steps

  • - disable puppet on host
  • - power down host
  • - disable switch port
  • - switch port assignment noted on this task (for later removal) asw-a-eqiad:ge-2/0/9
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate
  • - remove debmonitor entries on neodymium/sarin: sudo curl -X DELETE https://debmonitor.discovery.wmnet/hosts/${HOST_FQDN} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key

END NON-INTERRUPTIBLE STEPS
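
The debmonitor removal step in the checklist above could be wrapped in a small helper that prints the command before running it. This is only a sketch: the URL and the certificate and key paths are copied from the checklist item, while the function name `debmonitor_delete`, the `DRY_RUN` variable, and the example FQDN `db1053.eqiad.wmnet` are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: remove one host's entry from debmonitor, defaulting to a dry run.
# URL and cert/key paths come from the checklist; the function name and
# DRY_RUN are hypothetical.

debmonitor_delete() {
    host_fqdn="$1"
    if [ -z "$host_fqdn" ]; then
        echo "usage: debmonitor_delete HOST_FQDN" >&2
        return 1
    fi
    url="https://debmonitor.discovery.wmnet/hosts/${host_fqdn}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        echo "sudo curl -X DELETE ${url} --cert /etc/debmonitor/ssl/cert.pem --key /etc/debmonitor/ssl/server.key"
        return 0
    fi
    sudo curl -X DELETE "${url}" \
        --cert /etc/debmonitor/ssl/cert.pem \
        --key /etc/debmonitor/ssl/server.key
}
```

Calling `debmonitor_delete db1053.eqiad.wmnet` prints the command; setting `DRY_RUN=0` first would actually send the DELETE request, so that form belongs on neodymium/sarin only.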

  • - mark disk #3 as non usable - must be degaussed for erasure - it has smart errors
  • - mark disk #8 as non usable - must be degaussed for erasure - it has smart errors
  • - mark disk #10 as non usable - must be degaussed for erasure - it has smart errors
  • - system disks wiped (by onsite)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

Event Timeline

jcrespo triaged this task as Medium priority. May 14 2018, 9:48 AM
jcrespo created this task.
Restricted Application added a subscriber: Aklapper. May 14 2018, 9:48 AM

Change 433141 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Move m3 backups from db1053 to db1072

https://gerrit.wikimedia.org/r/433141

Change 433141 merged by Jcrespo:
[operations/puppet@production] mariadb: Move m3 backups from db1053 to db1072

https://gerrit.wikimedia.org/r/433141

Grants seem fixed. I am not moving the following ones, as they may be unused:

  • fabmigrate
  • bzmigrate
  • rtmigrate

Change 433175 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/dns@master] mariadb: Move m3-slave from db1053 to db1072

https://gerrit.wikimedia.org/r/433175

Change 433175 merged by Jcrespo:
[operations/dns@master] mariadb: Move m3-slave from db1053 to db1072

https://gerrit.wikimedia.org/r/433175

Change 433180 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Make db1072, and not db1053, the passive m3 failover

https://gerrit.wikimedia.org/r/433180

Change 433180 merged by Jcrespo:
[operations/puppet@production] mariadb: Make db1072, and not db1053, the passive m3 failover

https://gerrit.wikimedia.org/r/433180

jcrespo updated the task description. May 15 2018, 5:06 PM
jcrespo added a subscriber: mmodell.

@mmodell upcoming failover of Phabricator database, heads up (no action needed from you).

Marostegui moved this task from Triage to In progress on the DBA board. May 16 2018, 10:16 AM

Let's make sure we label this disk, somehow, as broken when we decommission this host - so it is not reused in the future to replace other disks:

Enclosure Device ID: 32
Slot Number: 10
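
To label the broken disk consistently, the two fields quoted above can be reduced to a single enclosure:slot string. The sketch below assumes input shaped like those two lines (an "Enclosure Device ID" line followed by a "Slot Number" line); the function name `disk_labels` is hypothetical.

```shell
#!/bin/sh
# Sketch: read "Enclosure Device ID: N" / "Slot Number: M" pairs on stdin
# and print one "N:M" label per slot. The function name is hypothetical.

disk_labels() {
    awk -F': *' '
        /Enclosure Device ID/ { enc = $2 }          # remember the enclosure
        /Slot Number/         { print enc ":" $2 }  # emit enclosure:slot
    '
}
```

Piping the controller's disk listing through `disk_labels` gives a short identifier that can be written on the disk or copied into the task.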
jcrespo updated the task description. Jun 7 2018, 6:45 AM
jcrespo updated the task description.
jcrespo added a subscriber: RobH.

Change 438004 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dblists: Remove db1059 and db1053 for decommission

https://gerrit.wikimedia.org/r/438004

Change 438004 merged by Jcrespo:
[operations/software@master] dblists: Remove db1059 and db1053 for decommission

https://gerrit.wikimedia.org/r/438004

Marostegui updated the task description. Jun 12 2018, 10:37 AM
Marostegui updated the task description.

Change 440140 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Remove references to db1053 and db1059 and set them as spare

https://gerrit.wikimedia.org/r/440140

Mentioned in SAL (#wikimedia-operations) [2018-06-13T15:25:27Z] <jynus> stopping db1053 and db1059 in preparation for decomm T194634 T196606

Change 440140 merged by Jcrespo:
[operations/puppet@production] mariadb: Remove references to db1053 and db1059 and set them as spare

https://gerrit.wikimedia.org/r/440140

jcrespo assigned this task to RobH. Jun 13 2018, 4:44 PM
jcrespo updated the task description.
jcrespo moved this task from In progress to Done on the DBA board.
jcrespo edited projects, added decommission-hardware; removed Patch-For-Review.
Vvjjkkii renamed this task from Decommission db1053 to t0caaaaaaa. Jul 1 2018, 1:10 AM
Vvjjkkii removed RobH as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
Marostegui renamed this task from t0caaaaaaa to Decommission db1053. Jul 1 2018, 6:44 PM
Marostegui assigned this task to Cmjohnson.
Marostegui lowered the priority of this task from High to Medium.
Marostegui updated the task description.
CommunityTechBot reassigned this task from Cmjohnson to RobH. Jul 5 2018, 6:41 PM
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Cmjohnson.
RobH updated the task description. Jul 19 2018, 8:25 PM

Change 446903 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom prod dns for db1053

https://gerrit.wikimedia.org/r/446903

Change 446903 merged by RobH:
[operations/dns@master] decom prod dns for db1053

https://gerrit.wikimedia.org/r/446903

Change 446904 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom db1053

https://gerrit.wikimedia.org/r/446904

Change 446904 merged by RobH:
[operations/puppet@production] decom db1053

https://gerrit.wikimedia.org/r/446904

RobH reassigned this task from RobH to Cmjohnson. Jul 19 2018, 8:36 PM
RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH moved this task from Backlog to pending onsite steps (eqiad) on the decommission-hardware board.
RobH added a project: ops-eqiad.
Restricted Application added a project: Operations. Jul 19 2018, 8:37 PM
RobH moved this task from Backlog to Decommission on the ops-eqiad board. Jul 19 2018, 8:37 PM
Cmjohnson closed this task as Resolved. Aug 7 2018, 5:04 PM
Cmjohnson updated the task description.