Decommission db1043
Closed, ResolvedPublic

Description

db1043 has been replaced by db1053. If db1053 is still working properly after one week, we are free to decommission db1043.

Decommission Checklist

START NON-INTERRUPTIBLE STEPS - please assign to @RobH for the non-interruptible steps

  • - disable puppet on host
  • - power down host
  • - disable switch port - CANNOT DO: the system is not labeled on the switch, and its MAC does not show up in the switching table
  • - switch port assignment noted on this task (for later removal) - cannot do; the port needs to be traced and disabled by @Cmjohnson
  • - remove all remaining puppet references (include role::spare)
  • - remove production dns entries
  • - puppet node clean, puppet node deactivate

END NON-INTERRUPTIBLE STEPS
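
As a rough illustration only, the Puppet-related steps in the block above could be collected into an ordered command list like the sketch below. This is not the procedure actually used on this task: the hostname `db1043.example.org` is a placeholder, the helper name `decom_commands` is hypothetical, and where each command is run (on the host itself or on the Puppet master) is an assumption. Nothing is executed; the script only prints the commands.

```python
import shlex
from typing import List


def decom_commands(fqdn: str, reason: str) -> List[List[str]]:
    """Return the Puppet commands from the checklist, in checklist order.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    return [
        # Assumed to run on the host itself, before it is powered down.
        ["puppet", "agent", "--disable", reason],
        # Assumed to run on the Puppet master once the host's puppet
        # references and production DNS entries have been removed.
        ["puppet", "node", "clean", fqdn],
        ["puppet", "node", "deactivate", fqdn],
    ]


if __name__ == "__main__":
    # Placeholder FQDN; the real domain is not stated in this task.
    for cmd in decom_commands("db1043.example.org", "decom db1043"):
        print(shlex.join(cmd))
```

Keeping the commands as data rather than running them directly makes it easy to review the sequence before anything touches the host.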

  • - system disks wiped (by onsite)
  • - onsite trace and disable the network port (already has been unplugged)
  • - IF DECOM: system unracked and decommissioned (by onsite), update racktables with result
  • - IF DECOM: switch port configuration removed from switch once system is unracked.
  • - IF DECOM: add system to decommission tracking google sheet
  • - IF DECOM: mgmt dns entries removed.

Event Timeline

jcrespo triaged this task as Normal priority. Feb 16 2018, 12:53 PM
jcrespo created this task.

Change 411229 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Mariadb: Schedule db1043 and db2012 for decommission

https://gerrit.wikimedia.org/r/411229

Change 411229 merged by Jcrespo:
[operations/puppet@production] Mariadb: Schedule db1043 and db2012 for decommission

https://gerrit.wikimedia.org/r/411229

Cmjohnson moved this task from Backlog to Decommission on the ops-eqiad board. Feb 16 2018, 3:47 PM
RobH added a project: DBA. Feb 16 2018, 5:37 PM
RobH updated the task description.

Since this is pending the DBA team's confirmation that the new host is online, I've added the DBA flag. Once the DBA team's work is done (their sign-off that it is OK to continue, that the replacement is working, and that the decom steps flagged for them due to service interactions are complete), this can be assigned to @RobH and the DBA team flag removed from the task.

(If any of the above is an incorrect use of the DBA tag, my apologies! It seemed the best way to ensure this stays on the radar without directly pinging anyone.)

The usage of the tags is ok.

Note that the replacement host is already online and in production, and the old hosts are set as spare. What we wanted was to wait, e.g., 7 days before "assigning" it to you (but you had an advance heads-up, if that helps with anything), for the one-in-a-million case that the data copy was done incorrectly twice, the backups were unusable, the new host that has been there for months also broke, and we had to put the old servers back into production. So let's say to proceed with the decommission on 26 February or later unless you hear back from us. In theory the decom could start already, except for the disk wipe, which cannot be undone.

Marostegui moved this task from Triage to In progress on the DBA board. Feb 21 2018, 4:14 PM
jcrespo reassigned this task from jcrespo to RobH. Feb 23 2018, 5:49 PM
jcrespo removed a project: Patch-For-Review.
jcrespo updated the task description.
jcrespo updated the task description.
jcrespo moved this task from In progress to Done on the DBA board.
RobH updated the task description. Mar 14 2018, 5:17 PM

Change 419487 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] decom db1043

https://gerrit.wikimedia.org/r/419487

Change 419488 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] decom db1043

https://gerrit.wikimedia.org/r/419488

RobH reassigned this task from RobH to Cmjohnson. Mar 14 2018, 5:27 PM

Please note that the switch port for this host is not labeled and does not show in the Ethernet switching table, so @Cmjohnson went ahead and unplugged it for me. That way a decommissioned host cannot come back online before the wipe.

When he goes to do the disk wipe, he will need to trace and disable the network port.

Change 419487 merged by RobH:
[operations/dns@master] decom db1043

https://gerrit.wikimedia.org/r/419487

Change 419488 merged by RobH:
[operations/puppet@production] decom db1043

https://gerrit.wikimedia.org/r/419488

RobH updated the task description.

Change 421560 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Removal of mgmt dns db1043

https://gerrit.wikimedia.org/r/421560

Change 421560 merged by Cmjohnson:
[operations/dns@master] Removal of mgmt dns db1043

https://gerrit.wikimedia.org/r/421560

Cmjohnson closed this task as Resolved. Mar 23 2018, 4:21 PM
Cmjohnson updated the task description.