Decommission db2010 and move m1 codfw to db2078
Closed, Resolved (Public)

Description

  • - all system services confirmed offline from production use: Removed from mediawiki-config: https://gerrit.wikimedia.org/r/#/c/382123/
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove site.pp entry (replace with role::spare if the system isn't shut down immediately during this process):
  • Set to spare: https://gerrit.wikimedia.org/r/#/c/378976/

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port
  • - remove production dns entries & remove hostname entries in mgmt dns
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS
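
The ordering requirement on this block can be checked mechanically. The following is a minimal sketch, not part of any existing tooling: the step names paraphrase the checklist above, and the function name `first_skipped_step` is hypothetical. It only inspects a list of step names that an operator reports as done; it does not talk to puppet, the switch, or DNS.

```python
# Hypothetical sketch: step names paraphrase the checklist above.
NON_INTERRUPTIBLE_STEPS = [
    "disable puppet on host",
    "remove all remaining puppet references",
    "power down host",
    "disable switch port",
    "remove production and mgmt dns entries",
    "puppet node clean, puppet node deactivate, salt key removed",
]


def first_skipped_step(completed):
    """Return the earliest skipped step, or None if nothing was skipped.

    A step counts as skipped when it is missing from `completed` while
    some later step in NON_INTERRUPTIBLE_STEPS is present.
    """
    done = set(completed)
    unknown = done - set(NON_INTERRUPTIBLE_STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")

    # Positions of every completed step in the required order.
    indices = [i for i, step in enumerate(NON_INTERRUPTIBLE_STEPS) if step in done]
    if not indices:
        return None

    # Every step before the furthest completed one must also be done.
    for step in NON_INTERRUPTIBLE_STEPS[: max(indices)]:
        if step not in done:
            return step
    return None
```

For example, passing the first three steps plus the DNS removal returns "disable switch port", which signals that the switch port must be disabled before continuing.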

  • - system disks wiped (by onsite)
  • - remove hostname label, remove hostname from visible label field in racktables (by onsite)
  • - system added back to decommission sheet (by onsite)
  • - remove switch port description

Event Timeline

jcrespo created this task. Sep 12 2017, 1:14 PM
Restricted Application added a subscriber: Aklapper. Sep 12 2017, 1:14 PM

Change 377460 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] Add new m1 host db2078, enable firewall on all misc services

https://gerrit.wikimedia.org/r/377460

jcrespo moved this task from Triage to Backlog on the DBA board. Sep 12 2017, 4:06 PM

@jcrespo this is assigned to me; is there anything I have to do on my side?

Thanks.

jcrespo removed Papaul as the assignee of this task. Sep 13 2017, 2:17 PM
jcrespo added a subscriber: Papaul.

I think the assignment is an accident because it was created as a subticket of another ticket; nothing to do here yet for you. Sorry for the distraction.

Change 377460 merged by Jcrespo:
[operations/puppet@production] Add new m1 host db2078, enable firewall on all misc services

https://gerrit.wikimedia.org/r/377460

Change 378962 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] icinga: Disable notifications on db2078, enable them on db1101

https://gerrit.wikimedia.org/r/378962

Change 378962 merged by Jcrespo:
[operations/puppet@production] icinga: Disable notifications on db2078, enable them on db1101

https://gerrit.wikimedia.org/r/378962

Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts:

['db2078.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201709191713_jynus_21956.log.

Completed auto-reimage of hosts:

['db2078.codfw.wmnet']

and were ALL successful.

Change 378976 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Remove all references to db2010 on production

https://gerrit.wikimedia.org/r/378976

Change 378976 merged by Jcrespo:
[operations/puppet@production] mariadb: Remove all references to db2010 on production

https://gerrit.wikimedia.org/r/378976

Change 378977 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dbtools: Remove db2010, add db2078 from m1 dblist

https://gerrit.wikimedia.org/r/378977

Change 378977 merged by Jcrespo:
[operations/software@master] dbtools: Remove db2010, add db2078 from m1 dblist

https://gerrit.wikimedia.org/r/378977

Can db2010 be decommissioned?

Marostegui updated the task description. Oct 4 2017, 5:35 AM

Change 382123 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2010 from config

https://gerrit.wikimedia.org/r/382123

Marostegui updated the task description. Oct 4 2017, 7:02 AM

Change 382123 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db2010 from config

https://gerrit.wikimedia.org/r/382123

Mentioned in SAL (#wikimedia-operations) [2017-10-04T07:10:35Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Remove db2010 from config as it will be decommissioned - T175685 (duration: 00m 48s)

Mentioned in SAL (#wikimedia-operations) [2017-10-04T07:11:30Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Remove db2010 from config as it will be decommissioned - T175685 (duration: 00m 48s)

Marostegui updated the task description. Oct 4 2017, 7:15 AM
Marostegui updated the task description. Oct 4 2017, 7:17 AM

Change 382127 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Remove db2010

https://gerrit.wikimedia.org/r/382127

Change 382127 merged by Marostegui:
[operations/puppet@production] install_server: Remove db2010

https://gerrit.wikimedia.org/r/382127

Mentioned in SAL (#wikimedia-operations) [2017-10-04T08:28:52Z] <marostegui> Stop MySQL on db2010 as it will be decommissioned - T175685

Marostegui assigned this task to Papaul. Oct 4 2017, 8:29 AM
Marostegui updated the task description.
Marostegui moved this task from Backlog to Done on the DBA board.
Marostegui edited projects, added ops-codfw; removed Patch-For-Review.

db2010 is ready to be fully decommissioned by @Papaul

Restricted Application added a project: Operations. Oct 4 2017, 8:30 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-04T15:28:46Z] <marostegui> Disable puppet on db2010 - it will be decommissioned - T175685

Marostegui updated the task description. Oct 4 2017, 3:29 PM

Change 382171 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] site.pp: Remove db2010

https://gerrit.wikimedia.org/r/382171

Change 382171 merged by Marostegui:
[operations/puppet@production] site.pp: Remove db2010

https://gerrit.wikimedia.org/r/382171

Marostegui updated the task description. Oct 4 2017, 3:34 PM

Mentioned in SAL (#wikimedia-operations) [2017-10-04T15:35:58Z] <marostegui> Power off db2010 to decommission it - T175685

Marostegui updated the task description. Oct 4 2017, 3:37 PM

Change 382205 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS:Remove production & mgmt DNS for db2010

https://gerrit.wikimedia.org/r/382205

Papaul added a comment. Oct 5 2017, 3:02 PM

Disk wipe in progress.

Papaul added a comment. Oct 5 2017, 3:40 PM

Switch port information:

asw-a2-codfw ge-6/0/9

Change 382205 merged by Dzahn:
[operations/dns@master] Remove production & mgmt DNS for db2010

https://gerrit.wikimedia.org/r/382205

Papaul updated the task description. Oct 10 2017, 3:50 PM
Papaul reassigned this task from Papaul to RobH. Oct 10 2017, 3:55 PM

Hi,

Is there anything pending here?

Thanks!

RobH closed this task as Resolved. Edited Nov 3 2017, 4:50 PM

Whoever went ahead and started the steps marked 'non-interruptible' and skipped the switch port disable, please do not do that again.

The switch port MUST BE DISABLED before we move on, to prevent issues. This is part of the steps for a reason; if it powers on but isn't wiped, it can cause issues when it calls into puppet and assumes it should be online. In the future, please complete those steps in order, thanks!

RobH updated the task description. Nov 3 2017, 5:18 PM
RobH edited projects, added hardware-requests; removed Patch-For-Review.