
Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029
Closed, Resolved, Public

Description

For core hosts numbered below db2030 that used to be masters (misc hosts are not included; they will be handled in a separate ticket):

  • db2016
  • db2017
  • db2018
  • db2019
  • db2023
  • db2028
  • db2029
  • Ops steps
    • All system services confirmed offline from production use (MySQL is now stopped)
    • Set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
    • Remove system from all lvs/pybal active configuration
    • Any service group puppet/hiera/dsh config removed
    • Update site.pp with role::spare::system
  • non interrupt steps
    • Disable puppet on host
    • Remove all remaining puppet references (including role::spare::system)
    • Power down host
    • Disable switch port
    • Switch port assignment noted on this task (for later removal)
    • Remove production dns entries
    • Puppet node clean, puppet node deactivate
  • final decom steps
    • System disks wiped (by onsite)
    • System unracked and decommissioned (by onsite), update racktables with result
    • Switch port configuration removed from switch once system is unracked.
    • Mgmt dns entries removed.
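The checklist above is worked through roughly top to bottom, and the discussion further down the timeline stresses that puppet should be disabled right before power-down. As a minimal sketch, assuming the steps are meant to be completed strictly in the listed order (an assumption; the template does not say so), the hypothetical Python model below tracks one host and refuses out-of-order completion. The step labels are abbreviations, and none of these names belong to any Wikimedia tool.

```python
from dataclasses import dataclass, field

# Hypothetical, abbreviated labels for the checklist steps, grouped by phase.
# Assumption: steps must be completed strictly in this order.
PHASES = {
    "ops": ["services offline", "icinga disabled", "lvs/pybal removed",
            "service config removed", "site.pp set to spare"],
    "non-interrupt": ["puppet disabled", "puppet references removed",
                      "powered down", "switch port disabled",
                      "switch port noted", "production dns removed",
                      "puppet node clean/deactivate"],
    "final": ["disks wiped", "unracked, racktables updated",
              "switch port config removed", "mgmt dns removed"],
}


@dataclass
class Decommission:
    host: str
    done: list = field(default_factory=list)

    def ordered_steps(self):
        # Dicts keep insertion order (Python 3.7+), so phases stay in sequence.
        return [(phase, step) for phase, steps in PHASES.items() for step in steps]

    def next_step(self):
        # First (phase, step) not yet completed, or None when fully decommissioned.
        for phase, step in self.ordered_steps():
            if step not in self.done:
                return phase, step
        return None

    def complete(self, step):
        # Reject any step that is not the next pending one.
        expected = self.next_step()
        if expected is None or expected[1] != step:
            raise ValueError(f"{self.host}: expected {expected}, got {step!r}")
        self.done.append(step)


host = Decommission("db2028")
host.complete("services offline")
print(host.next_step())  # ('ops', 'icinga disabled')
```

Encoding the order in one place means, for example, that "powered down" cannot be ticked off before "puppet disabled", which mirrors the concern about hosts sitting with puppet off.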


Event Timeline

jcrespo triaged this task as Medium priority. Jan 3 2018, 5:50 PM
jcrespo created this task.
Restricted Application added a subscriber: Aklapper. Jan 3 2018, 5:50 PM

Change 401765 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Decommission db2028- retire from mediawiki config

https://gerrit.wikimedia.org/r/401765

Change 401766 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Decommission db2028 - set as spare

https://gerrit.wikimedia.org/r/401766

Change 401767 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/software@master] dblist: Remove db2028 for decommission

https://gerrit.wikimedia.org/r/401767

s6 was already checksummed as far as I remember

Change 401767 merged by Jcrespo:
[operations/software@master] dblist: Remove db2028 for decommission

https://gerrit.wikimedia.org/r/401767

> s6 was already checksummed as far as I remember

That is correct: T160509, but who says I didn't break it since then, especially on codfw?

> s6 was already checksummed as far as I remember
>
> That is correct: T160509, but who says I didn't break it since then, especially on codfw?

Always a possibility, yes :)

Change 401765 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Decommission db2028- retire from mediawiki config

https://gerrit.wikimedia.org/r/401765

Change 401766 merged by Jcrespo:
[operations/puppet@production] mariadb: Decommission db2028 - set as spare

https://gerrit.wikimedia.org/r/401766

Marostegui moved this task from Triage to Next on the DBA board. Jan 5 2018, 6:35 AM

Change 402599 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Remove comments about partitioning on db2039

https://gerrit.wikimedia.org/r/402599

Change 402599 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Remove comments about partitioning on db2039

https://gerrit.wikimedia.org/r/402599

jcrespo renamed this task from Decommission db2028 to Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029. Jan 15 2018, 5:18 PM
jcrespo claimed this task.
jcrespo removed a project: Patch-For-Review.
jcrespo updated the task description.
jcrespo moved this task from Next to In progress on the DBA board.

I am performing a quick check on these hosts (and, at the same time, testing and improving compare.py) to double-check that there is no data loss before decommissioning them. I am only checking these tables, by primary key:

  • archive (ar_id)
  • page (page_id)
  • revision (rev_id)
  • text (old_id)
  • user (user_id)

plus wb_terms and image/oldimage on wikidata and commonswiki.
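The source of compare.py is not part of this task, so the following is only a hypothetical sketch of the general idea behind such a check: split a table into primary-key ranges (for example rev_id for revision), hash every range on both hosts, and report the ranges whose hashes differ. Rows are plain in-memory dicts here instead of MariaDB query results, and the function names and chunk size are illustrative assumptions.

```python
import hashlib


def chunk_digests(rows, pk, chunk_size):
    """Hash rows in primary-key buckets of chunk_size ids each."""
    digests = {}
    # Sorting by primary key makes each bucket's hash independent of row order.
    for row in sorted(rows, key=lambda r: r[pk]):
        bucket = row[pk] // chunk_size
        # Sorted (column, value) pairs give a stable serialization for this sketch.
        digests.setdefault(bucket, hashlib.sha1()).update(
            repr(sorted(row.items())).encode())
    return {bucket: h.hexdigest() for bucket, h in digests.items()}


def differing_ranges(rows_a, rows_b, pk, chunk_size=1000):
    """Return inclusive (first_id, last_id) ranges whose contents differ."""
    a = chunk_digests(rows_a, pk, chunk_size)
    b = chunk_digests(rows_b, pk, chunk_size)
    # A bucket present on only one side also counts as a difference.
    return [(bucket * chunk_size, (bucket + 1) * chunk_size - 1)
            for bucket in sorted(a.keys() | b.keys())
            if a.get(bucket) != b.get(bucket)]


master = [{"rev_id": i, "rev_page": i % 7} for i in range(1, 3001)]
old_host = [dict(r) for r in master if r["rev_id"] != 1500]  # one row missing
print(differing_ranges(master, old_host, "rev_id"))  # [(1000, 1999)]
```

Comparing per-range hashes rather than full rows keeps the comparison cheap, and only the ranges reported as different would need a row-by-row look.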

Change 405270 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Decommission old codfw masters

https://gerrit.wikimedia.org/r/405270

Change 405270 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Decommission old codfw masters

https://gerrit.wikimedia.org/r/405270

jcrespo updated the task description. Jan 19 2018, 11:31 AM
jcrespo updated the task description. Jan 19 2018, 11:35 AM

Change 405273 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Decommission old codfw masters

https://gerrit.wikimedia.org/r/405273

Change 405273 merged by Jcrespo:
[operations/puppet@production] mariadb: Decommission old codfw masters

https://gerrit.wikimedia.org/r/405273

Mentioned in SAL (#wikimedia-operations) [2018-01-19T17:11:19Z] <jynus> stopping mariadb on db2016,17,18,19,23,28&29 T184090

jcrespo updated the task description.
jcrespo updated the task description. Jan 19 2018, 5:40 PM
jcrespo reassigned this task from jcrespo to Papaul. Jan 19 2018, 5:42 PM
jcrespo moved this task from In progress to Blocked external/Not db team on the DBA board.
jcrespo added a project: ops-codfw.

Papaul, these 7 old hosts are ready to go, and we should make room for others.

Restricted Application added a project: Operations. Jan 19 2018, 5:42 PM
Papaul reassigned this task from Papaul to jcrespo. Jan 19 2018, 6:07 PM
Papaul added a subscriber: Papaul.

@jcrespo thanks.

Can you please do the steps below and assign the task back to me? Thanks.

  • Disable puppet on host
  • Remove all remaining puppet references (including role::spare::system)
  • Puppet node clean, puppet node deactivate

RobH claimed this task. Jan 19 2018, 6:10 PM
RobH added a subscriber: RobH.

Please note that we should do those steps, not Jaime, since he cannot disable the switch port (which has to be done at the same time).

I'll claim this and handle those steps.

Note that I have no problem doing those if asked, but disabling puppet in particular should be done just before actually shutting down the servers and disconnecting them, so that they are not left running without security upgrades in the meantime. That is why I prefer that RobH coordinates it (no matter who actually does it in the end).

Change 405340 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029

https://gerrit.wikimedia.org/r/405340

Change 405342 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] removing dns entries for db20(1[6-9]|2[389]

https://gerrit.wikimedia.org/r/405342
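Read as a regular expression, the pattern in the change subject above is missing its closing parenthesis. A quick Python check (a sketch; the db2010 to db2039 candidate range is an arbitrary assumption) confirms that the balanced form `db20(1[6-9]|2[389])` matches exactly the seven hosts of this task and no other name in that range:

```python
import re

TASK_HOSTS = ["db2016", "db2017", "db2018", "db2019", "db2023", "db2028", "db2029"]

# Pattern from the change subject, with the missing ")" added.
HOST_RE = re.compile(r"db20(1[6-9]|2[389])")

# Every name from db2010 to db2039; fullmatch rejects partial matches.
candidates = [f"db20{n}" for n in range(10, 40)]
matched = [name for name in candidates if HOST_RE.fullmatch(name)]
print(matched == TASK_HOSTS)  # True
```

Using `fullmatch` rather than `search` matters here: with `search`, a longer name such as db20160 would also be caught.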

RobH added a comment. Jan 19 2018, 6:37 PM

Switch ports for later removal (once they are unracked):

ge-6/0/0 - db2016
ge-6/0/1 - db2017
ge-6/0/2 - db2018
ge-6/0/3 - db2019
ge-6/0/7 - db2023
ge-6/0/11 - db2028
ge-6/0/12 - db2029

RobH updated the task description. Jan 19 2018, 6:39 PM

Change 405340 merged by RobH:
[operations/puppet@production] Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029

https://gerrit.wikimedia.org/r/405340

Change 405342 merged by RobH:
[operations/dns@master] removing dns entries for db20(1[6-9]|2[389]

https://gerrit.wikimedia.org/r/405342

RobH reassigned this task from RobH to Papaul. Jan 19 2018, 6:44 PM
RobH updated the task description.

Ok, these are all ready to have disks wiped, unracked, and racktables updated. Then feel free to assign back to me to complete the last two steps, thanks!

Papaul updated the task description. Jan 23 2018, 5:47 PM
Papaul updated the task description. Jan 30 2018, 5:58 PM

Change 407173 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] Decom: Remove mgmt DNS entries for db201[6-9],db2023 and db202[8-9]

https://gerrit.wikimedia.org/r/407173

Dzahn reassigned this task from Papaul to RobH. Feb 1 2018, 3:33 PM

Change 407173 merged by Dzahn:
[operations/dns@master] Decom: Remove mgmt DNS entries for db201[6-9],db2023 and db202[8-9]

https://gerrit.wikimedia.org/r/407173

RobH closed this task as Resolved. Feb 12 2018, 5:13 PM
RobH removed a project: Patch-For-Review.
RobH updated the task description.
RobH updated the task description. Feb 12 2018, 5:15 PM