
decommission db1036
Closed, ResolvedPublic

Description

db1036 is running low on disk space.

  • - all system services confirmed offline from production use: Removed from mediawiki-config: https://gerrit.wikimedia.org/r/#/c/381233/
  • - set all icinga checks to maint mode/disabled while reclaim/decommission takes place.
  • - remove system from all lvs/pybal active configuration
  • - any service group puppet/hiera/dsh config removed
  • - remove from site.pp (replace with role::spare if the system isn't shut down immediately during this process):
  • Set to spare: https://gerrit.wikimedia.org/r/#/c/381721/

START NON-INTERRUPTIBLE STEPS

  • - disable puppet on host
  • - remove all remaining puppet references (including role::spare)
  • - power down host
  • - disable switch port & change switch port label to asset tag
  • - remove production dns entries & remove hostname entries in mgmt dns
  • - puppet node clean, puppet node deactivate, salt key removed

END NON-INTERRUPTIBLE STEPS

  • - system disks wiped (by onsite)
  • - remove hostname label, remove hostname from visible label field in racktables (by onsite)
  • - system added back to decom rack (by onsite)
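
The checklist above marks a block of steps that should not be left half-finished. As a rough illustration only (not tooling used on this task), that constraint could be modelled as below. The step names are paraphrased from the checklist, and `Step`, `STEPS`, and `safe_to_pause` are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    atomic: bool = False  # True for steps inside the non-interruptible block


# Paraphrased from the checklist; order matters.
STEPS = [
    Step("confirm services offline from production"),
    Step("disable icinga checks"),
    Step("remove from lvs/pybal configuration"),
    Step("remove service group puppet/hiera/dsh config"),
    Step("replace role in site.pp with role::spare"),
    Step("disable puppet on host", atomic=True),
    Step("remove remaining puppet references", atomic=True),
    Step("power down host", atomic=True),
    Step("disable and relabel switch port", atomic=True),
    Step("remove production and mgmt dns entries", atomic=True),
    Step("puppet node clean/deactivate, remove salt key", atomic=True),
    Step("wipe system disks (onsite)"),
    Step("remove hostname labels, update racktables (onsite)"),
    Step("move system to decom rack (onsite)"),
]


def safe_to_pause(done: int) -> bool:
    """Return True if stopping after `done` completed steps does not
    leave the non-interruptible block half-finished."""
    if done <= 0 or done >= len(STEPS):
        return True
    # Pausing is unsafe only when the last finished step and the next
    # pending step both belong to the non-interruptible block.
    return not (STEPS[done - 1].atomic and STEPS[done].atomic)
```

Under this model, pausing after the role::spare change (5 steps done) is fine, while pausing after disabling puppet (6 steps done) is not.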

Event Timeline

jcrespo created this task. Sep 20 2017, 11:26 AM
Restricted Application added a subscriber: Aklapper. Sep 20 2017, 11:26 AM

Change 379212 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Depool db1101 for maintenance

https://gerrit.wikimedia.org/r/379212

Change 379212 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Depool db1101 for maintenance

https://gerrit.wikimedia.org/r/379212

Mentioned in SAL (#wikimedia-operations) [2017-09-20T11:49:33Z] <jynus> stopping replication on db1101 for faster repartition work T176311

Repartitioning of db1101 is ongoing (while replication is down) so that it can take over db1036's role.

Partitioning has finished; db1101 should be ready to be pooled as the new special slave.

Change 379756 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] Pool db1101 as recentchanges replica for s2 with low weight

https://gerrit.wikimedia.org/r/379756

Change 379756 merged by jenkins-bot:
[operations/mediawiki-config@master] Pool db1101 as recentchanges replica for s2 with low weight

https://gerrit.wikimedia.org/r/379756
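
For readers unfamiliar with pooling "with low weight": the sketch below assumes that replicas in a group receive read traffic roughly in proportion to their configured weights, which is a simplification that ignores lag and other factors. The weights used in the actual change are not stated in this task, so the numbers here are invented, and `traffic_share` and the host name `other-replica` are hypothetical.

```python
def traffic_share(weights: dict[str, float]) -> dict[str, float]:
    """Return each host's approximate share of reads, assuming traffic is
    split in proportion to weight (a simplification)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one host needs a positive weight")
    return {host: weight / total for host, weight in weights.items()}


# Invented weights: a freshly pooled host next to an established one.
shares = traffic_share({"db1101": 1, "other-replica": 99})
print(shares["db1101"])
```

Starting with a small weight and raising it in later changes lets a newly pooled host warm up before it takes a full share of queries.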

Marostegui moved this task from Triage to Next on the DBA board. Sep 25 2017, 4:35 AM

Change 380440 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase weight for db1101

https://gerrit.wikimedia.org/r/380440

Marostegui updated the task description. Sep 25 2017, 7:49 AM
Marostegui updated the task description.
Marostegui updated the task description.

Change 380440 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Increase weight for db1101

https://gerrit.wikimedia.org/r/380440

Mentioned in SAL (#wikimedia-operations) [2017-09-25T07:56:04Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Increase traffic for db1101 - T176311 (duration: 00m 46s)

Change 380444 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Increase weight for db1101

https://gerrit.wikimedia.org/r/380444

Change 380448 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1036

https://gerrit.wikimedia.org/r/380448

Change 380448 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1036, fully repool db1101

https://gerrit.wikimedia.org/r/380448

Mentioned in SAL (#wikimedia-operations) [2017-09-25T09:05:31Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1036 and full repool db1101 - T176311 (duration: 00m 46s)

Mentioned in SAL (#wikimedia-operations) [2017-09-25T09:07:56Z] <marostegui> Add 50GB to db1036 /srv partition - T176311

Marostegui moved this task from Next to In progress on the DBA board.

db1036 is now fully depooled and db1101 is serving as a special slave.
Let's give it a couple of days and then proceed to get rid of db1036 from all the config files.

Meanwhile, I have added 50GB to /srv to keep it from alerting on icinga (it was 91% full and is now 88%).
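
As a back-of-the-envelope check of those figures, the partition size can be estimated from the two percentages, assuming the amount of data stayed constant while 50GB was added. This is only a sketch: the percentages are rounded, so the result is a rough estimate, and `estimate_partition` is a hypothetical helper.

```python
def estimate_partition(pct_before: float, pct_after: float, added_gb: float):
    """Solve used = pct_before * T = pct_after * (T + added_gb) for the
    original size T, then return (T, used), both in GB."""
    if not 0 < pct_after < pct_before <= 1:
        raise ValueError("expected 0 < pct_after < pct_before <= 1")
    total_before = pct_after * added_gb / (pct_before - pct_after)
    used = pct_before * total_before
    return total_before, used


total_before, used = estimate_partition(0.91, 0.88, 50)
# With these rounded inputs: about 1467 GB before the extension,
# of which about 1335 GB was in use.
print(round(total_before), round(used))
```

Because the denominator is only three percentage points, small rounding differences in the reported usage shift the estimate considerably.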

Change 380444 abandoned by Marostegui:
db-eqiad.php: Increase weight for db1101

Reason:
Done here: https://gerrit.wikimedia.org/r/#/c/380448/

https://gerrit.wikimedia.org/r/380444

Change 381233 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1036

https://gerrit.wikimedia.org/r/381233

Change 381233 merged by Marostegui:
[operations/mediawiki-config@master] db-eqiad,db-codfw.php: Remove db1036

https://gerrit.wikimedia.org/r/381233

Mentioned in SAL (#wikimedia-operations) [2017-10-02T06:07:07Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Remove db1036 from config files as it will be decommissioned - T176311 (duration: 00m 48s)

Marostegui updated the task description. Oct 2 2017, 6:07 AM

Mentioned in SAL (#wikimedia-operations) [2017-10-02T06:08:10Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Remove db1036 from config files as it will be decommissioned - T176311 (duration: 00m 46s)

Change 381721 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Remove db1036 for decommissioning

https://gerrit.wikimedia.org/r/381721

Change 381721 merged by Marostegui:
[operations/puppet@production] mariadb: Remove db1036 for decommissioning

https://gerrit.wikimedia.org/r/381721

Marostegui reassigned this task from Marostegui to Cmjohnson. Oct 2 2017, 6:38 AM
Marostegui updated the task description.
Marostegui moved this task from In progress to Done on the DBA board.
Marostegui added a project: ops-eqiad.
Marostegui added a subscriber: Cmjohnson.

db1036 is now ready to be totally decommissioned by @Cmjohnson

Change 381736 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/software@master] s2.hosts: Remove db1036

https://gerrit.wikimedia.org/r/381736

Mentioned in SAL (#wikimedia-operations) [2017-10-02T07:41:53Z] <marostegui> Stop MySQL on db1036 as it is going to be decommissioned - https://phabricator.wikimedia.org/T176311

Change 381736 merged by jenkins-bot:
[operations/software@master] s2.hosts: Remove db1036

https://gerrit.wikimedia.org/r/381736

Cmjohnson updated the task description. Nov 8 2017, 6:50 PM
Cmjohnson updated the task description. Nov 8 2017, 8:36 PM
Cmjohnson closed this task as Resolved. Nov 14 2017, 4:24 PM

wiped, racktables updated