
Re-image (rename) dbstore1006 into db1125
Closed, Resolved, Public

Description

As part of T283125, db1125 was converted into dbstore1006 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/692984). Due to a mistake, nobody noticed that the host did not have enough disk space for that role, so dbstore1006 needs to be converted back into db1125.

dbstore1006 is not in use and will not be, so its data is not needed and this can be done at any time.

The process for renaming a host is documented at: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Rename_while_reimaging

db1125 needs to go into the test-s4 section as a replica of db1124.
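On MariaDB, making one host a replica of another is normally done with `CHANGE MASTER TO` followed by `START SLAVE`. The minimal Python sketch below only builds those two statements, assuming classic binlog-coordinate replication. The replication user, password, binlog file name and position are hypothetical placeholders (on a real host they would come from the primary, for example via `SHOW MASTER STATUS`, or from the clone's metadata); none of them are values recorded in this task.

```python
# Hypothetical sketch: build the MariaDB statements that would point a
# replica (db1125) at its primary (db1124). The user, password, binlog file
# and position used below are placeholders, not values from this task.


def quote(value: str) -> str:
    """Return a single-quoted SQL string literal with \\ and ' escaped."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def replication_statements(primary_host: str, user: str, password: str,
                           log_file: str, log_pos: int,
                           port: int = 3306) -> list:
    """Return the CHANGE MASTER TO and START SLAVE statements, in order."""
    change_master = (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(primary_host)}, "
        f"MASTER_PORT={port}, "
        f"MASTER_USER={quote(user)}, "
        f"MASTER_PASSWORD={quote(password)}, "
        f"MASTER_LOG_FILE={quote(log_file)}, "
        f"MASTER_LOG_POS={log_pos};"
    )
    return [change_master, "START SLAVE;"]


if __name__ == "__main__":
    # Placeholder coordinates; real ones come from the primary or the clone.
    for stmt in replication_statements("db1124.eqiad.wmnet", "repl_user",
                                       "<secret>", "db1124-bin.000001", 4):
        print(stmt)
```

Building the statements as plain strings keeps the sketch free of any database driver; actually applying them still requires a client session on db1125.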

Event Timeline

Marostegui moved this task from Triage to Ready on the DBA board.

cookbooks.sre.hosts.decommission executed by kormat@cumin1001 for hosts: dbstore1006.eqiad.wmnet

  • dbstore1006.eqiad.wmnet (PASS)
    • Downtimed host on Icinga
    • Found physical host
    • Downtimed management interface on Icinga
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Dropped dbstore1006 from tendril and zarcillo

Change 697736 had a related patch set uploaded (by Kormat; author: Kormat):

[operations/puppet@production] db1125: Rename back from dbstore1006

https://gerrit.wikimedia.org/r/697736

Change 697736 merged by Kormat:

[operations/puppet@production] db1125: Rename back from dbstore1006

https://gerrit.wikimedia.org/r/697736

Script wmf-auto-reimage was launched by kormat on cumin1001.eqiad.wmnet for hosts:

['db1125.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106021218_kormat_13166.log.

Change 697779 had a related patch set uploaded (by Kormat; author: Kormat):

[operations/puppet@production] install_server: Temporarily set db1125 to destructive install.

https://gerrit.wikimedia.org/r/697779

Change 697779 merged by Kormat:

[operations/puppet@production] install_server: Temporarily set db1125 to destructive install.

https://gerrit.wikimedia.org/r/697779

Script wmf-auto-reimage was launched by kormat on cumin1001.eqiad.wmnet for hosts:

['db1125.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106021242_kormat_2843.log.

Completed auto-reimage of hosts:

['db1125.eqiad.wmnet']

Of which those FAILED:

['db1125.eqiad.wmnet']

Script wmf-auto-reimage was launched by kormat on cumin1001.eqiad.wmnet for hosts:

['db1125.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202106021303_kormat_19444.log.

Completed auto-reimage of hosts:

['db1125.eqiad.wmnet']

and were ALL successful.

Current status:

  • db1125 has been renamed, wiped, and reimaged
  • It still needs to be re-added to tendril/zarcillo and to have an s4 snapshot deployed on it.

@Kormat no need to add s4 data to it, just make it a replica of db1124 :)

"just" done :)

It's back in tendril+zarcillo, and is a replica of db1124.

Thank you!
PS: Orchestrator detected it automatically too! <3

Re-labelling is not necessary, as the host was never re-labelled away from db1125 in the first place: T283300

Machine state set to 'active' in netbox.
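The actions recorded in this timeline can be condensed into an ordered checklist. The Python sketch below encodes them as data and reports the first step not yet done. The step keys are hypothetical labels, and the order is the one observed in this task, not a copy of the Server_Lifecycle page.

```python
from dataclasses import dataclass
from typing import AbstractSet, Optional


@dataclass(frozen=True)
class Step:
    key: str          # hypothetical short label for the step
    description: str  # what was done, as recorded in this task's timeline


# Order as observed in this task; not the canonical wikitech procedure.
RENAME_STEPS = (
    Step("decommission", "Run cookbooks.sre.hosts.decommission on dbstore1006.eqiad.wmnet"),
    Step("drop_inventory", "Drop dbstore1006 from tendril and zarcillo"),
    Step("puppet_rename", "Merge the Puppet change renaming the host back to db1125"),
    Step("destructive_install", "Temporarily set db1125 to destructive install on the install server"),
    Step("reimage", "Reimage db1125.eqiad.wmnet with wmf-auto-reimage"),
    Step("add_inventory", "Re-add db1125 to tendril and zarcillo"),
    Step("replicate", "Configure db1125 as a replica of db1124"),
    Step("netbox_active", "Set the machine state to 'active' in Netbox"),
)


def next_step(done: AbstractSet[str]) -> Optional[Step]:
    """Return the first step whose key is not in `done`, or None if all are done."""
    for step in RENAME_STEPS:
        if step.key not in done:
            return step
    return None


if __name__ == "__main__":
    pending = next_step({"decommission", "drop_inventory"})
    print(pending.description if pending else "all steps done")
```

Walking a tuple in order, rather than tracking an unordered set of tasks, reflects that in this task later steps depended on earlier ones: the reimage of db1125 only ran after the Puppet rename was merged.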