Reclone db1169 (s1)
Closed, Resolved, Public

Description

After fixing T410400, we can now reclone db1169 and get it back into production.

Event Timeline

Marostegui renamed this task from Reclone db1169 to Reclone db1169 (s1). Tue, Dec 2, 1:55 PM
Marostegui triaged this task as Medium priority.

Started cloning db1251.eqiad.wmnet to db1169.eqiad.wmnet - marostegui@cumin1003

Completed depool of db1251 - Depool db1251.eqiad.wmnet to then clone it to db1169.eqiad.wmnet - marostegui@cumin1003

Start pool of db1251 gradually with 4 steps - Pool db1251.eqiad.wmnet in after cloning - marostegui@cumin1003

Completed pool of db1251 gradually with 4 steps - Pool db1251.eqiad.wmnet in after cloning - marostegui@cumin1003

Finished cloning db1251.eqiad.wmnet to db1169.eqiad.wmnet - marostegui@cumin1003

Change #1214220 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] installserver: Add db1169 to preseed

https://gerrit.wikimedia.org/r/1214220

Change #1214220 merged by Marostegui:

[operations/puppet@production] installserver: Add db1169 to preseed

https://gerrit.wikimedia.org/r/1214220

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db1169.eqiad.wmnet with OS trixie

I am reimaging this host with Trixie. That was the original point of the task, which ended up surfacing all the Nokia/UEFI issues.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1169.eqiad.wmnet with OS trixie completed:

  • db1169 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512030605_marostegui_501408_db1169.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Completed depool of db1169 - Depooling db1169 - marostegui@cumin1003

Change #1214243 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1169: Enable notifications

https://gerrit.wikimedia.org/r/1214243

Change #1214243 merged by Marostegui:

[operations/puppet@production] db1169: Enable notifications

https://gerrit.wikimedia.org/r/1214243

Start pool of db1169 gradually with 4 steps - Repooling db1169 - marostegui@cumin1003

Completed pool of db1169 gradually with 4 steps - Repooling db1169 - marostegui@cumin1003