rename cloudswift1002 as cloudlb1002
Closed, Resolved (Public)

Description

Netbox device: https://netbox.wikimedia.org/dcim/devices/3525

Procedure: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Rename_while_reimaging

  • decommission
  • netbox: edit the device name, and set its status from DECOMMISSIONING to PLANNED.
  • re-add the DNS Name field for the management interface
  • run sre.dns.netbox cookbook
  • run sre.network.configure-switch-interfaces cookbook
  • reimage server with new name
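
The procedure above can be sketched as an ordered checklist for a given pair of host names. This is only an illustration: `Step` and `rename_steps` are hypothetical helpers, not part of Spicerack or the cookbooks, and the code runs nothing itself. The cookbook names are the ones mentioned in this task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    where: str   # tool or UI where the step is performed
    action: str  # what the operator does there


def rename_steps(old_name: str, new_name: str) -> list[Step]:
    """Return the rename-while-reimaging steps, in the order listed in the task."""
    return [
        Step("sre.hosts.decommission cookbook", f"decommission {old_name}"),
        Step("netbox", f"rename {old_name} to {new_name}, "
                       "set status from DECOMMISSIONING to PLANNED"),
        Step("netbox", "re-add the DNS Name field for the management interface"),
        Step("sre.dns.netbox cookbook", "run"),
        Step("sre.network.configure-switch-interfaces cookbook", "run"),
        Step("sre.hosts.reimage cookbook", f"reimage server as {new_name}"),
    ]


if __name__ == "__main__":
    # Print a numbered checklist for the rename tracked in this task.
    for i, step in enumerate(rename_steps("cloudswift1002", "cloudlb1002"), 1):
        print(f"{i}. [{step.where}] {step.action}")
```

Keeping the steps as data makes the ordering explicit: the Netbox edits come after the decommission and before the DNS and switch cookbooks.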

Event Timeline

aborrero changed the task status from Open to In Progress.
aborrero triaged this task as Medium priority.
aborrero moved this task from Backlog to Doing on the User-aborrero board.

cookbooks.sre.hosts.decommission executed by aborrero@cumin1001 for hosts: cloudswift1002.eqiad.wmnet

  • cloudswift1002.eqiad.wmnet (WARN)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Management interface not found on Icinga, unable to downtime it
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change 936019 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudlb1001/1002: add role

https://gerrit.wikimedia.org/r/936019

Change 936019 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudlb1001/1002: add role

https://gerrit.wikimedia.org/r/936019

Change 936022 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudlb: eqiad: bootstrap hiera data

https://gerrit.wikimedia.org/r/936022

aborrero changed the task status from Stalled to In Progress. (Jul 7 2023, 3:59 PM)

No longer blocked!

Change 936022 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudlb: eqiad: bootstrap hiera data

https://gerrit.wikimedia.org/r/936022

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1001 for host cloudlb1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1001 for host cloudlb1002.eqiad.wmnet with OS bullseye executed with errors:

  • cloudlb1002 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1001 for host cloudlb1002.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1001 for host cloudlb1002.eqiad.wmnet with OS bullseye completed:

  • cloudlb1002 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202307101028_aborrero_4018022_cloudlb1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
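
The cookbook summaries posted in this timeline share a shape: a host bullet with a status in parentheses, followed by indented step bullets. A minimal sketch for pulling the status and steps out of such a summary follows. It assumes the bullet layout seen in this task and that the status is a single uppercase word; `parse_summary` is a hypothetical helper, not a Spicerack API.

```python
import re

# Host line, e.g. "  • cloudlb1002 (FAIL)": one token, then an uppercase status.
SUMMARY_RE = re.compile(r"^\s*•\s+(?P<host>\S+)\s+\((?P<status>[A-Z]+)\)\s*$")


def parse_summary(text: str) -> dict[str, dict]:
    """Map each host to its reported status and the list of step lines."""
    results: dict[str, dict] = {}
    current = None
    for line in text.splitlines():
        match = SUMMARY_RE.match(line)
        if match:
            current = match.group("host")
            results[current] = {"status": match.group("status"), "steps": []}
        elif current and line.strip().startswith("•"):
            # Any other bullet after a host line is treated as one of its steps.
            results[current]["steps"].append(line.strip().lstrip("•").strip())
    return results


if __name__ == "__main__":
    sample = (
        "  • cloudlb1002 (FAIL)\n"
        "    • Forced PXE for next reboot\n"
        "    • The reimage failed, see the cookbook logs for the details\n"
    )
    for host, info in parse_summary(sample).items():
        print(host, info["status"], len(info["steps"]))
```

A step such as "Host up (Debian installer)" is not mistaken for a host line, because the host pattern allows only one token before the parenthesis.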
aborrero updated the task description.