rename cloudgw2001-dev into cloudlb2001-dev
Closed, Resolved (Public)

Description

The cloudgw2001-dev server (https://netbox.wikimedia.org/dcim/devices/1774/) is going to be part of a PoC for the cloudlb project (see T324992: cloudlb: create PoC on codfw).

To avoid naming confusion, it would be good to rename the server to cloudlb2001-dev.

Procedure: https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Rename_while_reimaging

Event Timeline

aborrero triaged this task as Medium priority. (Jan 25 2023, 2:02 PM)
aborrero created this task.
aborrero added a subscriber: Papaul.

cookbooks.sre.hosts.decommission executed by aborrero@cumin2002 for hosts: cloudgw2001-dev.codfw.wmnet

  • cloudgw2001-dev.codfw.wmnet (WARN)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Management interface not found on Icinga, unable to downtime it
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

Change 884027 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudgw2001-dev: rename server to cloudlb2001-dev

https://gerrit.wikimedia.org/r/884027

Change 884027 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudgw2001-dev: rename server to cloudlb2001-dev

https://gerrit.wikimedia.org/r/884027

I can't run the reimage cookbook because the server lacks a primary IPv4 address with a DNS name in Netbox:

aborrero@cumin2002:~$ sudo cookbook sre.hosts.reimage --os bullseye --new -t T327908 cloudlb2001-dev
==> ATTENTION: destructive action for host: cloudlb2001-dev
Are you sure to proceed?
Type "go" to proceed or "abort" to interrupt the execution
> go
Exception raised while initializing the Cookbook sre.hosts.reimage:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 219, in run
    runner = self.instance.get_runner(args)
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/reimage.py", line 88, in get_runner
    return ReimageRunner(args, self.spicerack)
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/reimage.py", line 107, in __init__
    self.fqdn = self.netbox_server.fqdn
  File "/usr/lib/python3/dist-packages/spicerack/netbox.py", line 349, in fqdn
    raise NetboxError(f"Server {self._server.name} does not have any primary IP with a DNS name set.")
spicerack.netbox.NetboxError: Server cloudlb2001-dev does not have any primary IP with a DNS name set.
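
For readers unfamiliar with this failure, here is a minimal sketch of the kind of check that produces it. It is not the actual spicerack code: the `Device` and `PrimaryIP` classes, the `resolve_fqdn` function and the example address are hypothetical stand-ins, and the assumption that both the IPv4 and IPv6 primary addresses are consulted is inferred only from the wording of the error message.

```python
from dataclasses import dataclass
from typing import Optional


class NetboxError(Exception):
    """Hypothetical stand-in for the error type seen in the traceback."""


@dataclass
class PrimaryIP:
    """Hypothetical model of a Netbox IP address with its DNS name."""
    address: str
    dns_name: str = ""


@dataclass
class Device:
    """Hypothetical model of a Netbox device and its primary IPs."""
    name: str
    primary_ip4: Optional[PrimaryIP] = None
    primary_ip6: Optional[PrimaryIP] = None


def resolve_fqdn(device: Device) -> str:
    """Return the DNS name of the first primary IP that has one set.

    Assumption: IPv4 is checked before IPv6. Raises NetboxError when
    neither primary IP exists or neither carries a DNS name.
    """
    for ip in (device.primary_ip4, device.primary_ip6):
        if ip is not None and ip.dns_name:
            return ip.dns_name
    raise NetboxError(
        f"Server {device.name} does not have any primary IP with a DNS name set."
    )


if __name__ == "__main__":
    # State after the rename: no primary IP at all, so the lookup fails.
    renamed = Device(name="cloudlb2001-dev")
    try:
        resolve_fqdn(renamed)
    except NetboxError as error:
        print(error)

    # State after provisioning (192.0.2.10 is a documentation-only address).
    provisioned = Device(
        name="cloudlb2001-dev",
        primary_ip4=PrimaryIP("192.0.2.10/24", "cloudlb2001-dev.codfw.wmnet"),
    )
    print(resolve_fqdn(provisioned))
```

In this sketch the reimage can only derive an FQDN once a primary IP with a DNS name is attached to the device, which is why the network provisioning step has to come first.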

I am trying to fix that with the ProvisionServerNetwork script (https://netbox.wikimedia.org/extras/scripts/interface_automation.ProvisionServerNetwork/). Before running it, I deleted all existing interface information for the server, because the script would otherwise fail.

The ProvisionServerNetwork script changed the mgmt IP:

-cloudlb2001-dev                          1H IN A 10.193.1.33
+cloudlb2001-dev                          1H IN A 10.193.0.243

So I am changing it back by hand to keep the original IP address (10.193.1.33).
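
An unintended address change like the one in the diff above can be spotted mechanically. The following is a rough sketch, assuming the diff uses the zone-file-style `name TTL IN A address` layout shown here; the `changed_a_records` helper and its regular expression are hypothetical and not part of any Wikimedia tooling.

```python
import re
from typing import Dict, Tuple

# Matches unified-diff lines such as:
#   -cloudlb2001-dev      1H IN A 10.193.1.33
RECORD_RE = re.compile(r"^([+-])(\S+)\s+(\S+)\s+IN\s+A\s+(\S+)$")


def changed_a_records(diff_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each record name to (old_address, new_address) when they differ."""
    removed: Dict[str, str] = {}
    added: Dict[str, str] = {}
    for line in diff_text.splitlines():
        match = RECORD_RE.match(line.strip())
        if match is None:
            continue  # Not an A record line in the assumed layout.
        sign, name, _ttl, address = match.groups()
        if sign == "-":
            removed[name] = address
        else:
            added[name] = address
    return {
        name: (removed[name], added[name])
        for name in removed
        if name in added and removed[name] != added[name]
    }


if __name__ == "__main__":
    diff = (
        "-cloudlb2001-dev                          1H IN A 10.193.1.33\n"
        "+cloudlb2001-dev                          1H IN A 10.193.0.243\n"
    )
    for name, (old, new) in changed_a_records(diff).items():
        print(f"{name}: {old} -> {new}")
```

Records that appear only as additions or only as removals are ignored in this sketch, since the point is to flag an existing name whose address silently changed.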

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudlb2001-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudlb2001-dev.codfw.wmnet with OS bullseye executed with errors:

  • cloudlb2001-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudlb2001-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudlb2001-dev.codfw.wmnet with OS bullseye executed with errors:

  • cloudlb2001-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudlb2001-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudlb2001-dev.codfw.wmnet with OS bullseye completed:

  • cloudlb2001-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202301271125_aborrero_321797_cloudlb2001-dev.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active