cloudcontrol2005-dev: make it a cloudlb backend
Closed, Resolved · Public

Description

This task is to track the work to make cloudcontrol2005-dev a cloudlb backend.

Similar to T336236: cloudcontrol2001-dev: make it a cloudlb backend.

Event Timeline

aborrero changed the task status from Open to In Progress. May 12 2023, 9:19 AM
aborrero triaged this task as High priority.
aborrero created this task.

Change 923301 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudcontrol2005-dev: move to the new network setup

https://gerrit.wikimedia.org/r/923301

cookbooks.sre.hosts.decommission executed by aborrero@cumin2002 for hosts: cloudcontrol2005-dev.wikimedia.org

  • cloudcontrol2005-dev.wikimedia.org (WARN)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Management interface not found on Icinga, unable to downtime it
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
aborrero added a project: ops-codfw.
aborrero added a subscriber: Jhancock.wm.

Please @Jhancock.wm update the physical network connection of this server from asw-b1-codfw (WMF5942) ge-1/0/14 to cloudsw1-b1-codfw (WMF11695).

@Jhancock.wm you can run the Netbox provision script for this one (vlan type 'cloud hosts'), as it has gone through a decommissioning step, so it can be processed like a newly installed server. If you have any issues or questions, feel free to ping me.

@aborrero when that's done we can reimage again. Ping me when you are doing so, I'll need to add the cloud-private vlan to the new switch port manually in Netbox (the above provision script for new hosts hasn't been updated to support that yet).

Change 923301 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudcontrol2005-dev: move to the new network setup

https://gerrit.wikimedia.org/r/923301

@cmooney I moved the patch to switch cloudsw1-b1-codfw, port ge-1/0/13, but I can't get the Netbox script to work. The server name is not showing up in the device list.

No problem @Jhancock.wm. I'd forgotten a few steps I should have mentioned. Moving a server is quite rare, and our automation isn't set up properly for it, which makes it tricky. We basically have to follow the steps under 'Update netbox' from here.

I've done that now and all seems ok. I also updated the switches, so it should be good. Thanks for your help.

@aborrero you should be good to do the reimage on this now. I've reserved 172.20.5.11/24 in Netbox for the vlan2151 interface. I think that should be ok when we do the puppet import later; if not, I can have a look.

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin2002 for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye

Thanks! Reimaging now!

Since 172.20.5.7/24 was already present, I've deleted the one you reserved.
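
For reference, a minimal sketch of a sanity check on the two addresses mentioned in this thread, using only Python's standard `ipaddress` module. It only confirms that both sit in the same /24; the assumption that this subnet corresponds to vlan2151 comes from the comments above. The helper name `same_network` is hypothetical and not part of any WMF tooling.

```python
import ipaddress

# Addresses quoted in the comments above (both /24).
reserved = ipaddress.ip_interface("172.20.5.11/24")  # reserved in Netbox, later deleted
existing = ipaddress.ip_interface("172.20.5.7/24")   # already present for the host


def same_network(a: ipaddress.IPv4Interface, b: ipaddress.IPv4Interface) -> bool:
    """Return True when both interface addresses belong to the same subnet."""
    return a.network == b.network


if __name__ == "__main__":
    print(reserved.network)
    print(same_network(reserved, existing))
```

Since both addresses fall in the same subnet, keeping only the one that was already present leaves the host with a single address in that network.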

Change 924504 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudcontrol2005-dev: enable puppet role

https://gerrit.wikimedia.org/r/924504

Change 924504 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudcontrol2005-dev: enable puppet role

https://gerrit.wikimedia.org/r/924504

Change 924526 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/dns@master] wikimediacloud.org: adjust openstack.codfw1dev FQDN

https://gerrit.wikimedia.org/r/924526

Change 924526 merged by Arturo Borrero Gonzalez:

[operations/dns@master] wikimediacloud.org: adjust openstack.codfw1dev FQDN

https://gerrit.wikimedia.org/r/924526

Change 924533 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudlb: make it aware of cloudcontrol2005-dev

https://gerrit.wikimedia.org/r/924533

Change 924533 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudlb: make it aware of cloudcontrol2005-dev

https://gerrit.wikimedia.org/r/924533

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin2002 for host cloudcontrol2005-dev.codfw.wmnet with OS bullseye completed:

  • cloudcontrol2005-dev (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202305301303_aborrero_4112615_cloudcontrol2005-dev.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually