T304888 Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts

Subject	Repo	Branch	Lines +/-
Remove temporary ns2 def for cloudservices1005	operations/puppet	production	+0 -1
Replace cloudservices1003 with cloudservices1005 for ns0	operations/dns	master	+2 -7
Replace cloudservices1003 with cloudservices1005	operations/puppet	production	+6 -7
cloudservices1005 will replace ns0 rather than ns1.	operations/puppet	production	+1 -1
Add cloudservices1005 to the list of designate hosts	operations/puppet	production	+2 -0
cloudservices1005: hack in a temporary resolver fqdn	operations/puppet	production	+7 -0
Make Cloudservices1005 a designate node	operations/puppet	production	+4 -6
updating site.pp entry cloudnet1005-6	operations/puppet	production	+1 -1
adding new wmcs hosts to netboot.cfg	operations/puppet	production	+8 -7
Adding new cloudnet, cloudrabbit and cloudservice nodes to site.pp	operations/puppet	production	+15 -0

Status	Subtype	Assigned	Task
			Unknown Object (Task)
Resolved		• Cmjohnson	T304888 Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts
Resolved		wiki_willy	T309576 Degraded RAID on cloudnet1004
Resolved		Andrew	T314522 Dedicated cloudrabbit nodes in eqiad1
Open		None	T377934 openstack: mirror cloudrabbit setup from eqiad1 to codfw1dev
Resolved		aborrero	T316284 Replace cloudnet100[34] with cloudnet100[56]
Resolved		aborrero	T318824 Make cloudnet200[56}-dev single NIC
Resolved		aborrero	T319524 Cloud VPS: neutron network: the ifupdown bridge setup can be fragile
Resolved	Request	Jclark-ctr	T319683 decommission cloudnet1004.eqiad.wmnet
Resolved	Request	Jclark-ctr	T319682 decommission cloudnet1003.eqiad.wmnet
Resolved	Request	Jclark-ctr	T316285 decommission cloudservices1003.wikimedia..org

Change 811771 merged by Cmjohnson:

[operations/puppet@production] adding new wmcs hosts to netboot.cfg

https://gerrit.wikimedia.org/r/811771

Cmjohnson updated the task description. Jul 6 2022, 8:17 PM

Cmjohnson updated the task description.

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudrabbit1001.wikimedia.org with OS bullseye

Maintenance_bot removed a project: Patch-For-Review. Jul 6 2022, 8:31 PM

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudrabbit1001.wikimedia.org with OS bullseye executed with errors:

cloudrabbit1001 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudrabbit1001.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudrabbit1002.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudrabbit1003.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1005.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudservices1005.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudrabbit1001.wikimedia.org with OS bullseye executed with errors:

cloudrabbit1001 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details

I am getting this on all hosts but the cloudnets; those are not reaching the installer.

┌────────────────────┤ [!!] Configure the network ├─────────────────────┐
│                                                                       │
│                   Network autoconfiguration failed                    │
│ Your network is probably not using the DHCP protocol. Alternatively,  │
│ the DHCP server may be slow or some network hardware is not working   │
│ properly.                                                             │
│                                                                       │

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudrabbit1002.wikimedia.org with OS bullseye executed with errors:

cloudrabbit1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudrabbit1003.wikimedia.org with OS bullseye executed with errors:

cloudrabbit1003 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudnet1005.eqiad.wmnet with OS bullseye executed with errors:

cloudnet1005 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye executed with errors:

cloudnet1006 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudservices1005.wikimedia.org with OS bullseye executed with errors:

cloudservices1005 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details

@Cmjohnson Hey. Drop me a line on this one perhaps.

The issue is that the IPs assigned to the cloudnets do not seem to match the Vlans their switch ports have been assigned to. This has raised an alert in the Netbox report.

The fix should be relatively straightforward. For instance, cloudnet1005 has been correctly assigned IPs matching the 'cloud-hosts1-c8-eqiad (1128)' Vlan, so changing the Vlan on cloudsw1-c8-eqiad xe-0/0/3 from 'cloud-hosts1-eqiad (1118)' to 1128 should fix it.

But what I'm more concerned with is how this discrepancy happened. We'd reworked the Netbox provisioning script so it should pick the rack-specific Vlan for new hosts. It picked the IPs from there, but I can't understand why the switch port has then been assigned to the old Vlan. So rather than just jumping in and fixing it manually, I want to try to work out why the script didn't work as intended and fix that. Thanks!
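The mismatch described above can be expressed as a simple consistency check: does the Vlan configured on the switch port contain the host's IP? The following is only a sketch under stated assumptions. The Vlan names and IDs are taken from the comment above, but the prefixes, the sample IP, and the helper names `vlans_containing` and `port_matches_ip` are hypothetical, and this is not the logic of the actual Netbox report.

```python
import ipaddress

# Hypothetical Vlan -> prefix mapping. The Vlan names and IDs come from the
# task; the prefixes are RFC 5737 documentation ranges, not the real subnets.
VLAN_PREFIXES = {
    1118: ("cloud-hosts1-eqiad", ipaddress.ip_network("192.0.2.0/24")),
    1128: ("cloud-hosts1-c8-eqiad", ipaddress.ip_network("198.51.100.0/24")),
}


def vlans_containing(host_ip):
    """Return the IDs of all known Vlans whose prefix contains host_ip."""
    addr = ipaddress.ip_address(host_ip)
    return [vid for vid, (_name, net) in VLAN_PREFIXES.items() if addr in net]


def port_matches_ip(host_ip, port_vlan_id):
    """True if the Vlan configured on the switch port contains the host IP."""
    return port_vlan_id in vlans_containing(host_ip)


# A host addressed from the rack-specific Vlan whose switch port is still in
# the old Vlan, as described for cloudnet1005 (the IP itself is made up).
host_ip = "198.51.100.10"
print(port_matches_ip(host_ip, 1118))  # port on the old Vlan
print(port_matches_ip(host_ip, 1128))  # port after the proposed change
```

A check of this shape flags the host only while the port Vlan and the IP's prefix disagree, which is the condition the comment says the report alerted on.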

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye

@Cmjohnson just an update here. I left cloudnet1005 alone, so we can piece together why the switch ports ended up on the wrong Vlans (unfortunately I couldn't find related logs in Netbox to see what the provision script there did).

I manually changed port xe-0/0/11 on cloudsw1-d5-eqiad to Vlan 1127 / 'cloud-hosts1-d5-eqiad' and re-tried the image to see if there were any other niggles (these are the first to be reimaged in this rack since the cloud network re-design).

What I found was that DHCP worked fine when initiated by the iDRAC/PXE-boot. The system got an IP address from the install server and the Debian installer started running.

However, when the Debian installer went to do its DHCP request, which should work the same, it failed. Looking on both the switch and the iDRAC GUI I can see that both server NIC ports remain hard down at this point. So obviously the DHCP request fails as the connection to the switch has gone down.

I'm at a loss to explain why the port was working and then goes down during the debian-installer phase. Could it be related to the NIC firmware or something? What I can say is that the DHCP config on the switch appears to be valid and working as expected.

@cmooney the cloudnet servers were manually moved in Netbox, so I don't know if the script would have picked up the vlan change. I find it interesting that you fixed the cloudnet vlan issue and the server is still experiencing the same issue as the others.

@Cmjohnson ok. Is it possible that when you moved them you selected the wrong Vlan?

If the script is assigning IPs from one Vlan, but configuring the switches for a different one, that's a big problem we need to sort out in the script. On the other hand if the inconsistency was just a manual error during the move then it's no issue.

As for the debian-installer stage failing DHCP, I'm not sure why. Both NICs remain hard down throughout. I think the first step should probably be to look at the NIC firmware version and get it onto the known-best one, which I think, based on T304483#8032810, is 21.85.21.92, but you guys probably know better than me. Currently it's on 21.40.25.31.
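Comparing the two firmware versions quoted above is easy to get wrong with plain string comparison, so a small sketch may help. It assumes the vendor's dotted versions order numerically field by field, which is an assumption, not something the task confirms. The helper names `parse_version` and `needs_upgrade` are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '21.85.21.92' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(current, target):
    """True if current is strictly older than target, compared field by field."""
    return parse_version(current) < parse_version(target)


# Version cited in the task as the likely known-best firmware.
TARGET = "21.85.21.92"

print(needs_upgrade("21.40.25.31", TARGET))  # firmware reported on cloudnet1006
print(needs_upgrade("21.85.21.92", TARGET))  # already on the target version
```

Tuples of integers are used because string comparison would rank a field such as "9" above "40".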

Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye executed with errors:

cloudnet1006 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details

@cmooney it is most likely a manual change error. I did not completely delete the interface after removing the cloudcephosd hosts; I only updated it with the new vlan for the cloudnets. In the future, I will delete the interface entirely and start over.

As for the NIC firmware, I am updating everything but 1006; I think you may have already done that one. I do not think that will fix the issue.

@cmooney I believe I found the error: in site.pp I failed to put a ^ before the hostname.
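For readers unfamiliar with why a missing ^ matters: node names in site.pp can be written as regular expressions, and a pattern without a leading anchor can match anywhere inside a hostname. The sketch below illustrates the general effect in Python rather than Puppet. The patterns and the prefixed hostname are hypothetical, since the actual site.pp entry is not shown in the task, and the thread does not establish how the missing anchor related to the install failures.

```python
import re

# Hypothetical node patterns; the real site.pp entry is not shown in the task.
UNANCHORED = re.compile(r"cloudnet100[56]\.eqiad\.wmnet$")
ANCHORED = re.compile(r"^cloudnet100[56]\.eqiad\.wmnet$")

hosts = [
    "cloudnet1005.eqiad.wmnet",
    "cloudnet1006.eqiad.wmnet",
    "test-cloudnet1005.eqiad.wmnet",  # made-up name with a prefix
]

for host in hosts:
    # re.search looks for the pattern anywhere in the string, so only the
    # leading ^ stops a prefixed name from matching.
    print(host, bool(UNANCHORED.search(host)), bool(ANCHORED.search(host)))
```

Both patterns accept the two intended hosts; only the unanchored one also accepts the prefixed name.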

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudrabbit1001.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudrabbit1002.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudrabbit1003.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudservices1005.wikimedia.org with OS bullseye

Change 812033 had a related patch set uploaded (by Cmjohnson; author: Cmjohnson):

[operations/puppet@production] updating site.pp entry cloudnet1005-6

https://gerrit.wikimedia.org/r/812033

gerritbot added a project: Patch-For-Review. Jul 7 2022, 4:35 PM

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudrabbit1001.wikimedia.org with OS bullseye completed:

cloudrabbit1001 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202207071559_cmjohnson_1613294_cloudrabbit1001.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> staged

Change 812033 merged by Cmjohnson:

[operations/puppet@production] updating site.pp entry cloudnet1005-6

https://gerrit.wikimedia.org/r/812033

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudrabbit1002.wikimedia.org with OS bullseye completed:

cloudrabbit1002 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202207071613_cmjohnson_1614917_cloudrabbit1002.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudrabbit1003.wikimedia.org with OS bullseye completed:

cloudrabbit1003 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202207071614_cmjohnson_1615069_cloudrabbit1003.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye executed with errors:

cloudnet1006 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudservices1005.wikimedia.org with OS bullseye completed:

cloudservices1005 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202207071625_cmjohnson_1618440_cloudservices1005.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> staged

All but the cloudnets installed correctly; they're still presenting the DHCP error. I am thinking I may just blow out all the network configuration, delete the ports and start over. @cmooney

Maintenance_bot removed a project: Patch-For-Review. Jul 7 2022, 5:30 PM

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1005.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudnet1005.eqiad.wmnet with OS bullseye completed:

cloudnet1005 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202207071831_cmjohnson_1648841_cloudnet1005.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye executed with errors:

cloudnet1006 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye executed with errors:

cloudnet1006 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye executed with errors:

cloudnet1006 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details

All hosts but cloudnet1006 have gone through the installer; cloudnet1006 is still giving the DHCP error. I did try deleting all the ports and starting over, but that did not seem to work.

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye executed with errors:

cloudnet1006 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details

In T304888#8062648, @Cmjohnson wrote:

All but the cloudnets installed correctly; they're still presenting the DHCP error. I am thinking I may just blow out all the network configuration, delete the ports and start over. @cmooney

I wouldn't be super confident that will help. When I was checking last week all the network elements were set up right, and the fact that they make it to the Debian installer basically confirmed that. So I suspect it is something on the NIC/firmware/driver side.

Just for the record cloudnet1005 did seem to install ok. Or at least DHCP did not fail at PXE or debian-installer stage.

It's using NIC firmware 21.85.21.92 though. Cloudnet1006 is the same exact hardware as I understand, but is still failing. It's still on firmware 21.40.25.31 so I reckon the upgrade is likely to work.

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye executed with errors:

cloudnet1006 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details

@cmooney the cloudnet1006 NIC f/w was updated but it still fails. If you get a moment, can you take a look? I am not sure what I am missing.

There is an OS on the server, but it has not gone through puppet and I am unable to ssh in.

@Papaul or @RobH I don't know what I am doing wrong with cloudnet1006; the installer fails fairly early in the process. There is a current OS on it that was not finalized with puppet. If you get a spare moment, can you take a look?

@Cmjohnson if there is a current OS on it and was not finalized with puppet, try to re-run the cookbook with the --no-pxe --new flags.

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudnet1006.eqiad.wmnet with OS bullseye completed:

cloudnet1006 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202207221256_cmjohnson_2871348_cloudnet1006.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> staged

Thanks @Papaul that worked. @Andrew all yours!

Andrew closed subtask T314522: Dedicated cloudrabbit nodes in eqiad1 as Resolved. Aug 12 2022, 1:48 PM

Change 826352 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Make Cloudservices1005 a designate node

https://gerrit.wikimedia.org/r/826352

gerritbot added a project: Patch-For-Review. Aug 24 2022, 5:03 PM

Change 826352 merged by Andrew Bogott:

[operations/puppet@production] Make Cloudservices1005 a designate node

https://gerrit.wikimedia.org/r/826352

Maintenance_bot removed a project: Patch-For-Review. Aug 24 2022, 5:31 PM

Change 826358 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudservices1005: hack in a temporary resolver fqdn

https://gerrit.wikimedia.org/r/826358

Change 826358 merged by Andrew Bogott:

[operations/puppet@production] cloudservices1005: hack in a temporary resolver fqdn

https://gerrit.wikimedia.org/r/826358

Maintenance_bot removed a project: Patch-For-Review. Aug 24 2022, 6:32 PM

Change 826364 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add cloudservices1005 to the list of designate hosts

https://gerrit.wikimedia.org/r/826364

Change 826364 merged by Andrew Bogott:

[operations/puppet@production] Add cloudservices1005 to the list of designate hosts

https://gerrit.wikimedia.org/r/826364

Maintenance_bot removed a project: Patch-For-Review. Aug 24 2022, 7:30 PM

Change 826378 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudservices1005 will replace ns0 rather than ns1.

https://gerrit.wikimedia.org/r/826378

Change 826378 merged by Andrew Bogott:

[operations/puppet@production] cloudservices1005 will replace ns0 rather than ns1.

https://gerrit.wikimedia.org/r/826378

Change 826387 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Replace cloudservices1003 with cloudservices1005

https://gerrit.wikimedia.org/r/826387

Change 826388 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/dns@master] Replace cloudservices1003 with cloudservices1005 for ns0

https://gerrit.wikimedia.org/r/826388

Change 826387 merged by Andrew Bogott:

[operations/puppet@production] Replace cloudservices1003 with cloudservices1005

https://gerrit.wikimedia.org/r/826387

RobH unsubscribed. Aug 24 2022, 9:33 PM

Change 826388 merged by Andrew Bogott:

[operations/dns@master] Replace cloudservices1003 with cloudservices1005 for ns0

https://gerrit.wikimedia.org/r/826388

Change 826393 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Remove temporary ns2 def for cloudservices1005

https://gerrit.wikimedia.org/r/826393

Change 826393 merged by Andrew Bogott:

[operations/puppet@production] Remove temporary ns2 def for cloudservices1005

https://gerrit.wikimedia.org/r/826393

Mentioned in SAL (#wikimedia-cloud) [2022-08-24T22:07:10Z] <andrewbogott> replaced cloudservices1003 with cloudservices1005 T304888

Andrew added a subtask: T316285: decommission cloudservices1003.wikimedia..org. Aug 25 2022, 7:24 PM

aborrero closed subtask T316284: Replace cloudnet100[34] with cloudnet100[56] as Resolved. Oct 7 2022, 9:00 AM

Jclark-ctr closed subtask T316285: decommission cloudservices1003.wikimedia..org as Resolved. Oct 14 2022, 2:00 PM

Q4: (Need By: TBD) rack/setup/install 6 wmcs hosts
Closed, Resolved, Public

Description

Hostname / Racking / Installation Details

Per host setup checklist

cloudrabbit1001:

cloudrabbit1002:

cloudrabbit1003:

cloudnet1005:

cloudnet1006:

cloudservices1005:


Event Timeline

	RobH
	Mar 28 2022, 7:20 PM

	F35310513: image.png
	Jul 7 2022, 9:13 AM
