Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of the seven WMCS hosts listed below.

Hostname / Racking / Installation Details

cloudcephmon200[2,3]-dev replacements

Hostnames: cloudcephmon200[5,6]-dev
Racking Proposal: Cannot share same rack as any other cloudcephmon200*-dev server. Place servers into different racks. WMCS racks.
Networking Setup: 1 connection, 10g. cloud-hosts1-codfw VLAN
Partitioning/Raid: sw raid 10 (all four drives) partman/standard.cfg partman/raid10-4dev.cfg
OS Distro: Bullseye

cloudnet200[2,3]-dev replacements

Hostnames: cloudnet200[5,6]-dev
Racking Proposal: Cannot share same rack as any other cloudnet200*-dev server. Place servers into different racks. WMCS racks.
Networking Setup: First 10g port to cloud-hosts1-*-codfw, second 10g port TRUNK with 2 vlans, cloud-gw-transport and cloud-instance-transport
Partitioning/Raid: sw raid 10 (all four drives)
OS Distro: Bullseye

cloudservices200[2,3]-dev replacements

Hostnames: cloudservices200[4,5]-dev
Racking Proposal: Can replace in existing racks. Or rack into separate racks. WMCS racks.
Networking Setup: 10g, public1 VLAN
Partitioning/Raid: sw raid 10 (all four drives) partman/standard.cfg partman/raid10-4dev.cfg
OS Distro: Bullseye

cloudweb2001-dev replacement

Hostnames: cloudweb2002-dev
Racking Proposal: Can replace in existing racks. WMCS racks.
Networking Setup: 10g, public vlan
Partitioning/Raid: sw raid 10 (all four drives) partman/standard.cfg partman/raid10-4dev.cfg
OS Distro: Buster
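
The hostname lines above use a bracket shorthand (for example cloudnet200[5,6]-dev). As a minimal sketch, assuming only a single comma-separated bracket group per pattern, the shorthand can be expanded into the seven individual hostnames like this. The helper name `expand_hosts` is hypothetical, and ranges such as [2-3] are deliberately not handled.

```python
from __future__ import annotations

import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracket group, e.g. 'cloudnet200[5,6]-dev', into hostnames."""
    match = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", pattern)
    if match is None:
        # No bracket group: the pattern is already a single hostname.
        return [pattern]
    prefix, group, suffix = match.groups()
    return [f"{prefix}{item.strip()}{suffix}" for item in group.split(",")]


# Hostname patterns copied from the task description.
NEW_HOSTS = [
    "cloudcephmon200[5,6]-dev",
    "cloudnet200[5,6]-dev",
    "cloudservices200[4,5]-dev",
    "cloudweb2002-dev",
]

all_hosts = [host for pattern in NEW_HOSTS for host in expand_hosts(pattern)]
print(len(all_hosts))  # 7, matching the task title
print(all_hosts)
```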

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

cloudcephmon2005-dev:
  • - receive in system on procurement task T303412 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
cloudcephmon2006-dev:
  • - receive in system on procurement task T303412 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
cloudnet2005-dev:
  • - receive in system on procurement task T303412 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
cloudnet2006-dev:
  • - receive in system on procurement task T303412 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
cloudservices2004-dev:
  • - receive in system on procurement task T303412 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
cloudservices2005-dev:
  • - receive in system on procurement task T303412 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
cloudweb2002-dev:
  • - receive in system on procurement task T303412 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.

Event Timeline

RobH renamed this task from (Need By: TBD) rack/setup/install <insert FQDN/hostname of hardware here> to Q3:(Need By: TBD) rack/setup/install 7 wmcs hosts.Mar 28 2022, 6:14 PM
RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH moved this task from Backlog to Racking / Decom on the cloud-services-team (Hardware) board.
RobH mentioned this in Unknown Object (Task).

@nskaggs @Andrew @aborrero @dcaro the goal for codfw is to consolidate all cloudx-dev nodes in a single rack (see T305469), but the racking proposal states: "Cannot share same rack as any other...". With this racking proposal, we will not be able to hit that goal. We are planning on putting a new cloud switch in codfw soon (@ayounsi). Please note that, unlike in eqiad, there is no dedicated 10G switch in codfw for now.

@Papaul By default, for HA purposes, we include language to spread servers out when needed. However, given these machines are in dev and not production, you can safely ignore that requirement and let them share racks, especially if it makes it easier or more convenient for you and the team to manage. Thanks for asking. I hope relaxing this requirement helps!

Change 784331 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Add new cloud nodes to site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/784331

Change 784331 merged by Papaul:

[operations/puppet@production] Add new cloud nodes to site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/784331

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudcephmon2005-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudcephmon2005-dev.codfw.wmnet with OS bullseye completed:

  • cloudcephmon2005-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204192214_pt1979_3391900_cloudcephmon2005-dev.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
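
The reimage reports in this timeline all point at a log file whose name follows a visible pattern: a timestamp, the user, a numeric field, and the short hostname. A minimal sketch of splitting such a path into its parts follows. The meaning of the numeric field is an assumption (it looks like a process or run id, so it is kept as an opaque string), and `parse_reimage_log` is a hypothetical helper, not part of spicerack.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # parsed from the leading YYYYMMDDHHMM stamp
    user: str
    run_id: str        # assumption: opaque numeric id, meaning not confirmed
    host: str


def parse_reimage_log(path: str) -> ReimageLog:
    """Split '<stamp>_<user>_<id>_<host>.out' into named fields."""
    stem = PurePosixPath(path).stem  # drops the directory and the '.out' suffix
    stamp, user, run_id, host = stem.split("_", 3)
    return ReimageLog(datetime.strptime(stamp, "%Y%m%d%H%M"), user, run_id, host)


log = parse_reimage_log(
    "/var/log/spicerack/sre/hosts/reimage/"
    "202204192214_pt1979_3391900_cloudcephmon2005-dev.out"
)
print(log.host, log.user, log.started.isoformat())
```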

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudcephmon2006-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudnet2005-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudcephmon2006-dev.codfw.wmnet with OS bullseye completed:

  • cloudcephmon2006-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204192253_pt1979_3397670_cloudcephmon2006-dev.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudservices2004-dev.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudnet2005-dev.codfw.wmnet with OS bullseye completed:

  • cloudnet2005-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204192256_pt1979_3398002_cloudnet2005-dev.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudnet2006-dev.codfw.wmnet with OS bullseye

Second interface added for cloudnet2005-dev:

[edit interfaces]
+   ge-1/0/24 {
+       description cloudnet2005-dev;
+       unit 0 {
+           family ethernet-switching {
+               interface-mode trunk;
+               vlan {
+                   members [ cloud-gw-transport-codfw cloud-instance-transport1-b-codfw ];
+               }
+           }
+       }
+   }
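
For illustration only, the trunk stanza above can be described by three inputs: the port name, the description, and the VLAN member list. The sketch below renders the same structure from those inputs. Per the checklist, port setup here is done via Netbox and committed with Homer, so `render_trunk_port` is a hypothetical helper and not how the configuration was actually produced.

```python
from __future__ import annotations


def render_trunk_port(port: str, description: str, vlans: list[str]) -> str:
    """Render a Junos-style trunk interface stanza as plain text."""
    members = " ".join(vlans)
    lines = [
        f"{port} {{",
        f"    description {description};",
        "    unit 0 {",
        "        family ethernet-switching {",
        "            interface-mode trunk;",
        "            vlan {",
        f"                members [ {members} ];",
        "            }",
        "        }",
        "    }",
        "}",
    ]
    return "\n".join(lines)


# Values taken from the diff above.
stanza = render_trunk_port(
    "ge-1/0/24",
    "cloudnet2005-dev",
    ["cloud-gw-transport-codfw", "cloud-instance-transport1-b-codfw"],
)
print(stanza)
```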

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudservices2004-dev.wikimedia.org with OS bullseye completed:

  • cloudservices2004-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204192330_pt1979_3404718_cloudservices2004-dev.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudservices2005-dev.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudnet2006-dev.codfw.wmnet with OS bullseye completed:

  • cloudnet2006-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204192334_pt1979_3404879_cloudnet2006-dev.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudweb2002-dev.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudweb2002-dev.wikimedia.org with OS bullseye executed with errors:

  • cloudweb2002-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudweb2002-dev.wikimedia.org with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudweb2002-dev.wikimedia.org with OS buster executed with errors:

  • cloudweb2002-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • The reimage failed, see the cookbook logs for the details

Second interface added for cloudnet2006-dev:

[edit interfaces]
+   ge-1/0/26 {
+       description cloudnet2006-dev;
+       unit 0 {
+           family ethernet-switching {
+               interface-mode trunk;
+               vlan {
+                   members [ cloud-gw-transport-codfw cloud-instance-transport1-b-codfw ];
+               }
+           }
+       }
+   }

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudservices2005-dev.wikimedia.org with OS bullseye completed:

  • cloudservices2005-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204200006_pt1979_3411233_cloudservices2005-dev.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudweb2002-dev.wikimedia.org with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudweb2002-dev.wikimedia.org with OS buster executed with errors:

  • cloudweb2002-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudweb2002-dev.wikimedia.org with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudweb2002-dev.wikimedia.org with OS buster completed:

  • cloudweb2002-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204200046_pt1979_3416974_cloudweb2002-dev.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Papaul updated the task description.
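
Each cookbook report above carries a per-host result line such as "• cloudweb2002-dev (FAIL)". As a rough sketch, assuming the report text is available as a plain string, the outcomes can be collected per host in order of appearance. The sample below reproduces only the four cloudweb2002-dev result lines from this timeline, and `outcomes_by_host` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches the top-level result bullet, e.g. "  • cloudweb2002-dev (FAIL)".
RESULT_LINE = re.compile(r"^\s*•\s+(\S+)\s+\((PASS|FAIL|WARN)\)\s*$")


def outcomes_by_host(text: str) -> dict:
    """Return {hostname: [result, ...]} in the order the results appear."""
    results = defaultdict(list)
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group(1)].append(match.group(2))
    return dict(results)


SAMPLE = """\
  • cloudweb2002-dev (FAIL)
    • The reimage failed, see the cookbook logs for the details
  • cloudweb2002-dev (FAIL)
    • The reimage failed, see the cookbook logs for the details
  • cloudweb2002-dev (FAIL)
    • The reimage failed, see the cookbook logs for the details
  • cloudweb2002-dev (PASS)
    • Updated Netbox status planned -> staged
"""

summary = outcomes_by_host(SAMPLE)
print(summary)
```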

@Andrew and Cloud team, this is ready for service. Thanks.

Change 784736 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Make cloudcephmon200[5,6] into cloudcephmon nodes.

https://gerrit.wikimedia.org/r/784736

Change 784737 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Make new hosts cloudservices200[4,5] into cloudservices nodes

https://gerrit.wikimedia.org/r/784737

Change 784738 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Make new cloudweb2002-dev node into a cloudweb node

https://gerrit.wikimedia.org/r/784738

Change 784736 merged by Andrew Bogott:

[operations/puppet@production] Make cloudcephmon200[5,6] into cloudcephmon nodes.

https://gerrit.wikimedia.org/r/784736

Change 784739 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add hiera settings for cloudcephmon200[5,6]

https://gerrit.wikimedia.org/r/784739

Change 784739 merged by Andrew Bogott:

[operations/puppet@production] Add hiera settings for cloudcephmon200[5,6]

https://gerrit.wikimedia.org/r/784739

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudcephmon2005-dev.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1001 for host cloudcephmon2006-dev.codfw.wmnet with OS buster

Change 784737 merged by Andrew Bogott:

[operations/puppet@production] Make new hosts cloudservices200[4,5] into cloudservices nodes

https://gerrit.wikimedia.org/r/784737

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudcephmon2006-dev.codfw.wmnet with OS buster completed:

  • cloudcephmon2006-dev (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204202046_andrew_1069668_cloudcephmon2006-dev.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1001 for host cloudcephmon2005-dev.codfw.wmnet with OS buster completed:

  • cloudcephmon2005-dev (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204202046_andrew_1069611_cloudcephmon2005-dev.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change 784800 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/dns@master] Make cloudservices2005-dev the new ns1.openstack.codfw1dev.wikimediacloud.org

https://gerrit.wikimedia.org/r/784800

Change 784800 merged by Andrew Bogott:

[operations/dns@master] Make cloudservices2005-dev the new ns1.openstack.codfw1dev.wikimediacloud.org

https://gerrit.wikimedia.org/r/784800

Change 784738 merged by Andrew Bogott:

[operations/puppet@production] Make new cloudweb2002-dev node into a cloudweb node

https://gerrit.wikimedia.org/r/784738

Change 784807 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] misc hiera changes to add cloudweb2002-dev

https://gerrit.wikimedia.org/r/784807

Change 784807 merged by Andrew Bogott:

[operations/puppet@production] misc hiera changes to add cloudweb2002-dev

https://gerrit.wikimedia.org/r/784807

Change 784810 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Prepare cloudservices200[2,3]-dev for decom

https://gerrit.wikimedia.org/r/784810

Change 784810 merged by Andrew Bogott:

[operations/puppet@production] Prepare cloudservices200[2,3]-dev for decom

https://gerrit.wikimedia.org/r/784810

Change 785391 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Prepare cloudweb2002-dev to replace cloudweb2001-dev

https://gerrit.wikimedia.org/r/785391

Change 785391 merged by Andrew Bogott:

[operations/puppet@production] Prepare cloudweb2002-dev to replace cloudweb2001-dev

https://gerrit.wikimedia.org/r/785391

Change 785399 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/mediawiki-config@master] labtestwiki: update test lab server

https://gerrit.wikimedia.org/r/785399

Change 785886 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Prepare cloudweb2001-dev for decom

https://gerrit.wikimedia.org/r/785886

Change 785887 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Prepare cloudcephmon200[2-3]-dev for decom

https://gerrit.wikimedia.org/r/785887

Change 785886 merged by Andrew Bogott:

[operations/puppet@production] Prepare cloudweb2001-dev for decom

https://gerrit.wikimedia.org/r/785886

Change 785887 merged by Andrew Bogott:

[operations/puppet@production] Prepare cloudcephmon200[2-3]-dev for decom

https://gerrit.wikimedia.org/r/785887

Change 785893 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Remove references to cloudcephosd200[2,3]-dev

https://gerrit.wikimedia.org/r/785893

Change 785893 merged by Andrew Bogott:

[operations/puppet@production] Remove references to cloudcephosd200[2,3]-dev

https://gerrit.wikimedia.org/r/785893

Change 785897 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Remove more references to cloudcephosd200[2,3]-dev

https://gerrit.wikimedia.org/r/785897

Change 785897 merged by Andrew Bogott:

[operations/puppet@production] Remove more references to cloudcephosd200[2,3]-dev

https://gerrit.wikimedia.org/r/785897

Change 785399 merged by jenkins-bot:

[operations/mediawiki-config@master] labtestwiki: update labtest ldap server

https://gerrit.wikimedia.org/r/785399

Mentioned in SAL (#wikimedia-operations) [2022-04-25T20:27:44Z] <catrope@deploy1002> Synchronized wmf-config/wikitech.php: Config: [[gerrit:785399|labtestwiki: update labtest ldap server (T304881)]] (duration: 01m 39s)

Change 785923 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Make cloudnet200[45].codfw.wmnet into openstack network nodes

https://gerrit.wikimedia.org/r/785923

Change 785923 merged by Andrew Bogott:

[operations/puppet@production] Make cloudnet200[45].codfw.wmnet into openstack network nodes

https://gerrit.wikimedia.org/r/785923

Cookbook cookbooks.sre.hosts.reimage was started by dcaro@cumin1001 for host cloudnet2006-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by dcaro@cumin1001 for host cloudnet2006-dev.codfw.wmnet with OS bullseye completed:

  • cloudnet2006-dev (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204290919_dcaro_241124_cloudnet2006-dev.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by dcaro@cumin1001 for host cloudnet2005-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by dcaro@cumin1001 for host cloudnet2005-dev.codfw.wmnet with OS bullseye completed:

  • cloudnet2005-dev (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202204291149_dcaro_356018_cloudnet2005-dev.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Adding a comment here for visibility and for the future :)
The cloudnet hosts should have been prepared with the 'insetup_noferm' profile in Puppet instead of 'insetup', so they need reimaging.