
ops-monitoring-bot (Operations Monitoring Bot)
User · Bot

User Details

User Since: Aug 12 2016, 1:45 PM (306 w, 5 d)
Roles: Bot
Availability: Available
LDAP User: Unknown
MediaWiki User: Unknown

Bot managed by SRE for automated interaction with Phabricator from monitoring tools.

Recent Activity

Today

ops-monitoring-bot added a comment to T302937: datadumps1007 test installs.

Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bullseye

Wed, Jun 29, 5:19 PM · Patch-For-Review, SRE, DC-Ops
ops-monitoring-bot added a comment to T302937: datadumps1007 test installs.

Cookbook cookbooks.sre.hosts.reimage started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bullseye executed with errors:

  • dumpsdata1007 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details
Wed, Jun 29, 5:18 PM · Patch-For-Review, SRE, DC-Ops
ops-monitoring-bot added a comment to T302937: datadumps1007 test installs.

Cookbook cookbooks.sre.hosts.reimage was started by robh@cumin1001 for host dumpsdata1007.eqiad.wmnet with OS bullseye

Wed, Jun 29, 5:04 PM · Patch-For-Review, SRE, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudcephosd1032.eqiad.wmnet with OS buster completed:

  • cloudcephosd1032 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206291354_cmjohnson_753471_cloudcephosd1032.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Wed, Jun 29, 2:49 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
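
The reimage comments in this feed share one summary-line shape: "was started by" for a launch, and "started by ... completed:" or "started by ... executed with errors:" for a result. A minimal sketch of pulling the fields out of such a line follows. The function name, the field names, and the regular expression are illustrative assumptions inferred from the lines shown on this page, not part of Spicerack or the cookbooks.

```python
import re
from typing import Optional

# Pattern inferred from the comment lines on this page (an assumption,
# not a format guaranteed by the cookbook itself).
HEADER_RE = re.compile(
    r"^Cookbook (?P<cookbook>[\w.]+) (?:was )?started by "
    r"(?P<user>[\w.-]+)@(?P<cumin_host>[\w.-]+) "
    r"for host (?P<host>[\w.-]+) with OS (?P<os>\w+)"
    r"(?: (?P<outcome>completed|executed with errors):)?$"
)


def parse_reimage_header(line: str) -> Optional[dict]:
    """Return the fields of a reimage summary line, or None if it does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # A line without a trailing outcome is a "was started" notification.
    fields["outcome"] = fields["outcome"] or "started"
    return fields
```

Decommission comments use a different summary line ("executed by ... for hosts: ...") and would return None here.
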
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudcephosd1033.eqiad.wmnet with OS buster completed:

  • cloudcephosd1033 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206291355_cmjohnson_754720_cloudcephosd1033.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Wed, Jun 29, 2:40 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
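
The "First Puppet run completed and logged in ..." step names a log file whose basename appears to follow a timestamp_user_number_host pattern. The sketch below splits such a path on that assumption. The meaning of the numeric third field is not stated on this page, so it is exposed under the hypothetical name `run_id`, and the function name is likewise illustrative.

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_reimage_log_path(path: str) -> dict:
    """Split a reimage log path into its apparent components.

    Assumes the basename is <YYYYMMDDHHMM>_<user>_<number>_<host>.out,
    as seen in the comments on this page.
    """
    stem = PurePosixPath(path).stem  # drops the directory and ".out"
    # Split at most three times so the host part is kept whole.
    stamp, user, run_id, host = stem.split("_", 3)
    return {
        "started": datetime.strptime(stamp, "%Y%m%d%H%M"),
        "user": user,
        "run_id": int(run_id),  # hypothetical name; meaning not documented here
        "host": host,
    }
```

A user name containing an underscore would break the split, so this is only suitable for names like the ones shown in this feed.
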
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudcephosd1029.eqiad.wmnet with OS buster completed:

  • cloudcephosd1029 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206291350_cmjohnson_751960_cloudcephosd1029.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Wed, Jun 29, 2:36 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudcephosd1034.eqiad.wmnet with OS buster completed:

  • cloudcephosd1034 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206291355_cmjohnson_754780_cloudcephosd1034.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Wed, Jun 29, 2:30 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudcephosd1030.eqiad.wmnet with OS buster completed:

  • cloudcephosd1030 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206291351_cmjohnson_752151_cloudcephosd1030.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Wed, Jun 29, 2:25 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
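
Each result comment opens with a top-level bullet of the form "host (PASS)", "host (WARN)" or "host (FAIL)", followed by indented step bullets. A rough way to tally outcomes across a copy of this feed is sketched below. The regular expression and the function name are assumptions based on the lines shown here.

```python
import re
from collections import Counter
from typing import Iterable

# Matches only the per-host status bullet, e.g. "  • cloudcephosd1030 (WARN)".
# Step bullets such as "• Host up (Debian installer)" contain spaces before
# the parenthesis and therefore do not match.
STATUS_RE = re.compile(r"^\s*•\s+(?P<host>\S+) \((?P<status>PASS|WARN|FAIL)\)\s*$")


def tally_statuses(lines: Iterable[str]) -> Counter:
    """Count PASS/WARN/FAIL status bullets in the given lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.match(line)
        if match:
            counts[match.group("status")] += 1
    return counts
```

Because a host can be reimaged several times, the totals count runs rather than distinct hosts.
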
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudcephosd1028.eqiad.wmnet with OS buster completed:

  • cloudcephosd1028 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206291348_cmjohnson_751753_cloudcephosd1028.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Wed, Jun 29, 2:24 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudcephosd1026.eqiad.wmnet with OS buster completed:

  • cloudcephosd1026 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206291320_cmjohnson_744652_cloudcephosd1026.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Wed, Jun 29, 2:14 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T307399: Q4: rack/setup/install stat1010.

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1001 for host stat1010.eqiad.wmnet with OS bullseye

Wed, Jun 29, 2:12 PM · Patch-For-Review, Data-Engineering-Kanban, SRE, Data-Engineering, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T307399: Q4: rack/setup/install stat1010.

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1001 for host stat1010.eqiad.wmnet with OS bullseye executed with errors:

  • stat1010 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details
Wed, Jun 29, 2:05 PM · Patch-For-Review, Data-Engineering-Kanban, SRE, Data-Engineering, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudcephosd1031.eqiad.wmnet with OS buster executed with errors:

  • cloudcephosd1031 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details
Wed, Jun 29, 2:00 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudcephosd1027.eqiad.wmnet with OS buster completed:

  • cloudcephosd1027 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206291307_cmjohnson_743095_cloudcephosd1027.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Wed, Jun 29, 1:58 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudcephosd1034.eqiad.wmnet with OS buster

Wed, Jun 29, 1:55 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudcephosd1033.eqiad.wmnet with OS buster

Wed, Jun 29, 1:55 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudcephosd1032.eqiad.wmnet with OS buster

Wed, Jun 29, 1:55 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudcephosd1031.eqiad.wmnet with OS buster

Wed, Jun 29, 1:53 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudcephosd1030.eqiad.wmnet with OS buster

Wed, Jun 29, 1:51 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudcephosd1029.eqiad.wmnet with OS buster

Wed, Jun 29, 1:50 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudcephosd1028.eqiad.wmnet with OS buster

Wed, Jun 29, 1:49 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudcephosd1026.eqiad.wmnet with OS buster

Wed, Jun 29, 1:20 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T311623: decommission db2081.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db2081.codfw.wmnet

  • db2081.codfw.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Icinga/Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Wed, Jun 29, 1:18 PM · SRE, ops-codfw, decommission-hardware
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudcephosd1027.eqiad.wmnet with OS buster

Wed, Jun 29, 1:07 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host cloudcephosd1025.eqiad.wmnet with OS buster executed with errors:

  • cloudcephosd1025 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details
Wed, Jun 29, 1:06 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T294972: Q2:(Need By: TBD) rack/setup/install cloudcephosd10[25-34].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host cloudcephosd1025.eqiad.wmnet with OS buster

Wed, Jun 29, 1:03 PM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T311591: decommission db2075.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db2075.codfw.wmnet

  • db2075.codfw.wmnet (FAIL)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Icinga/Alertmanager
    • Failed to wipe swraid, partition-table and filesystem signatures, manual intervention required to make it unbootable: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Wed, Jun 29, 7:54 AM · SRE, ops-codfw, decommission-hardware
ops-monitoring-bot added a comment to T305460: Upgrade webperf hosts to Bullseye.

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: webperf1002.eqiad.wmnet

  • webperf1002.eqiad.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster eqiad to Netbox
Wed, Jun 29, 7:34 AM · Patch-For-Review, Performance-Team, SRE
ops-monitoring-bot added a comment to T311589: decommission db2071.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db2071.codfw.wmnet

  • db2071.codfw.wmnet (FAIL)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Icinga/Alertmanager
    • Failed to wipe swraid, partition-table and filesystem signatures, manual intervention required to make it unbootable: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Wed, Jun 29, 7:17 AM · SRE, ops-codfw, decommission-hardware
ops-monitoring-bot added a comment to T306854: Q4: (Need By: TBD) rack/setup/install cloudcontrol2005-dev, clouddb2002-dev, cloudgw2003-dev.

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudgw2003-dev.codfw.wmnet with OS bullseye completed:

  • cloudgw2003-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206282320_pt1979_1468378_cloudgw2003-dev.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Wed, Jun 29, 12:18 AM · SRE, cloud-services-team (Hardware), ops-codfw, DC-Ops

Yesterday

ops-monitoring-bot added a comment to T306854: Q4: (Need By: TBD) rack/setup/install cloudcontrol2005-dev, clouddb2002-dev, cloudgw2003-dev.

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudgw2003-dev.codfw.wmnet with OS bullseye

Tue, Jun 28, 11:20 PM · SRE, cloud-services-team (Hardware), ops-codfw, DC-Ops
ops-monitoring-bot added a comment to T306854: Q4: (Need By: TBD) rack/setup/install cloudcontrol2005-dev, clouddb2002-dev, cloudgw2003-dev.

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host clouddb2002-dev.codfw.wmnet with OS bullseye completed:

  • clouddb2002-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206282227_pt1979_1460329_clouddb2002-dev.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 11:17 PM · SRE, cloud-services-team (Hardware), ops-codfw, DC-Ops
ops-monitoring-bot added a comment to T306854: Q4: (Need By: TBD) rack/setup/install cloudcontrol2005-dev, clouddb2002-dev, cloudgw2003-dev.

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host clouddb2002-dev.codfw.wmnet with OS bullseye

Tue, Jun 28, 10:27 PM · SRE, cloud-services-team (Hardware), ops-codfw, DC-Ops
ops-monitoring-bot added a comment to T306854: Q4: (Need By: TBD) rack/setup/install cloudcontrol2005-dev, clouddb2002-dev, cloudgw2003-dev.

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host clouddb2002-dev.codfw.wmnet with OS bullseye executed with errors:

  • clouddb2002-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details
Tue, Jun 28, 10:20 PM · SRE, cloud-services-team (Hardware), ops-codfw, DC-Ops
ops-monitoring-bot added a comment to T306854: Q4: (Need By: TBD) rack/setup/install cloudcontrol2005-dev, clouddb2002-dev, cloudgw2003-dev.

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host clouddb2002-dev.codfw.wmnet with OS bullseye

Tue, Jun 28, 9:31 PM · SRE, cloud-services-team (Hardware), ops-codfw, DC-Ops
ops-monitoring-bot added a comment to T306854: Q4: (Need By: TBD) rack/setup/install cloudcontrol2005-dev, clouddb2002-dev, cloudgw2003-dev.

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host clouddb2002-dev.codfw.wmnet with OS bullseye executed with errors:

  • clouddb2002-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details
Tue, Jun 28, 9:30 PM · SRE, cloud-services-team (Hardware), ops-codfw, DC-Ops
ops-monitoring-bot added a comment to T306854: Q4: (Need By: TBD) rack/setup/install cloudcontrol2005-dev, clouddb2002-dev, cloudgw2003-dev.

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host clouddb2002-dev.codfw.wmnet with OS bullseye

Tue, Jun 28, 9:08 PM · SRE, cloud-services-team (Hardware), ops-codfw, DC-Ops
ops-monitoring-bot added a comment to T296452: Upgrade Netbox to 3.2.

cookbooks.sre.hosts.decommission executed by volans@cumin2002 for hosts: sretest2001.codfw.wmnet

  • sretest2001.codfw.wmnet (WARN)
    • Host not found on Icinga, unable to downtime it
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw_test to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw_test to Netbox
Tue, Jun 28, 9:07 PM · Patch-For-Review, Infrastructure-Foundations, netbox
ops-monitoring-bot added a comment to T305460: Upgrade webperf hosts to Bullseye.

cookbooks.sre.hosts.decommission executed by volans@cumin2002 for hosts: webperf2002.codfw.wmnet

  • webperf2002.codfw.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed
    • Started forced sync of VMs in Ganeti cluster codfw to Netbox
Tue, Jun 28, 8:23 PM · Patch-For-Review, Performance-Team, SRE
ops-monitoring-bot added a comment to T306854: Q4: (Need By: TBD) rack/setup/install cloudcontrol2005-dev, clouddb2002-dev, cloudgw2003-dev.

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudcontrol2005-dev.wikimedia.org with OS bullseye completed:

  • cloudcontrol2005-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281922_pt1979_1421692_cloudcontrol2005-dev.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 7:56 PM · SRE, cloud-services-team (Hardware), ops-codfw, DC-Ops
ops-monitoring-bot added a comment to T306854: Q4: (Need By: TBD) rack/setup/install cloudcontrol2005-dev, clouddb2002-dev, cloudgw2003-dev.

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudcontrol2005-dev.wikimedia.org with OS bullseye

Tue, Jun 28, 7:22 PM · SRE, cloud-services-team (Hardware), ops-codfw, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1497.eqiad.wmnet with OS buster completed:

  • mw1497 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281816_cmjohnson_417999_mw1497.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Tue, Jun 28, 7:06 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1495.eqiad.wmnet with OS buster completed:

  • mw1495 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281808_cmjohnson_414041_mw1495.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Tue, Jun 28, 7:06 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1492.eqiad.wmnet with OS buster completed:

  • mw1492 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281755_cmjohnson_405147_mw1492.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Tue, Jun 28, 6:56 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1498.eqiad.wmnet with OS buster completed:

  • mw1498 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281817_cmjohnson_418078_mw1498.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Tue, Jun 28, 6:49 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1494.eqiad.wmnet with OS buster completed:

  • mw1494 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281807_cmjohnson_413995_mw1494.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Tue, Jun 28, 6:45 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1493.eqiad.wmnet with OS buster completed:

  • mw1493 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281759_cmjohnson_408593_mw1493.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Tue, Jun 28, 6:35 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1488.eqiad.wmnet with OS buster completed:

  • mw1488 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281743_cmjohnson_401395_mw1488.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 6:31 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1491.eqiad.wmnet with OS buster completed:

  • mw1491 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281744_cmjohnson_401627_mw1491.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Tue, Jun 28, 6:29 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1490.eqiad.wmnet with OS buster completed:

  • mw1490 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281744_cmjohnson_401548_mw1490.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Tue, Jun 28, 6:27 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1487.eqiad.wmnet with OS buster completed:

  • mw1487 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281739_cmjohnson_400809_mw1487.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 6:27 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1485.eqiad.wmnet with OS buster completed:

  • mw1485 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281734_cmjohnson_400099_mw1485.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 6:26 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1498.eqiad.wmnet with OS buster executed with errors:

  • mw1498 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • The reimage failed, see the cookbook logs for the details
Tue, Jun 28, 6:21 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1498.eqiad.wmnet with OS buster

Tue, Jun 28, 6:21 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1498.eqiad.wmnet with OS buster executed with errors:

  • mw1498 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • The reimage failed, see the cookbook logs for the details
Tue, Jun 28, 6:19 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1498.eqiad.wmnet with OS buster

Tue, Jun 28, 6:19 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1489.eqiad.wmnet with OS buster completed:

  • mw1489 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281743_cmjohnson_401453_mw1489.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
Tue, Jun 28, 6:18 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1498.eqiad.wmnet with OS buster

Tue, Jun 28, 6:17 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1497.eqiad.wmnet with OS buster

Tue, Jun 28, 6:16 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1486.eqiad.wmnet with OS buster completed:

  • mw1486 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281734_cmjohnson_400214_mw1486.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 6:11 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1472.eqiad.wmnet with OS buster completed:

  • mw1472 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281705_cmjohnson_378965_mw1472.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 6:11 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1495.eqiad.wmnet with OS buster

Tue, Jun 28, 6:08 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1494.eqiad.wmnet with OS buster

Tue, Jun 28, 6:08 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1478.eqiad.wmnet with OS buster completed:

  • mw1478 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281700_cmjohnson_377845_mw1478.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 6:07 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1480.eqiad.wmnet with OS buster completed:

  • mw1480 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281701_cmjohnson_378031_mw1480.out
    • Checked BIOS boot parameters are back to normal
    • Unable to run puppet on puppetmaster2001.codfw.wmnet,puppetmaster1001.eqiad.wmnet to update configmaster.wikimedia.org with the new host SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 6:01 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1479.eqiad.wmnet with OS buster completed:

  • mw1479 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281701_cmjohnson_377954_mw1479.out
    • Checked BIOS boot parameters are back to normal
    • Unable to run puppet on puppetmaster2001.codfw.wmnet,puppetmaster1001.eqiad.wmnet to update configmaster.wikimedia.org with the new host SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 6:00 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1493.eqiad.wmnet with OS buster

Tue, Jun 28, 6:00 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1476.eqiad.wmnet with OS buster completed:

  • mw1476 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281659_cmjohnson_377692_mw1476.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 5:58 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1492.eqiad.wmnet with OS buster

Tue, Jun 28, 5:55 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1473.eqiad.wmnet with OS buster completed:

  • mw1473 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281659_cmjohnson_377555_mw1473.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 5:54 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1491.eqiad.wmnet with OS buster

Tue, Jun 28, 5:44 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1490.eqiad.wmnet with OS buster

Tue, Jun 28, 5:44 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1470.eqiad.wmnet with OS buster completed:

  • mw1470 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281657_cmjohnson_377116_mw1470.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 5:44 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1489.eqiad.wmnet with OS buster

Tue, Jun 28, 5:43 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1488.eqiad.wmnet with OS buster

Tue, Jun 28, 5:43 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1474.eqiad.wmnet with OS buster completed:

  • mw1474 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281659_cmjohnson_377611_mw1474.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 5:43 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1471.eqiad.wmnet with OS buster completed:

  • mw1471 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281657_cmjohnson_377261_mw1471.out
    • Checked BIOS boot parameters are back to normal
    • Unable to run puppet on puppetmaster2001.codfw.wmnet,puppetmaster1001.eqiad.wmnet to update configmaster.wikimedia.org with the new host SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 5:42 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1483.eqiad.wmnet with OS buster completed:

  • mw1483 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281702_cmjohnson_378324_mw1483.out
    • Checked BIOS boot parameters are back to normal
    • Unable to run puppet on puppetmaster2001.codfw.wmnet,puppetmaster1001.eqiad.wmnet to update configmaster.wikimedia.org with the new host SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 5:40 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1487.eqiad.wmnet with OS buster

Tue, Jun 28, 5:39 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1481.eqiad.wmnet with OS buster completed:

  • mw1481 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281702_cmjohnson_378136_mw1481.out
    • Checked BIOS boot parameters are back to normal
    • Unable to run puppet on puppetmaster2001.codfw.wmnet,puppetmaster1001.eqiad.wmnet to update configmaster.wikimedia.org with the new host SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 5:37 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1486.eqiad.wmnet with OS buster

Tue, Jun 28, 5:34 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1485.eqiad.wmnet with OS buster

Tue, Jun 28, 5:34 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1482.eqiad.wmnet with OS buster completed:

  • mw1482 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281702_cmjohnson_378230_mw1482.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 5:33 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1484.eqiad.wmnet with OS buster executed with errors:

  • mw1484 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281703_cmjohnson_378483_mw1484.out
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details
Tue, Jun 28, 5:26 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1472.eqiad.wmnet with OS buster

Tue, Jun 28, 5:05 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1484.eqiad.wmnet with OS buster

Tue, Jun 28, 5:03 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1483.eqiad.wmnet with OS buster

Tue, Jun 28, 5:03 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1482.eqiad.wmnet with OS buster

Tue, Jun 28, 5:02 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1481.eqiad.wmnet with OS buster

Tue, Jun 28, 5:02 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1480.eqiad.wmnet with OS buster

Tue, Jun 28, 5:01 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1479.eqiad.wmnet with OS buster

Tue, Jun 28, 5:01 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1478.eqiad.wmnet with OS buster

Tue, Jun 28, 5:00 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1476.eqiad.wmnet with OS buster

Tue, Jun 28, 5:00 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1474.eqiad.wmnet with OS buster

Tue, Jun 28, 4:59 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1473.eqiad.wmnet with OS buster

Tue, Jun 28, 4:59 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1471.eqiad.wmnet with OS buster

Tue, Jun 28, 4:58 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host mw1470.eqiad.wmnet with OS buster

Tue, Jun 28, 4:57 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1462.eqiad.wmnet with OS buster completed:

  • mw1462 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281341_cmjohnson_333209_mw1462.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 2:39 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1468.eqiad.wmnet with OS buster completed:

  • mw1468 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281341_cmjohnson_333204_mw1468.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 2:32 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T306121: Q4: (Need By: TBD) rack/setup/install mw14[57-98].

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host mw1464.eqiad.wmnet with OS buster completed:

  • mw1464 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202206281341_cmjohnson_333212_mw1464.out
    • Checked BIOS boot parameters are back to normal
    • Unable to run puppet on puppetmaster2001.codfw.wmnet,puppetmaster1001.eqiad.wmnet to update configmaster.wikimedia.org with the new host SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Tue, Jun 28, 2:29 PM · SRE, serviceops, ops-eqiad, DC-Ops
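
The reimage reports above share one layout: a host bullet carrying the overall result (PASS, WARN or FAIL), followed by one indented bullet per step. As a minimal sketch, and not part of the cookbook itself, the following Python tallies results and collects the steps that went wrong from pasted report text. The function name `parse_reports` and the rule that a problem step begins with "Unable to" or "The reimage failed" are assumptions drawn only from the wording seen in these comments.

```python
import re
from collections import Counter

# Host line, e.g. "  • mw1471 (WARN)"
HEADER = re.compile(r"^\s*•\s+(?P<host>\S+)\s+\((?P<status>PASS|WARN|FAIL)\)\s*$")
# Any other bullet line is treated as a step of the current report
STEP = re.compile(r"^\s*•\s+(?P<step>.+?)\s*$")

# Assumption: problem steps start with one of these prefixes (seen in the feed above)
PROBLEM_PREFIXES = ("Unable to", "The reimage failed")


def parse_reports(text):
    """Return a list of {'host', 'status', 'problems'} dicts, one per report."""
    reports = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {
                "host": header["host"],
                "status": header["status"],
                "problems": [],
            }
            reports.append(current)
            continue
        step = STEP.match(line)
        if step and current is not None:
            if step["step"].startswith(PROBLEM_PREFIXES):
                current["problems"].append(step["step"])
        elif not step:
            # A non-bullet line (timestamp, blank line, next comment) ends the report
            current = None
    return reports


sample = """\
  • mw1471 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • Rebooted
Tue, Jun 28, 5:42 PM · SRE, serviceops, ops-eqiad, DC-Ops
  • mw1484 (FAIL)
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details
"""

reports = parse_reports(sample)
counts = Counter(report["status"] for report in reports)
for report in reports:
    print(report["host"], report["status"], len(report["problems"]))
```

Step bullets that appear without a preceding host line (for example, a report cut off at the top of a page) are skipped rather than attached to the wrong host.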