Bot managed by SRE for automated interaction with Phabricator from monitoring tools.
User Details
- User Since: Aug 12 2016, 1:45 PM (345 w, 2 d)
- Roles: Bot
- Availability: Available
- LDAP User: Unknown
- MediaWiki User: Unknown
Today
Fri, Mar 24
Icinga downtime and Alertmanager silence (ID=d3c0fbee-5db6-4389-b75e-415ed51c67bc) set by jmm@cumin2002 for 21 days, 0:00:00 on 1 host(s) and their services with reason: Non-functional, WIP for Bullseye update
krb2002.codfw.wmnet
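The downtime durations in this feed ("21 days, 0:00:00", "3 days, 0:00:00", "1:00:00") have the same shape that Python's `datetime.timedelta` produces when converted to a string. Assuming that is where they come from, the following minimal sketch turns such a string back into a `timedelta`. The helper name `parse_duration` is hypothetical and is not part of any tooling mentioned here.

```python
import re
from datetime import timedelta

# Matches the default str(timedelta) layout without fractional seconds,
# e.g. "21 days, 0:00:00", "1 day, 2:03:04" or "5:00:00".
_DURATION_RE = re.compile(
    r"^(?:(?P<days>\d+) days?, )?"
    r"(?P<hours>\d+):(?P<minutes>\d{2}):(?P<seconds>\d{2})$"
)


def parse_duration(text: str) -> timedelta:
    """Hypothetical helper: parse a feed duration into a timedelta."""
    match = _DURATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    # A missing "N days, " prefix is treated as zero days.
    parts = {key: int(value) for key, value in match.groupdict(default="0").items()}
    return timedelta(**parts)


if __name__ == "__main__":
    print(parse_duration("21 days, 0:00:00"))
    print(parse_duration("1:00:00").total_seconds())
```

Negative durations and fractional seconds are deliberately not handled, since neither appears in the feed.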
Thu, Mar 23
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye completed:
- doc2002 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303232131_denisse_3186993_doc2002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
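The "First Puppet run completed and logged in ..." lines in this feed all point at a file whose name looks like `<12-digit timestamp>_<user>_<number>_<host>.out`. The sketch below splits such a path into its parts. It is based only on the file names visible here: the meaning of the numeric field is not stated in the feed (a process ID is one guess), the timestamp's time zone is unknown, and the names `ReimageLog` and `parse_reimage_log_path` are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath

# Layout inferred from the file names in this feed, not from any documentation.
_LOG_NAME_RE = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<number>\d+)_(?P<host>.+)\.out$"
)


@dataclass(frozen=True)
class ReimageLog:
    started: datetime  # naive datetime; the time zone is not given in the feed
    user: str
    number: int        # meaning assumed unknown (possibly a process ID)
    host: str


def parse_reimage_log_path(path: str) -> ReimageLog:
    """Hypothetical helper: split a reimage log path into its fields."""
    name = PurePosixPath(path).name
    match = _LOG_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    return ReimageLog(
        started=datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        user=match["user"],
        number=int(match["number"]),
        host=match["host"],
    )


if __name__ == "__main__":
    print(parse_reimage_log_path(
        "/var/log/spicerack/sre/ganeti/reimage/"
        "202303232131_denisse_3186993_doc2002.out"
    ))
```

The host group is matched greedily so that hyphenated names such as `pybal-test2003` survive intact; a user name containing an underscore would break this pattern.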
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:
- doc2002 (FAIL)
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:
- doc2002 (FAIL)
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
cookbooks.sre.hosts.decommission executed by denisse@cumin1001 for hosts: doc2002
- doc2002 (WARN)
- Host not found on Icinga, unable to downtime it
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
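Each cookbook entry in this feed carries a per-host result line such as "- doc2002 (PASS)", "- doc2002 (FAIL)" or "- doc2002 (WARN)". As a rough sketch, assuming the feed has been saved as plain text lines in exactly this form, the outcomes can be tallied as follows. The function name `count_statuses` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Only the three outcome labels that appear in this feed are recognised.
_STATUS_RE = re.compile(r"^- (?P<host>\S+) \((?P<status>PASS|FAIL|WARN)\)$")


def count_statuses(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count PASS/FAIL/WARN result lines."""
    counts: Counter = Counter()
    for line in lines:
        match = _STATUS_RE.match(line.strip())
        if match:
            counts[match["status"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "- doc2002 (PASS)",
        "- Host up (Debian installer)",  # a step line, not a result line
        "- doc2002 (FAIL)",
        "- doc2002 (WARN)",
    ]
    print(count_statuses(sample))
```

Step lines that happen to end in parentheses, such as "- Host up (Debian installer)", are ignored because the pattern accepts only a single host token followed by one of the three labels.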
Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main2002.codfw.wmnet with OS bullseye executed with errors:
- kafka-main2002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by jmm@cumin2002 for host irc1002.wikimedia.org with OS bullseye completed:
- irc1002 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303231503_jmm_1164999_irc1002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage was started by jmm@cumin2002 for host irc1002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by jhathaway@cumin1001 for host lists1003.wikimedia.org with OS bullseye completed:
- lists1003 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303231429_jhathaway_3111141_lists1003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage was started by jhathaway@cumin1001 for host lists1003.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by sukhe@cumin2002 for host pybal-test2003.codfw.wmnet with OS bullseye completed:
- pybal-test2003 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303231355_sukhe_1115468_pybal-test2003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage was started by sukhe@cumin2002 for host pybal-test2003.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by btullis@cumin1001 for host an-test-druid1001.eqiad.wmnet with OS bullseye completed:
- an-test-druid1001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303231136_btullis_3072350_an-test-druid1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main2004.codfw.wmnet with OS bullseye completed:
- kafka-main2004 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303231108_elukey_3067129_kafka-main2004.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.ganeti.reimage was started by btullis@cumin1001 for host an-test-druid1001.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by jmm@cumin2002 for host irc2002.wikimedia.org with OS bullseye completed:
- irc2002 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303231044_jmm_947400_irc2002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main2004.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage was started by jmm@cumin2002 for host irc2002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main2005.codfw.wmnet with OS bullseye completed:
- kafka-main2005 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303231001_elukey_3053469_kafka-main2005.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main2005.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:
- doc2002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Unable to disable Puppet, the host may have been unreachable
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage started by stevemunene@cumin1001 for host an-test-client1002.eqiad.wmnet with OS bullseye executed with errors:
- an-test-client1002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Unable to disable Puppet, the host may have been unreachable
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303221402_stevemunene_2834991_an-test-client1002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:
- doc2002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye executed with errors:
- doc2002 (FAIL)
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc2002.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by denisse@cumin1001 for host doc1003.eqiad.wmnet with OS bullseye completed:
- doc1003 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303222346_denisse_2941187_doc1003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Wed, Mar 22
Cookbook cookbooks.sre.ganeti.reimage was started by denisse@cumin1001 for host doc1003.eqiad.wmnet with OS bullseye
cookbooks.sre.hosts.decommission executed by jhathaway@cumin1001 for hosts: dborch1002.wikimedia.org
- dborch1002.wikimedia.org (PASS)
- Downtimed host on Icinga/Alertmanager
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
Cookbook cookbooks.sre.ganeti.reimage started by jhathaway@cumin1001 for host dborch1001.wikimedia.org with OS bullseye completed:
- dborch1001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303221529_jhathaway_2852582_dborch1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage was started by jhathaway@cumin1001 for host dborch1001.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage was started by stevemunene@cumin1001 for host an-test-client1002.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main1004.eqiad.wmnet with OS bullseye completed:
- kafka-main1004 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303220938_elukey_2764303_kafka-main1004.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.ganeti.reimage started by stevemunene@cumin1001 for host an-test-client1002.eqiad.wmnet with OS bullseye executed with errors:
- an-test-client1002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Unable to disable Puppet, the host may have been unreachable
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main1004.eqiad.wmnet with OS bullseye
Icinga downtime and Alertmanager silence (ID=dcc641f3-257f-4a0d-875d-85c9d542b7f8) set by jmm@cumin2002 for 3 days, 0:00:00 on 1 host(s) and their services with reason: Some tests with pybal/Bullseye
pybal-test2003.codfw.wmnet
Cookbook cookbooks.sre.ganeti.reimage was started by stevemunene@cumin1001 for host an-test-client1002.eqiad.wmnet with OS bullseye
Tue, Mar 21
Cookbook cookbooks.sre.ganeti.reimage started by stevemunene@cumin1001 for host an-test-client1002.eqiad.wmnet with OS bullseye executed with errors:
- an-test-client1002 (FAIL)
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by stevemunene@cumin1001 for host an-test-client1002.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by stevemunene@cumin1001 for host an-test-client1002.eqiad.wmnet with OS bullseye executed with errors:
- an-test-client1002 (FAIL)
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.ganeti.reimage was started by stevemunene@cumin1001 for host an-test-client1002.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by stevemunene@cumin1001 for host an-test-client1002.eqiad.wmnet with OS bullseye executed with errors:
- an-test-client1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host thanos-fe1004.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by jhathaway@cumin1001 for host dborch1002.wikimedia.org with OS bullseye executed with errors:
- dborch1002 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303211852_jhathaway_2589599_dborch1002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host thanos-fe1004.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage was started by jhathaway@cumin1001 for host dborch1002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage was started by stevemunene@cumin1001 for host an-test-client1002.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main1005.eqiad.wmnet with OS bullseye completed:
- kafka-main1005 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303211510_elukey_2520937_kafka-main1005.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host thanos-fe1004.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main1005.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main1005.eqiad.wmnet with OS bullseye executed with errors:
- kafka-main1005 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- New OS is buster but bullseye was requested
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main1005.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by elukey@cumin1001 for host kafka-main1005.eqiad.wmnet with OS bullseye executed with errors:
- kafka-main1005 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by elukey@cumin1001 for host kafka-main1005.eqiad.wmnet with OS bullseye
Mon, Mar 20
Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host thanos-fe1004.eqiad.wmnet with OS bullseye executed with errors:
- thanos-fe1004 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host thanos-fe1004.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host ms-fe1013.eqiad.wmnet with OS bullseye executed with errors:
- ms-fe1013 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host ms-fe1013.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by jmm@cumin2002 for host cuminunpriv1001.eqiad.wmnet with OS bullseye completed:
- cuminunpriv1001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303201317_jmm_2241129_cuminunpriv1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage was started by jmm@cumin2002 for host cuminunpriv1001.eqiad.wmnet with OS bullseye
Fri, Mar 17
Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host ms-fe1013.eqiad.wmnet with OS bullseye executed with errors:
- ms-fe1013 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host ms-fe1013.eqiad.wmnet with OS bullseye
Thu, Mar 16
Cookbook cookbooks.sre.ganeti.reimage started by dzahn@cumin2002 for host miscweb2003.codfw.wmnet with OS bullseye completed:
- miscweb2003 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303162300_dzahn_2853858_miscweb2003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage started by dzahn@cumin1001 for host miscweb1003.eqiad.wmnet with OS bullseye completed:
- miscweb1003 (PASS)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303162301_dzahn_1167777_miscweb1003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage was started by dzahn@cumin1001 for host miscweb1003.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage was started by dzahn@cumin2002 for host miscweb2003.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host thanos-fe1004.eqiad.wmnet with OS bullseye executed with errors:
- thanos-fe1004 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host thanos-fe1004.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host thanos-fe1004.eqiad.wmnet with OS bullseye executed with errors:
- thanos-fe1004 (FAIL)
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details
Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host thanos-fe1004.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by cmjohnson@cumin1001 for host ms-fe1013.eqiad.wmnet with OS bullseye
Icinga downtime and Alertmanager silence (ID=33992616-b446-4bc5-bf17-27cb8c47e8d7) set by cgoubert@cumin1001 for 1:00:00 on 32 host(s) and their services with reason: new_install
mw[2420-2451].codfw.wmnet
Icinga downtime and Alertmanager silence (ID=f7f64d19-c64a-4fb5-a8ab-f3218dfd9862) set by cgoubert@cumin1001 for 1:00:00 on 32 host(s) and their services with reason: new_install
mw[2420-2451].codfw.wmnet
Icinga downtime and Alertmanager silence (ID=17f33514-0b87-4f50-abfa-6cd2e1548410) set by cgoubert@cumin1001 for 5:00:00 on 32 host(s) and their services with reason: new_install
mw[2420-2451].codfw.wmnet
Icinga downtime and Alertmanager silence (ID=c5ba1cf2-f027-43f9-8672-b4eb30f98ddc) set by cgoubert@cumin1001 for 1:00:00 on 32 host(s) and their services with reason: new_install
mw[2420-2451].codfw.wmnet
cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db1105.eqiad.wmnet
- db1105.eqiad.wmnet (WARN)
- Downtimed host on Icinga/Alertmanager
- Found physical host
- Management interface not found on Icinga, unable to downtime it
- Wiped all swraid, partition-table and filesystem signatures
- Powered off
- [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
- Configured the linked switch interface(s)
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
Wed, Mar 15
Cookbook cookbooks.sre.ganeti.reimage started by brett@cumin2002 for host doh3002.wikimedia.org with OS bullseye completed:
- doh3002 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303151954_brett_1718980_doh3002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage started by brett@cumin2002 for host doh1002.wikimedia.org with OS bullseye completed:
- doh1002 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303151948_brett_1715541_doh1002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage started by brett@cumin2002 for host doh2002.wikimedia.org with OS bullseye completed:
- doh2002 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303151945_brett_1712110_doh2002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.hosts.reimage started by herron@cumin1001 for host kafka-logging1001.eqiad.wmnet with OS bullseye completed:
- kafka-logging1001 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303151932_herron_854627_kafka-logging1001.out
- Checked BIOS boot parameters are back to normal
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.ganeti.reimage was started by brett@cumin2002 for host doh3002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by brett@cumin2002 for host doh3001.wikimedia.org with OS bullseye completed:
- doh3001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303151914_brett_1685626_doh3001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage was started by brett@cumin2002 for host doh1002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by brett@cumin2002 for host doh1001.wikimedia.org with OS bullseye completed:
- doh1001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303151917_brett_1687637_doh1001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage was started by brett@cumin2002 for host doh2002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by brett@cumin2002 for host doh2001.wikimedia.org with OS bullseye completed:
- doh2001 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303151916_brett_1686927_doh2001.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage started by brett@cumin2002 for host doh6002.wikimedia.org with OS bullseye completed:
- doh6002 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303151905_brett_1678110_doh6002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.hosts.reimage was started by herron@cumin1001 for host kafka-logging1001.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage was started by brett@cumin2002 for host doh1001.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage was started by brett@cumin2002 for host doh2001.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage started by brett@cumin2002 for host doh5002.wikimedia.org with OS bullseye completed:
- doh5002 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present
- Deleted any existing Puppet certificate
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Set boot to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/ganeti/reimage/202303151819_brett_1643746_doh5002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
Cookbook cookbooks.sre.ganeti.reimage was started by brett@cumin2002 for host doh3001.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.ganeti.reimage was started by brett@cumin2002 for host doh6002.wikimedia.org with OS bullseye