Bot managed by SRE for automated interaction with Phabricator from monitoring tools.
User Details
- User Since: Aug 12 2016, 1:45 PM (405 w, 2 d)
- Roles: Bot
- Availability: Available
- LDAP User: Unknown
- MediaWiki User: Unknown
Yesterday
Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host elastic2090.codfw.wmnet with OS bullseye completed:
- elastic2090 (PASS)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405181836_ryankemper_2728901_elastic2090.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
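The log paths in these entries appear to follow a fixed naming scheme. A minimal Python sketch for splitting one apart is below. It assumes the layout is `<YYYYmmddHHMM>_<user>_<number>_<short hostname>.out`, which is inferred only from the paths shown in this feed; the meaning of the numeric field is not stated here, and `ReimageLog` and `parse_reimage_log` are hypothetical names used for illustration.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime
    user: str
    run_id: int  # meaning not stated in the feed; possibly a process ID
    host: str


# Assumed layout: <YYYYmmddHHMM>_<user>_<number>_<short hostname>.out
_LOG_NAME = re.compile(r"^(\d{12})_(.+)_(\d+)_([^_/]+)\.out$")


def parse_reimage_log(path: str) -> ReimageLog:
    """Split a reimage log path into its assumed components."""
    name = path.rsplit("/", 1)[-1]  # keep only the file name
    match = _LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised log name: {name!r}")
    stamp, user, run_id, host = match.groups()
    started = datetime.strptime(stamp, "%Y%m%d%H%M")
    return ReimageLog(started, user, int(run_id), host)
```

Calling `parse_reimage_log` on the path from the entry above would yield the user `ryankemper` and the host `elastic2090`, which can then be matched against the entry header.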
Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host elastic2090.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by ryankemper@cumin2002 for host elastic2090.codfw.wmnet with OS bullseye executed with errors:
- elastic2090 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" elastic2090.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by ryankemper@cumin2002 for host elastic2090.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye completed:
- kafka-main2009 (PASS)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405172346_pt1979_1649208_kafka-main2009.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> active
- The sre.puppet.sync-netbox-hiera cookbook was run successfully
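The reimage header lines in this feed share one shape, so they can be pulled apart mechanically when triaging repeated failures. The following is a sketch only: the regular expression is derived from the lines visible here, not from the cookbook's source, and `ReimageEvent` and `parse_header` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ReimageEvent(NamedTuple):
    user: str
    cumin_host: str
    target: str
    os_name: str
    outcome: Optional[str]  # None for a "was started" line


# Pattern inferred from the feed lines; other cookbooks use other wording.
_HEADER = re.compile(
    r"^Cookbook cookbooks\.sre\.hosts\.reimage (?:was )?started by "
    r"(?P<user>[^@\s]+)@(?P<cumin>\S+) for host (?P<target>\S+) "
    r"with OS (?P<os>\S+)(?: (?P<outcome>completed|executed with errors):)?$"
)


def parse_header(line: str) -> Optional[ReimageEvent]:
    """Return the parsed header, or None if the line is not a reimage header."""
    match = _HEADER.match(line.strip())
    if match is None:
        return None
    return ReimageEvent(
        user=match["user"],
        cumin_host=match["cumin"],
        target=match["target"],
        os_name=match["os"],
        outcome=match["outcome"],
    )
```

Grouping the resulting events by `target` and `outcome` gives a quick count of how many attempts a host needed before a run completed.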
Fri, May 17
Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye executed with errors:
- kafka-main2009 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main2009.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye executed with errors:
- kafka-main1006 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main1006.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye executed with errors:
- kafka-main1006 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main1006.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye executed with errors:
- kafka-main2009 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main2009.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye executed with errors:
- kafka-main2009 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main2009.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye executed with errors:
- kafka-main2009 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main2009.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye
cookbooks.sre.hosts.decommission executed by jayme@cumin1002 for hosts: kubestagetcd[1004-1006].eqiad.wmnet
- kubestagetcd1004.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: ldap-replica1006.wikimedia.org
- ldap-replica1006.wikimedia.org (PASS)
- Downtimed host on Icinga/Alertmanager
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: ldap-replica1005.wikimedia.org
- ldap-replica1005.wikimedia.org (PASS)
- Downtimed host on Icinga/Alertmanager
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
Icinga downtime and Alertmanager silence (ID=dd087345-70da-428c-8704-76433fe47872) set by jayme@cumin1002 for 2 days, 0:00:00 on 3 host(s) and their services with reason: decom
kubestagetcd[1004-1006].eqiad.wmnet
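Host expressions such as the one above pack several hosts into a bracketed numeric range. As a rough illustration, the sketch below expands a single `[start-end]` range into individual host names. It handles only that one form; production tooling may rely on a dedicated library (for example ClusterShell's NodeSet) that supports a richer syntax, and `expand_hosts` is a hypothetical helper.

```python
import re

# One numeric range in brackets, with optional text before and after it.
_RANGE = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<start>\d+)-(?P<end>\d+)\](?P<suffix>[^\[\]]*)$"
)


def expand_hosts(expr: str) -> list[str]:
    """Expand 'name[1004-1006].domain' into a list of host names."""
    match = _RANGE.match(expr)
    if match is None:
        return [expr]  # no range present: treat as a single host
    start, end = match["start"], match["end"]
    width = len(start)  # preserve zero padding, if any
    return [
        f"{match['prefix']}{n:0{width}d}{match['suffix']}"
        for n in range(int(start), int(end) + 1)
    ]
```

The length of the returned list should agree with the host count given in the matching downtime entry (3 hosts for the expression above).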
cookbooks.sre.hosts.decommission executed by jayme@cumin1002 for hosts: kubestagemaster[1001-1002].eqiad.wmnet
- kubestagemaster1001.eqiad.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster eqiad to Netbox
cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: ldap-replica2008.wikimedia.org
- ldap-replica2008.wikimedia.org (PASS)
- Downtimed host on Icinga/Alertmanager
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
Icinga downtime and Alertmanager silence (ID=d858a874-17ca-4ab5-8c9c-7fea35f1c823) set by jayme@cumin1002 for 2 days, 0:00:00 on 2 host(s) and their services with reason: decom
kubestagemaster[1001-1002].eqiad.wmnet
cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: ldap-replica2007.wikimedia.org
- ldap-replica2007.wikimedia.org (PASS)
- Downtimed host on Icinga/Alertmanager
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
Host rebooted by btullis@cumin1002 with reason: Rebooting to pick up new kernel
Host rebooted by btullis@cumin1002 with reason: Rebooting to pick up new kernel
Thu, May 16
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin1002 for host contint2002.wikimedia.org with OS bullseye completed:
- contint2002 (PASS)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405162014_dzahn_464740_contint2002.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin1002 for host contint2002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host contint2002.wikimedia.org with OS buster
Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye executed with errors:
- kafka-main1006 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main1006.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host contint2002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host contint2002.wikimedia.org with OS buster executed with errors:
- contint2002 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" contint2002.wikimedia.org to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host contint2002.wikimedia.org with OS buster
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host contint2002.wikimedia.org with OS bullseye executed with errors:
- contint2002 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" contint2002.wikimedia.org to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host contint2002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host contint2002.wikimedia.org with OS bullseye executed with errors:
- contint2002 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" contint2002.wikimedia.org to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host contint2002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by dzahn@cumin2002 for host contint2002.wikimedia.org with OS bullseye executed with errors:
- contint2002 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" contint2002.wikimedia.org to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by dzahn@cumin2002 for host contint2002.wikimedia.org with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db2174.codfw.wmnet with OS bookworm completed:
- db2174 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405161428_arnaudb_418176_db2174.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db2174.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db2176.codfw.wmnet with OS bookworm completed:
- db2176 (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405161337_arnaudb_407694_db2176.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is not optimal, downtime not removed
- Updated Netbox data from PuppetDB
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1024.eqiad.wmnet with OS bookworm completed:
- es1024 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405161331_marostegui_407297_es1024.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db2176.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1024.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.wikireplicas.update-views started by fnegri completed:
- clouddb1021.eqiad.wmnet (PASS)
- Ran Puppet agent
- Ran 'maintain-views --all-databases --replace-all --auto-depool --table globaluser'
Cookbook cookbooks.sre.wikireplicas.update-views run by fnegri: Started updating wiki replica views
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host es1021.eqiad.wmnet with OS bookworm completed:
- es1021 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405160808_marostegui_361726_es1021.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host es1021.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye executed with errors:
- kafka-main2009 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Generated Puppet certificate
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main2009.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye
Wed, May 15
Cookbook cookbooks.sre.hosts.reimage started by vriley@cumin1002 for host kafka-main1007.eqiad.wmnet with OS bullseye executed with errors:
- kafka-main1007 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main1007.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by vriley@cumin1002 for host kafka-main1007.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye executed with errors:
- kafka-main2009 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main2009.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host kafka-main2010.codfw.wmnet with OS bullseye completed:
- kafka-main2010 (WARN)
- Downtimed on Icinga/Alertmanager
- Unable to disable Puppet, the host may have been unreachable
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405151426_jhancock_2468124_kafka-main2010.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> active
- The sre.puppet.sync-netbox-hiera cookbook was run successfully
Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host kafka-main2008.codfw.wmnet with OS bullseye completed:
- kafka-main2008 (WARN)
- Downtimed on Icinga/Alertmanager
- Unable to disable Puppet, the host may have been unreachable
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405151424_jhancock_2468073_kafka-main2008.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> active
- The sre.puppet.sync-netbox-hiera cookbook was run successfully
Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host kafka-main2007.codfw.wmnet with OS bullseye completed:
- kafka-main2007 (WARN)
- Downtimed on Icinga/Alertmanager
- Unable to disable Puppet, the host may have been unreachable
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405151422_jhancock_2467812_kafka-main2007.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> active
- The sre.puppet.sync-netbox-hiera cookbook was run successfully
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host kafka-main2010.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host kafka-main2009.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host kafka-main2008.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host kafka-main2007.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host kafka-main2006.codfw.wmnet with OS bullseye completed:
- kafka-main2006 (PASS)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405151325_jhancock_2409379_kafka-main2006.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Updated Netbox status planned -> active
- The sre.puppet.sync-netbox-hiera cookbook was run successfully
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host kafka-main2006.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host kafka-main2006.codfw.wmnet with OS bullseye executed with errors:
- kafka-main2006 (FAIL)
- Downtimed on Icinga/Alertmanager
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main2006.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host kafka-main2006.codfw.wmnet with OS bullseye
cookbooks.sre.hosts.decommission executed by jayme@cumin1002 for hosts: kubestagetcd[2001-2003].codfw.wmnet
- kubestagetcd2001.codfw.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
Icinga downtime and Alertmanager silence (ID=5c048aeb-57ce-4f8d-8159-53dcf8b5fb78) set by jayme@cumin1002 for 2 days, 0:00:00 on 3 host(s) and their services with reason: decom
kubestagetcd[2001-2003].codfw.wmnet
Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bookworm executed with errors:
- cloudvirt1041 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" cloudvirt1041.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bookworm executed with errors:
- cloudvirt1041 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" cloudvirt1041.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bookworm
cookbooks.sre.hosts.decommission executed by jayme@cumin1002 for hosts: kubestagemaster[2001-2002].codfw.wmnet
- kubestagemaster2001.codfw.wmnet (PASS)
- Downtimed host on Icinga/Alertmanager
- Found Ganeti VM
- VM shutdown
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
- Removed from DebMonitor
- Removed from Puppet master and PuppetDB
- VM removed
- Started forced sync of VMs in Ganeti cluster codfw to Netbox
Icinga downtime and Alertmanager silence (ID=be009031-0cc0-4a4d-97a0-f4d990831efe) set by jmm@cumin2002 for 1:00:00 on 1 host(s) and their services with reason: OS update
seaborgium.wikimedia.org
Tue, May 14
Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1006.eqiad.wmnet with OS bullseye
Deployed homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.5 update to add modified wmf homer plugin - cmooney@cumin1002 - T364480
Icinga downtime and Alertmanager silence (ID=6e2580b0-999e-4a68-87e7-c37d374c663f) set by aokoth@cumin1002 for 0:30:00 on 1 host(s) and their services with reason: Phorge update
phab1004.eqiad.wmnet
VM kubestagemaster1005.eqiad.wmnet switching disk type to plain
VM kubestagemaster1004.eqiad.wmnet switching disk type to plain
VM kubestagemaster1003.eqiad.wmnet switching disk type to plain
Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster1005.eqiad.wmnet with OS bullseye completed:
- kubestagemaster1005 (PASS)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405140931_jayme_4192981_kubestagemaster1005.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster1004.eqiad.wmnet with OS bullseye completed:
- kubestagemaster1004 (PASS)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405140906_jayme_4192644_kubestagemaster1004.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster1003.eqiad.wmnet with OS bullseye completed:
- kubestagemaster1003 (PASS)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405140904_jayme_4192355_kubestagemaster1003.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster1005.eqiad.wmnet with OS bullseye
Icinga downtime and Alertmanager silence (ID=34ac3b76-436c-436c-afc2-20387cde43fb) set by jmm@cumin2002 for 1:00:00 on 1 host(s) and their services with reason: OS update
serpens.wikimedia.org
Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster1004.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster1003.eqiad.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2005.codfw.wmnet with OS bullseye executed with errors:
- kubestagemaster2005 (FAIL)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405131705_jayme_4072335_kubestagemaster2005.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kubestagemaster2005.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2185.codfw.wmnet with OS bookworm completed:
- db2185 (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405140656_marostegui_4175353_db2185.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2185.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db2185.codfw.wmnet with OS bookworm executed with errors:
- db2185 (FAIL)
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" db2185.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db2185.codfw.wmnet with OS bookworm
Mon, May 13
Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster2005.codfw.wmnet with OS bullseye
Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2005.codfw.wmnet with OS bullseye executed with errors:
- kubestagemaster2005 (FAIL)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405131442_jayme_4048520_kubestagemaster2005.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
- The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kubestagemaster2005.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Cookbook cookbooks.sre.hosts.reimage started by jayme@cumin1002 for host kubestagemaster2004.codfw.wmnet with OS bullseye completed:
- kubestagemaster2004 (PASS)
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via gnt-instance
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Set boot media to disk
- Host up (new fresh bullseye OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405131435_jayme_4048471_kubestagemaster2004.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage was started by jayme@cumin1002 for host kubestagemaster2005.codfw.wmnet with OS bullseye