ops-monitoring-bot (Operations Monitoring Bot)
User, Bot

User Details

User Since: Aug 12 2016, 1:45 PM (407 w, 2 d)
Roles: Bot
Availability: Available
LDAP User: Unknown
MediaWiki User: Unknown

Bot managed by SRE for automated interaction with Phabricator from monitoring tools.

Recent Activity

Today

ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc-gp1001.eqiad.wmnet with OS bookworm

Mon, Jun 3, 12:45 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc-gp1002.eqiad.wmnet with OS bookworm completed:

  • mc-gp1002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406031224_jiji_270944_mc-gp1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Mon, Jun 3, 12:41 PM · serviceops
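
The bot comments in this feed follow a regular shape, so they can be summarised mechanically. The sketch below is a minimal, unofficial example: both regular expressions are inferred only from the lines shown on this page, the function names `parse_header` and `parse_log_name` are hypothetical, and the reading of the log file name as `<YYYYMMDDHHMM>_<user>_<numeric id>_<host>.out` is an assumption (the meaning of the numeric id is not stated in the source).

```python
import re
from datetime import datetime

# Inferred from the comment text in this feed; not an official format spec.
HEADER_RE = re.compile(
    r"Cookbook (?P<cookbook>\S+) (?:was )?started by "
    r"(?P<user>[^@\s]+)@(?P<cumin>\S+) for host (?P<host>\S+) "
    r"with OS (?P<os>\w+)"
    r"(?: (?P<outcome>completed|executed with errors):)?$"
)

# Assumed layout of the log file name: timestamp, user, numeric id, short host.
LOG_RE = re.compile(
    r"(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<run_id>\d+)_(?P<host>[^/]+)\.out$"
)


def parse_header(line):
    """Return the fields of a 'Cookbook ... started by ...' line, or None."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    # Lines without a trailing outcome are the 'was started' announcements.
    info["outcome"] = info["outcome"] or "started"
    return info


def parse_log_name(path):
    """Return the fields encoded in a reimage log path, or None."""
    match = LOG_RE.search(path)
    if match is None:
        return None
    info = match.groupdict()
    # Assumption: the 12 leading digits are a YYYYMMDDHHMM timestamp.
    info["started"] = datetime.strptime(info.pop("stamp"), "%Y%m%d%H%M")
    return info
```

Applied to the entry above, `parse_header` would report the user `jiji`, the host `mc-gp1002.eqiad.wmnet`, and the outcome `completed`, while `parse_log_name` would recover the short host name and a start time from the log path. User names containing an underscore are not handled by this sketch.
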
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc-gp1002.eqiad.wmnet with OS bookworm

Mon, Jun 3, 12:06 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1037.eqiad.wmnet with OS bookworm completed:

  • mc1037 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406031107_jiji_262393_mc1037.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Mon, Jun 3, 11:24 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1037.eqiad.wmnet with OS bookworm

Mon, Jun 3, 10:50 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1038.eqiad.wmnet with OS bookworm completed:

  • mc1038 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406031021_jiji_254118_mc1038.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Mon, Jun 3, 10:40 AM · serviceops
ops-monitoring-bot added a comment to T325228: Migrate Dumps Snapshot hosts from Buster to Bullseye.

Cookbook cookbooks.sre.hosts.reimage started by btullis@cumin1002 for host snapshot1013.eqiad.wmnet with OS bullseye completed:

  • snapshot1013 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406031002_btullis_250840_snapshot1013.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Mon, Jun 3, 10:29 AM · Patch-For-Review, Data-Platform-SRE (2024.05.27 - 2024.06.16), SRE, Data-Engineering, Dumps-Generation
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1038.eqiad.wmnet with OS bookworm

Mon, Jun 3, 10:03 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc-gp2001.codfw.wmnet with OS bookworm executed with errors:

  • mc-gp2001 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406030931_jiji_3618044_mc-gp2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" mc-gp2001.codfw.wmnet to get a root shell, but depending on the failure this may not work.
Mon, Jun 3, 9:50 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc-gp2001.codfw.wmnet with OS bookworm completed:

  • mc-gp2001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406030931_jiji_3618044_mc-gp2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Mon, Jun 3, 9:50 AM · serviceops
ops-monitoring-bot added a comment to T325228: Migrate Dumps Snapshot hosts from Buster to Bullseye.

Cookbook cookbooks.sre.hosts.reimage was started by btullis@cumin1002 for host snapshot1013.eqiad.wmnet with OS bullseye

Mon, Jun 3, 9:44 AM · Patch-For-Review, Data-Platform-SRE (2024.05.27 - 2024.06.16), SRE, Data-Engineering, Dumps-Generation
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1039.eqiad.wmnet with OS bookworm completed:

  • mc1039 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406030925_jiji_245315_mc1039.out
    • Unable to run puppet on config-master2001.codfw.wmnet,config-master1001.eqiad.wmnet to update configmaster.wikimedia.org with the new host SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Mon, Jun 3, 9:41 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc-gp2001.codfw.wmnet with OS bookworm

Mon, Jun 3, 9:10 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1039.eqiad.wmnet with OS bookworm

Mon, Jun 3, 9:10 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc-gp2002.codfw.wmnet with OS bookworm completed:

  • mc-gp2002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406030831_jiji_3559775_mc-gp2002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Mon, Jun 3, 8:50 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc-gp1003.eqiad.wmnet with OS bookworm completed:

  • mc-gp1003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406030828_jiji_236110_mc-gp1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Mon, Jun 3, 8:45 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc-gp1003.eqiad.wmnet with OS bookworm

Mon, Jun 3, 8:10 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc-gp2002.codfw.wmnet with OS bookworm

Mon, Jun 3, 8:09 AM · serviceops

Sat, Jun 1

ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye completed:

  • kafka-main1010 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202406010720_akosiaris_4078484_kafka-main1010.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Sat, Jun 1, 1:39 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye

Sat, Jun 1, 6:59 AM · SRE, serviceops, ops-eqiad, DC-Ops

Fri, May 31

ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye

Fri, May 31, 1:37 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363792: Upgrade s8 to MariaDB 10.6.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1209.eqiad.wmnet with OS bookworm completed:

  • db1209 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405311209_marostegui_3930866_db1209.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Fri, May 31, 12:30 PM · DBA
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2038.codfw.wmnet with OS bookworm completed:

  • mc2038 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405311203_jiji_3776530_mc2038.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Fri, May 31, 12:21 PM · serviceops
ops-monitoring-bot added a comment to T363792: Upgrade s8 to MariaDB 10.6.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1209.eqiad.wmnet with OS bookworm

Fri, May 31, 11:51 AM · DBA
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1039.eqiad.wmnet with OS bookworm executed with errors:

  • mc1039 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • New OS is bullseye but bookworm was requested
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" mc1039.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
Fri, May 31, 11:47 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2038.codfw.wmnet with OS bookworm

Fri, May 31, 11:42 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2039.codfw.wmnet with OS bookworm completed:

  • mc2039 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405311109_jiji_3721618_mc2039.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Fri, May 31, 11:27 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2039.codfw.wmnet with OS bookworm

Fri, May 31, 10:47 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1039.eqiad.wmnet with OS bookworm

Fri, May 31, 10:47 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2040.codfw.wmnet with OS bookworm completed:

  • mc2040 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405311003_jiji_3657011_mc2040.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Fri, May 31, 10:21 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1040.eqiad.wmnet with OS bookworm completed:

  • mc1040 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405310958_jiji_3912762_mc1040.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Fri, May 31, 10:14 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2040.codfw.wmnet with OS bookworm

Fri, May 31, 9:42 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1040.eqiad.wmnet with OS bookworm

Fri, May 31, 9:42 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2041.codfw.wmnet with OS bookworm completed:

  • mc2041 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405310908_jiji_3601887_mc2041.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Fri, May 31, 9:25 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1041.eqiad.wmnet with OS bookworm completed:

  • mc1041 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405310903_jiji_3902367_mc1041.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Fri, May 31, 9:19 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2041.codfw.wmnet with OS bookworm

Fri, May 31, 8:47 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1041.eqiad.wmnet with OS bookworm

Fri, May 31, 8:47 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2042.codfw.wmnet with OS bookworm completed:

  • mc2042 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405310802_jiji_3536165_mc2042.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Fri, May 31, 8:20 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1042.eqiad.wmnet with OS bookworm completed:

  • mc1042 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405310756_jiji_3893081_mc1042.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Fri, May 31, 8:13 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2042.codfw.wmnet with OS bookworm

Fri, May 31, 7:40 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1042.eqiad.wmnet with OS bookworm

Fri, May 31, 7:40 AM · serviceops
ops-monitoring-bot added a comment to T279621: Set up Misc Object Storage Service (moss).

Icinga downtime and Alertmanager silence (ID=418e7ac9-ef32-4b52-9f19-645af27090e2) set by mvernon@cumin1002 for 7 days, 0:00:00 on 1 host(s) and their services with reason: in development

moss-fe1002.eqiad.wmnet
Fri, May 31, 7:31 AM · SRE-swift-storage
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2043.codfw.wmnet with OS bookworm completed:

  • mc2043 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405310641_jiji_3456016_mc2043.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Fri, May 31, 6:58 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1043.eqiad.wmnet with OS bookworm completed:

  • mc1043 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405310636_jiji_3881045_mc1043.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Fri, May 31, 6:53 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1043.eqiad.wmnet with OS bookworm

Fri, May 31, 6:20 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2043.codfw.wmnet with OS bookworm

Fri, May 31, 6:20 AM · serviceops

Thu, May 30

ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage started by akosiaris@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye completed:

  • kafka-main1009 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405301558_akosiaris_3731074_kafka-main1009.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Thu, May 30, 4:15 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by akosiaris@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye

Thu, May 30, 3:38 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye

Thu, May 30, 2:05 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye executed with errors:

  • kafka-main1010 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main1010.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
Thu, May 30, 1:46 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye

Thu, May 30, 1:45 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2044.codfw.wmnet with OS bookworm completed:

  • mc2044 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405301325_jiji_2436033_mc2044.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Thu, May 30, 1:43 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1044.eqiad.wmnet with OS bookworm completed:

  • mc1044 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405301320_jiji_3707785_mc1044.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Thu, May 30, 1:38 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1044.eqiad.wmnet with OS bookworm

Thu, May 30, 1:04 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2044.codfw.wmnet with OS bookworm

Thu, May 30, 1:04 PM · serviceops
ops-monitoring-bot added a comment to T364296: Reimage db1215 and db2185 (zarcillo) to bookworm.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1002 for host db1215.eqiad.wmnet with OS bookworm completed:

  • db1215 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405300905_marostegui_3669917_db1215.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Thu, May 30, 9:21 AM · DBA
ops-monitoring-bot added a comment to T364296: Reimage db1215 and db2185 (zarcillo) to bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1002 for host db1215.eqiad.wmnet with OS bookworm

Thu, May 30, 8:47 AM · DBA

Wed, May 29

ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye completed:

  • kafka-main1009 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202405292206_jclark_3589235_kafka-main1009.out, asking the operator what to do
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405292206_jclark_3589235_kafka-main1009.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, May 29, 10:17 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye completed:

  • kafka-main1009 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291635_jclark_3540615_kafka-main1009.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually
Wed, May 29, 10:05 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye

Wed, May 29, 10:01 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye

Wed, May 29, 10:00 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T364984: cloudvirt1041: can't boot after reimage.

Cookbook cookbooks.sre.hosts.reimage started by andrew@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1041 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291726_andrew_3548421_cloudvirt1041.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status failed -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Wed, May 29, 6:32 PM · DC-Ops, ops-eqiad, User-aborrero, cloud-services-team, SRE
ops-monitoring-bot added a comment to T364984: cloudvirt1041: can't boot after reimage.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bookworm

Wed, May 29, 5:08 PM · DC-Ops, ops-eqiad, User-aborrero, cloud-services-team, SRE
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2045.codfw.wmnet with OS bookworm completed:

  • mc2045 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291622_jiji_1210815_mc2045.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Wed, May 29, 4:40 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1045.eqiad.wmnet with OS bookworm completed:

  • mc1045 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291617_jiji_3536831_mc1045.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, May 29, 4:34 PM · serviceops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye

Wed, May 29, 4:30 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye

Wed, May 29, 4:28 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1045.eqiad.wmnet with OS bookworm

Wed, May 29, 4:01 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2045.codfw.wmnet with OS bookworm

Wed, May 29, 4:01 PM · serviceops
ops-monitoring-bot added a comment to T364984: cloudvirt1041: can't boot after reimage.

Cookbook cookbooks.sre.hosts.reimage was started by andrew@cumin1002 for host cloudvirt1041.eqiad.wmnet with OS bookworm

Wed, May 29, 3:55 PM · DC-Ops, ops-eqiad, User-aborrero, cloud-services-team, SRE
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye executed with errors:

  • kafka-main1009 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main1009.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
Wed, May 29, 3:55 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye

Wed, May 29, 3:09 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T364290: Upgrade s1 to MariaDB 10.6.

Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db1163.eqiad.wmnet with OS bookworm completed:

  • db1163 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291447_arnaudb_3519736_db1163.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
Wed, May 29, 3:08 PM · DBA
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2046.codfw.wmnet with OS bookworm completed:

  • mc2046 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291449_jiji_1115869_mc2046.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Wed, May 29, 3:07 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1046.eqiad.wmnet with OS bookworm completed:

  • mc1046 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291442_jiji_3517157_mc1046.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, May 29, 2:58 PM · serviceops
ops-monitoring-bot added a comment to T364290: Upgrade s1 to MariaDB 10.6.

Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db1163.eqiad.wmnet with OS bookworm

Wed, May 29, 2:30 PM · DBA
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye

Wed, May 29, 2:27 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2046.codfw.wmnet with OS bookworm

Wed, May 29, 2:26 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1046.eqiad.wmnet with OS bookworm

Wed, May 29, 2:26 PM · serviceops
ops-monitoring-bot added a comment to T364290: Upgrade s1 to MariaDB 10.6.

Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db1169.eqiad.wmnet with OS bookworm completed:

  • db1169 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291346_arnaudb_3509654_db1169.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
Wed, May 29, 2:08 PM · DBA
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2047.codfw.wmnet with OS bookworm completed:

  • mc2047 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291316_jiji_1024686_mc2047.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, May 29, 1:36 PM · serviceops
ops-monitoring-bot added a comment to T364290: Upgrade s1 to MariaDB 10.6.

Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db1169.eqiad.wmnet with OS bookworm

Wed, May 29, 1:30 PM · DBA
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1047.eqiad.wmnet with OS bookworm completed:

  • mc1047 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291310_jiji_3499598_mc1047.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, May 29, 1:27 PM · serviceops
ops-monitoring-bot added a comment to T364290: Upgrade s1 to MariaDB 10.6.

Cookbook cookbooks.sre.hosts.reimage started by arnaudb@cumin1002 for host db1196.eqiad.wmnet with OS bookworm completed:

  • db1196 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291304_arnaudb_3499111_db1196.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (EVPN Switch)
Wed, May 29, 1:26 PM · DBA
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2047.codfw.wmnet with OS bookworm

Wed, May 29, 12:55 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1047.eqiad.wmnet with OS bookworm

Wed, May 29, 12:54 PM · serviceops
ops-monitoring-bot added a comment to T364290: Upgrade s1 to MariaDB 10.6.

Cookbook cookbooks.sre.hosts.reimage was started by arnaudb@cumin1002 for host db1196.eqiad.wmnet with OS bookworm

Wed, May 29, 12:46 PM · DBA
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin2002 for host mc2048.codfw.wmnet with OS bookworm completed:

  • mc2048 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291228_jiji_975311_mc2048.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, May 29, 12:45 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1048.eqiad.wmnet with OS bookworm completed:

  • mc1048 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291224_jiji_3492876_mc1048.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, May 29, 12:40 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1048.eqiad.wmnet with OS bookworm

Wed, May 29, 12:05 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin2002 for host mc2048.codfw.wmnet with OS bookworm

Wed, May 29, 12:04 PM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage started by jiji@cumin1002 for host mc1049.eqiad.wmnet with OS bookworm completed:

  • mc1049 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202405291046_jiji_3479252_mc1049.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, May 29, 11:03 AM · serviceops
ops-monitoring-bot added a comment to T352891: Upgrade memcache and memcached gutter pools to Bookworm.

Cookbook cookbooks.sre.hosts.reimage was started by jiji@cumin1002 for host mc1049.eqiad.wmnet with OS bookworm

Wed, May 29, 10:29 AM · serviceops
ops-monitoring-bot added a comment to T366094: k8s master capacity issues.

Icinga downtime and Alertmanager silence (ID=2f1b90d9-2cd4-4705-bbf1-70fdacf169cd) set by akosiaris@cumin1002 for 2:00:00 on 1 host(s) and their services with reason: disable puppet and k8s controlplane

kubemaster1002.eqiad.wmnet
Wed, May 29, 10:17 AM · serviceops, SRE
ops-monitoring-bot added a comment to T366094: k8s master capacity issues.

Icinga downtime and Alertmanager silence (ID=8fa8366a-d3f2-4a77-8e2b-45de66551026) set by akosiaris@cumin1002 for 2:00:00 on 1 host(s) and their services with reason: disable puppet and k8s controlplane

wikikube-ctrl1002.eqiad.wmnet
Wed, May 29, 10:05 AM · serviceops, SRE
ops-monitoring-bot added a comment to T366094: k8s master capacity issues.

Icinga downtime and Alertmanager silence (ID=b65d2df8-871b-4064-b329-026af4d7ec1d) set by akosiaris@cumin1002 for 2:00:00 on 1 host(s) and their services with reason: disable puppet and k8s controlplane

wikikube-ctrl1001.eqiad.wmnet
Wed, May 29, 10:05 AM · serviceops, SRE

Tue, May 28

ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye executed with errors:

  • kafka-main1009 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "install-console" kafka-main1009.eqiad.wmnet to get a root shell, but depending on the failure this may not work.
Tue, May 28, 7:36 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye

Tue, May 28, 7:19 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1009.eqiad.wmnet with OS bullseye

Tue, May 28, 6:50 PM · SRE, serviceops, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T363212: Q4:rack/setup/install kafka-main100[6789] and kafka-main1010.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1002 for host kafka-main1010.eqiad.wmnet with OS bullseye

Tue, May 28, 6:18 PM · SRE, serviceops, ops-eqiad, DC-Ops