ops-monitoring-bot (Operations Monitoring Bot)
User · Bot

Projects (3)

User Details

User Since: Aug 12 2016, 1:45 PM (495 w, 1 d)
Roles: Bot
Availability: Available
LDAP User: Unknown
MediaWiki User: Unknown

Bot managed by SRE for automated interaction with Phabricator from monitoring tools.

Recent Activity

Fri, Feb 6

ops-monitoring-bot created T416736: Degraded RAID on wdqs1028.
Fri, Feb 6, 7:15 PM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot created T416726: Degraded RAID on kubestage2004.
Fri, Feb 6, 6:04 PM · SRE, ops-codfw, DC-Ops

Thu, Feb 5

ops-monitoring-bot added a comment to T403035: Eqiad: Fr-tech expansion.

Icinga downtime and Alertmanager silence (ID=785b501b-5e53-43b0-b903-5d93372eb8e1) set by cmooney@cumin1003 for 1 day, 0:00:00 on 2 host(s) and their services with reason: fundraising migration eqiad

fasw2-e15a-eqiad,fasw2-e15b-eqiad
Thu, Feb 5, 7:24 PM · fundraising-tech-ops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE
ops-monitoring-bot added a comment to T365798: Shutdown of Puppet 5 servers.

cookbooks.sre.hosts.decommission executed by jmm@cumin2002 for hosts: puppetmaster2001.codfw.wmnet

  • puppetmaster2001.codfw.wmnet (PASS)
    • Downtimed host on Icinga/Alertmanager
    • Found physical host
    • Downtimed management interface on Alertmanager
    • Wiped all swraid, partition-table and filesystem signatures
    • Powered off
    • [Netbox] Set status to Decommissioning, deleted all non-mgmt IPs, updated switch interfaces (disabled, removed vlans, etc)
    • Configured the linked switch interface(s)
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
Thu, Feb 5, 3:40 PM · Patch-For-Review, Puppet-Infrastructure, SRE, Infrastructure-Foundations
ops-monitoring-bot added a comment to T403035: Eqiad: Fr-tech expansion.

Icinga downtime and Alertmanager silence (ID=5da72ec9-7626-47d2-bc98-a871f93d717e) set by cmooney@cumin1003 for 1 day, 0:00:00 on 3 host(s) and their services with reason: fundraising migration eqiad

fasw2-c1a-eqiad,fasw2-c1b-eqiad,pfw1-eqiad
Thu, Feb 5, 3:21 PM · fundraising-tech-ops, DC-Ops, Infrastructure-Foundations, ops-eqiad, SRE
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Completed pool of db2204 gradually with 4 steps - After schema change - marostegui@cumin1003

Thu, Feb 5, 1:41 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Start pool of db2204 gradually with 4 steps - After schema change - marostegui@cumin1003

Thu, Feb 5, 12:56 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Completed pool of db2205 gradually with 4 steps - After schema change - marostegui@cumin1003

Thu, Feb 5, 9:41 AM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Start pool of db2205 gradually with 4 steps - After schema change - marostegui@cumin1003

Thu, Feb 5, 8:56 AM · Data-Engineering, Schema-change-in-production, DBA

Wed, Feb 4

ops-monitoring-bot added a comment to T416254: Q3:rack/setup/install bast1004.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host bast1004.wikimedia.org with OS trixie completed:

  • bast1004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602041724_jclark_2648072_bast1004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, Feb 4, 5:41 PM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T416254: Q3:rack/setup/install bast1004.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host bast1004.wikimedia.org with OS trixie

Wed, Feb 4, 5:03 PM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Completed pool of db1236 gradually with 4 steps - After schema change - marostegui@cumin1003

Wed, Feb 4, 4:34 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T416254: Q3:rack/setup/install bast1004.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host bast1004.wikimedia.org with OS trixie executed with errors:

  • bast1004 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console bast1004.wikimedia.org" to get a root shell, but depending on the failure this may not work.
Wed, Feb 4, 4:18 PM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T416254: Q3:rack/setup/install bast1004.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host bast1004.wikimedia.org with OS trixie

Wed, Feb 4, 4:18 PM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Start pool of db1236 gradually with 4 steps - After schema change - marostegui@cumin1003

Wed, Feb 4, 3:48 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T416254: Q3:rack/setup/install bast1004.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host bast1004.eqiad.wmnet with OS trixie executed with errors:

  • bast1004 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console bast1004.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
Wed, Feb 4, 3:34 PM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T416254: Q3:rack/setup/install bast1004.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host bast1004.eqiad.wmnet with OS trixie

Wed, Feb 4, 3:34 PM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host ms-fe1022.eqiad.wmnet with OS bullseye completed:

  • ms-fe1022 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602041436_jclark_2543935_ms-fe1022.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Wed, Feb 4, 2:54 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host ms-fe1023.eqiad.wmnet with OS bullseye completed:

  • ms-fe1023 (PASS)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602041434_jclark_2543945_ms-fe1023.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Wed, Feb 4, 2:53 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host ms-fe1023.eqiad.wmnet with OS bullseye

Wed, Feb 4, 2:14 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host ms-fe1022.eqiad.wmnet with OS bullseye

Wed, Feb 4, 2:14 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host ms-fe1023.eqiad.wmnet with OS bullseye executed with errors:

  • ms-fe1023 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ms-fe1023.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
Wed, Feb 4, 2:13 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host ms-fe1022.eqiad.wmnet with OS bullseye executed with errors:

  • ms-fe1022 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ms-fe1022.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
Wed, Feb 4, 2:08 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host ms-fe1022.eqiad.wmnet with OS bullseye

Wed, Feb 4, 1:47 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host ms-fe1024.eqiad.wmnet with OS bullseye completed:

  • ms-fe1024 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602041305_jclark_2528735_ms-fe1024.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Wed, Feb 4, 1:37 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host ms-fe1021.eqiad.wmnet with OS bullseye completed:

  • ms-fe1021 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602041244_jclark_2527779_ms-fe1021.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Wed, Feb 4, 1:12 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host ms-fe1023.eqiad.wmnet with OS bullseye

Wed, Feb 4, 1:00 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host ms-fe1023.eqiad.wmnet with OS bullseye executed with errors:

  • ms-fe1023 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ms-fe1023.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
Wed, Feb 4, 1:00 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host ms-fe1024.eqiad.wmnet with OS bullseye

Wed, Feb 4, 12:42 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host ms-fe1023.eqiad.wmnet with OS bullseye

Wed, Feb 4, 12:42 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T416245: Q3:rack/setup/install ms-fe102[14].

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host ms-fe1021.eqiad.wmnet with OS bullseye

Wed, Feb 4, 12:23 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage started by jynus@cumin1003 for host backup1015.eqiad.wmnet with OS trixie completed:

  • backup1015 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602041145_jynus_2518658_backup1015.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Wed, Feb 4, 12:07 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage was started by jynus@cumin1003 for host backup1015.eqiad.wmnet with OS trixie

Wed, Feb 4, 11:27 AM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host tools-k8s-worker1006.eqiad.wmnet with OS trixie completed:

  • tools-k8s-worker1006 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602040110_jclark_2437768_tools-k8s-worker1006.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Wed, Feb 4, 1:27 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host tools-k8s-worker1008.eqiad.wmnet with OS trixie completed:

  • tools-k8s-worker1008 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602040107_jclark_2438571_tools-k8s-worker1008.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Wed, Feb 4, 1:24 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host tools-k8s-worker1007.eqiad.wmnet with OS trixie completed:

  • tools-k8s-worker1007 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602040103_jclark_2437842_tools-k8s-worker1007.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Wed, Feb 4, 1:21 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host tools-k8s-worker1005.eqiad.wmnet with OS trixie completed:

  • tools-k8s-worker1005 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602040059_jclark_2437713_tools-k8s-worker1005.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Wed, Feb 4, 1:15 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host tools-k8s-worker1008.eqiad.wmnet with OS trixie

Wed, Feb 4, 12:50 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host tools-k8s-worker1004.eqiad.wmnet with OS trixie completed:

  • tools-k8s-worker1004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602040033_jclark_2422713_tools-k8s-worker1004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, Feb 4, 12:49 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host tools-k8s-worker1007.eqiad.wmnet with OS trixie

Wed, Feb 4, 12:46 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host tools-k8s-worker1003.eqiad.wmnet with OS trixie completed:

  • tools-k8s-worker1003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602040029_jclark_2422762_tools-k8s-worker1003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, Feb 4, 12:45 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host tools-k8s-worker1006.eqiad.wmnet with OS trixie

Wed, Feb 4, 12:45 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host tools-k8s-worker1005.eqiad.wmnet with OS trixie

Wed, Feb 4, 12:44 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host tools-k8s-worker1002.eqiad.wmnet with OS trixie completed:

  • tools-k8s-worker1002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602040025_jclark_2422203_tools-k8s-worker1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, Feb 4, 12:42 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host tools-k8s-worker1001.eqiad.wmnet with OS trixie completed:

  • tools-k8s-worker1001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602040021_jclark_2422123_tools-k8s-worker1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, Feb 4, 12:37 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host tools-k8s-ctrl1001.eqiad.wmnet with OS trixie completed:

  • tools-k8s-ctrl1001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602040017_jclark_2421485_tools-k8s-ctrl1001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, Feb 4, 12:33 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host tools-k8s-ctrl1002.eqiad.wmnet with OS trixie completed:

  • tools-k8s-ctrl1002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602040013_jclark_2421475_tools-k8s-ctrl1002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Wed, Feb 4, 12:30 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host tools-k8s-worker1003.eqiad.wmnet with OS trixie

Wed, Feb 4, 12:11 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host tools-k8s-worker1004.eqiad.wmnet with OS trixie

Wed, Feb 4, 12:11 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host tools-k8s-worker1002.eqiad.wmnet with OS trixie

Wed, Feb 4, 12:05 AM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host tools-k8s-worker1001.eqiad.wmnet with OS trixie

Wed, Feb 4, 12:05 AM · SRE, DC-Ops, ops-eqiad

Tue, Feb 3

ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host tools-k8s-ctrl1002.eqiad.wmnet with OS trixie

Tue, Feb 3, 11:57 PM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T410403: Q2:rack/setup/install Toolforge.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host tools-k8s-ctrl1001.eqiad.wmnet with OS trixie

Tue, Feb 3, 11:57 PM · SRE, DC-Ops, ops-eqiad
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host backup1015.eqiad.wmnet with OS bookworm executed with errors:

  • backup1015 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console backup1015.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
Tue, Feb 3, 11:40 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1003 for host ms-fe1024.eqiad.wmnet with OS bullseye executed with errors:

  • ms-fe1024 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ms-fe1024.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
Tue, Feb 3, 8:30 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host backup1015.eqiad.wmnet with OS bookworm

Tue, Feb 3, 7:59 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1003 for host ms-fe1024.eqiad.wmnet with OS bullseye

Tue, Feb 3, 7:40 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1003 for host ms-fe1024.eqiad.wmnet with OS bullseye executed with errors:

  • ms-fe1024 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ms-fe1024.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
Tue, Feb 3, 7:38 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host ms-fe1021.eqiad.wmnet with OS bullseye executed with errors:

  • ms-fe1021 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console ms-fe1021.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
Tue, Feb 3, 7:23 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1003 for host ms-fe1024.eqiad.wmnet with OS bullseye

Tue, Feb 3, 7:23 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host backup1015.eqiad.wmnet with OS bookworm executed with errors:

  • backup1015 (FAIL)
    • Host successfully migrated to the new VLAN
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console backup1015.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.
Tue, Feb 3, 7:10 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host ms-fe1023.eqiad.wmnet with OS bullseye

Tue, Feb 3, 7:04 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host ms-fe1021.eqiad.wmnet with OS bullseye

Tue, Feb 3, 6:56 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T414725: Q3:rack/setup/install backup1015.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host backup1015.eqiad.wmnet with OS bookworm

Tue, Feb 3, 5:45 PM · Infrastructure-Foundations, Data-Persistence, SRE, ops-eqiad, DC-Ops

Mon, Feb 2

ops-monitoring-bot created T416268: Degraded RAID on an-worker1187.
Mon, Feb 2, 11:20 PM · SRE, ops-eqiad, DC-Ops
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Completed pooling of db1193 by marostegui@cumin1003: After schema change

Mon, Feb 2, 12:54 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Starting pool of db1193 by marostegui@cumin1003: After schema change

Mon, Feb 2, 12:54 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Completed pooling of db1193 by marostegui@cumin1003: After schema change

Mon, Feb 2, 12:46 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Completed pooling of db1222 by marostegui@cumin1003: After schema change

Mon, Feb 2, 12:45 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Starting pool of db1193 by marostegui@cumin1003: After schema change

Mon, Feb 2, 12:00 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Starting pool of db1222 by marostegui@cumin1003: After schema change

Mon, Feb 2, 12:00 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Starting pool of db1222 by marostegui@cumin1003: After schema change

Mon, Feb 2, 11:58 AM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Completed pooling of db2249 by marostegui@cumin1003: After reimage

Mon, Feb 2, 10:36 AM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Starting pool of db2249 by marostegui@cumin1003: After reimage

Mon, Feb 2, 9:51 AM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2249.codfw.wmnet with OS trixie completed:

  • db2249 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602020924_marostegui_898222_db2249.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
Mon, Feb 2, 9:47 AM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2249.codfw.wmnet with OS trixie

Mon, Feb 2, 9:07 AM · DBA

Sat, Jan 31

ops-monitoring-bot created T416066: Degraded RAID on an-worker1199.
Sat, Jan 31, 1:34 AM · DC-Ops, SRE, ops-eqiad

Fri, Jan 30

ops-monitoring-bot added a comment to T415045: Prepare and check storage layer for pplwiki.

Section s5: Wikis pplwiki set up on clouddb - marostegui@cumin1003

Fri, Jan 30, 5:52 AM · DBA
ops-monitoring-bot added a comment to T415045: Prepare and check storage layer for pplwiki.

Section s5: Wikis pplwiki redacted - marostegui@cumin1003

Fri, Jan 30, 5:45 AM · DBA

Thu, Jan 29

ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Completed pooling of db1201 by marostegui@cumin1003: After schema change

Thu, Jan 29, 3:56 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Completed pooling of db1210 by marostegui@cumin1003: After schema change

Thu, Jan 29, 3:55 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Starting pool of db1201 by marostegui@cumin1003: After schema change

Thu, Jan 29, 3:10 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T415786: Update imagelinks primary key on wmf production.

Starting pool of db1210 by marostegui@cumin1003: After schema change

Thu, Jan 29, 3:10 PM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T411164: Drop rev_sha1 from revision table in wmf production.

Completed pooling of db2212 by marostegui@cumin1003: After schema change

Thu, Jan 29, 7:21 AM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T411164: Drop rev_sha1 from revision table in wmf production.

Starting pool of db2212 by marostegui@cumin1003: After schema change

Thu, Jan 29, 6:35 AM · Data-Engineering, Schema-change-in-production, DBA

Wed, Jan 28

ops-monitoring-bot added a comment to T411164: Drop rev_sha1 from revision table in wmf production.

Completed pooling of db1163 by marostegui@cumin1003: After schema change

Wed, Jan 28, 7:26 AM · Data-Engineering, Schema-change-in-production, DBA
ops-monitoring-bot added a comment to T411164: Drop rev_sha1 from revision table in wmf production.

Starting pool of db1163 by marostegui@cumin1003: After schema change

Wed, Jan 28, 6:40 AM · Data-Engineering, Schema-change-in-production, DBA

Tue, Jan 27

ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Completed pooling of db2248 by marostegui@cumin1003: After reimage

Tue, Jan 27, 9:05 AM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Starting pool of db2248 by marostegui@cumin1003: After reimage

Tue, Jan 27, 8:19 AM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db2248.codfw.wmnet with OS trixie completed:

  • db2248 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601270739_marostegui_3854760_db2248.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
Tue, Jan 27, 8:01 AM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Cookbook cookbooks.sre.hosts.reimage was started by marostegui@cumin1003 for host db2248.codfw.wmnet with OS trixie

Tue, Jan 27, 7:20 AM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Completed depooling of db2248 by marostegui@cumin1003: Reimage

Tue, Jan 27, 7:18 AM · DBA

Mon, Jan 26

ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Finished cloning db1224.eqiad.wmnet to db1264.eqiad.wmnet - marostegui@cumin1003

Mon, Jan 26, 6:57 PM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Completed pool of db1264 gradually with 4 steps - Pool db1264.eqiad.wmnet in after cloning - marostegui@cumin1003

Mon, Jan 26, 6:57 PM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Start pool of db1264 gradually with 4 steps - Pool db1264.eqiad.wmnet in after cloning - marostegui@cumin1003

Mon, Jan 26, 6:11 PM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Completed pool of db1224 gradually with 4 steps - Pool db1224.eqiad.wmnet in after cloning - marostegui@cumin1003

Mon, Jan 26, 4:58 PM · DBA
ops-monitoring-bot added a comment to T415031: Prepare and check storage layer for kajwiki.

Section s5: Wikis kajwiki set up on clouddb - marostegui@cumin1003

Mon, Jan 26, 4:41 PM · DBA
ops-monitoring-bot added a comment to T415031: Prepare and check storage layer for kajwiki.

Section s5: Wikis kajwiki redacted - marostegui@cumin1003

Mon, Jan 26, 4:35 PM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Start pool of db1224 gradually with 4 steps - Pool db1224.eqiad.wmnet in after cloning - marostegui@cumin1003

Mon, Jan 26, 4:12 PM · DBA
ops-monitoring-bot added a comment to T415358: Migrate 1P db* to Debian Trixie.

Cookbook cookbooks.sre.hosts.reimage started by marostegui@cumin1003 for host db1264.eqiad.wmnet with OS trixie completed:

  • db1264 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601260908_marostegui_3710388_db1264.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
Mon, Jan 26, 9:28 AM · DBA
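
The reimage comments above each carry a per-host result line such as "• db2249 (WARN)". A minimal sketch of tallying those results from a plain-text dump of this feed is shown below. The line shape is inferred only from the entries on this page, and the names RESULT_RE and tally_results are hypothetical helpers, not part of Spicerack or Phabricator.

```python
import re
from collections import Counter

# Assumed shape of a per-host result line in the bot's reimage comments,
# e.g. "  • db2249 (WARN)". Inferred from the feed entries, not from any spec.
RESULT_RE = re.compile(
    r"^\s*•\s+(?P<host>[\w.-]+)\s+\((?P<status>PASS|WARN|FAIL)\)\s*$"
)


def tally_results(text):
    """Return (Counter of statuses, list of (host, status)) found in text."""
    results = []
    for line in text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            results.append((match.group("host"), match.group("status")))
    return Counter(status for _, status in results), results


# Small sample copied from the entries above; step lines are ignored
# because they do not end in a parenthesised status.
sample = """\
  • tools-k8s-ctrl1001 (PASS)
    • Downtimed on Icinga/Alertmanager
  • backup1015 (FAIL)
  • db2249 (WARN)
    • Icinga status is not optimal, downtime not removed
"""

counts, results = tally_results(sample)
for host, status in results:
    print(f"{host}: {status}")
```

Matching on the parenthesised status only keeps the sketch independent of the step wording, which differs between hosts (for example PXE/IPMI versus UEFI HTTP Boot/Redfish).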