
Move WMCS servers to 1 single NIC
Open, Medium, Public

Description

This ticket tracks work to move WMCS servers to a single-NIC setup.

codfw

eqiad

Details

Repo               Branch      Lines +/-
operations/puppet  production  +1 -5
operations/puppet  production  +1 -5
operations/puppet  production  +1 -5
operations/puppet  production  +1 -5
operations/puppet  production  +1 -5
operations/puppet  production  +1 -5
operations/puppet  production  +1 -5
operations/puppet  production  +1 -5
operations/puppet  production  +0 -5
operations/puppet  production  +192 -33
operations/puppet  production  +0 -5
operations/puppet  production  +0 -5
operations/puppet  production  +0 -5
operations/puppet  production  +0 -5
operations/puppet  production  +0 -5
operations/puppet  production  +0 -5
operations/puppet  production  +0 -5
operations/puppet  production  +0 -5
operations/puppet  production  +0 -5
operations/puppet  production  +0 -5
operations/puppet  production  +0 -5
operations/puppet  production  +5 -6
operations/puppet  production  +139 -20
operations/puppet  production  +4 -65
operations/puppet  production  +7 -22
operations/puppet  production  +4 -6
operations/puppet  production  +58 -14
operations/puppet  production  +3 -3
operations/puppet  production  +3 -3
operations/puppet  production  +1 -1
operations/puppet  production  +3 -2
operations/puppet  production  +0 -7

Event Timeline


This is the patch to enable the single NIC setup on ceph nodes:

https://gerrit.wikimedia.org/r/c/operations/puppet/+/856675/

It is marked as abandoned, but should work just fine.

aborrero changed the task status from Stalled to Open. (Feb 14 2024, 3:30 PM)
aborrero added a project: User-aborrero.

In a 2024-02-14 network sync meeting we decided to continue moving older cloudvirts into the new single NIC setup. I plan to work on this soon.

aborrero updated the task description.

Change 1003616 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudvirt1031: move to modern NIC setup

https://gerrit.wikimedia.org/r/1003616

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-15T12:05:25Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1031.eqiad.wmnet' (T319184)

Change 1003616 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudvirt1031: move to modern NIC setup

https://gerrit.wikimedia.org/r/1003616

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1031.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1031.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1031 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402151232_aborrero_124213_cloudvirt1031.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
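The reimage reports in this timeline all share the same shape: a header bullet with the host and an overall status, then one bullet per step. As a rough sketch (not part of the cookbook itself; the function name `summarize_report` and the returned dictionary layout are illustrative assumptions), a report pasted from this task could be summarized like this to see quickly why a run ended in WARN:

```python
import re

# Header bullet of a host report, e.g. "  • cloudvirt1031 (WARN)".
# Step bullets such as "Host up (Debian installer)" do not match because
# the header must be a single token followed by an all-caps status.
HEADER = re.compile(r"^\s*•\s+(?P<host>\S+)\s+\((?P<status>[A-Z]+)\)\s*$")


def summarize_report(text):
    """Hypothetical helper: extract host, status and steps from a report."""
    summary = {"host": None, "status": None, "steps": [], "downtime_removed": False}
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            summary["host"] = header.group("host")
            summary["status"] = header.group("status")
            continue
        # Drop the bullet and surrounding whitespace from a step line.
        step = line.strip().lstrip("•").strip()
        if not step:
            continue
        summary["steps"].append(step)
        if step == "Icinga downtime removed":
            summary["downtime_removed"] = True
    return summary


# Abridged copy of the cloudvirt1031 report above.
sample = """\
  • cloudvirt1031 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
"""

report = summarize_report(sample)
print(report["host"], report["status"], report["downtime_removed"])
```

In the reports on this task, WARN coincides with the step "Icinga status is not optimal, downtime not removed", so a WARN host still has its Icinga downtime in place and may need a manual follow-up.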

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T09:49:01Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.pre-reimage prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T09:49:07Z] <aborrero@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.pre-reimage (exit_code=99) prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T09:52:10Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.pre-reimage prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T10:09:12Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.pre-reimage (exit_code=0) prepare cloudvirt1032.eqiad.wmnet for reimage (drain, remove nova agent, etc) (T319184)

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1032.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T12:00:50Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T12:00:56Z] <aborrero@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=99) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T12:02:28Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T12:02:32Z] <aborrero@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=99) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1032.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1032 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402191134_aborrero_830992_cloudvirt1032.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T12:32:49Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T12:33:11Z] <aborrero@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=99) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-20T11:30:00Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-20T11:30:28Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=0) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)
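The post-reimage cookbook for cloudvirt1032 failed several times before the run above passed. A minimal sketch for tallying such retries from the SAL entries follows. It assumes the `END (PASS|FAIL) - Cookbook <name> (exit_code=N)` shape seen in this timeline, and the names `END_LINE` and `tally` are illustrative, not part of any Wikimedia tooling:

```python
import re
from collections import Counter

# Matches only the END entries; START entries carry no result and are skipped.
END_LINE = re.compile(
    r"\[(?P<when>[^\]]+)\]\s+<(?P<who>[^>]+)>\s+"
    r"END \((?P<result>PASS|FAIL)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<exit_code>\d+)\)"
)


def tally(lines):
    """Hypothetical helper: count END results per (cookbook, result) pair."""
    counts = Counter()
    for line in lines:
        match = END_LINE.search(line)
        if match:
            counts[(match.group("cookbook"), match.group("result"))] += 1
    return counts


# Three entries copied from the timeline above.
entries = [
    "Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-19T12:33:11Z] <aborrero@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=99) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)",
    "Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-20T11:30:00Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.post-reimage preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)",
    "Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-20T11:30:28Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.post-reimage (exit_code=0) preparing cloudvirt cloudvirt1032.eqiad.wmnet for duty (nova discovery, canary VM) Pending aggregates though. (T319184)",
]

counts = tally(entries)
for (cookbook, result), n in sorted(counts.items()):
    print(cookbook, result, n)
```

Keying the counter on the cookbook name rather than the host keeps the sketch usable for the `drain` entries too, whose host appears in a different position on the line.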

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-21T11:44:22Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1033.eqiad.wmnet' (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-21T12:03:14Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1033.eqiad.wmnet' (T319184)

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1033.eqiad.wmnet with OS bookworm

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-21T12:50:30Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1033.eqiad.wmnet' (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-21T12:50:56Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1033.eqiad.wmnet' (T319184)

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1033.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1033 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402211224_aborrero_1262466_cloudvirt1033.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-21T13:19:54Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-21T13:20:05Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.unset_maintenance (exit_code=0) (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-21T13:43:44Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.set_maintenance (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-21T13:44:15Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.set_maintenance (exit_code=0) (T319184)

Change 1005513 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudvirt1033: move to single NIC setup

https://gerrit.wikimedia.org/r/1005513

Change 1005513 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudvirt1033: move to single NIC setup

https://gerrit.wikimedia.org/r/1005513

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1033.eqiad.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1033.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1033 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402211410_aborrero_1277933_cloudvirt1033.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-22T12:32:37Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1034.eqiad.wmnet' (T319184)

Change 1005750 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudvirt1034: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1005750

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-22T12:53:12Z] <aborrero@cloudcumin1001> END (FAIL) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=99) on host 'cloudvirt1034.eqiad.wmnet' (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-22T12:57:57Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1034.eqiad.wmnet' (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-02-22T12:58:41Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1034.eqiad.wmnet' (T319184)

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1034.eqiad.wmnet with OS bookworm

Change 1005750 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudvirt1034: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1005750

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1034.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1034 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202402221320_aborrero_1465831_cloudvirt1034.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1016318 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudvirt1035: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016318

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-02T11:32:41Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1035.eqiad.wmnet' (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-02T11:45:34Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1035.eqiad.wmnet' (T319184)

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1035.eqiad.wmnet with OS bookworm

Change #1016318 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudvirt1035: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016318

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1035.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1035 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404021209_aborrero_254247_cloudvirt1035.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Change #1016363 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudvirt1036: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016363

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-02T14:43:30Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1036.eqiad.wmnet' (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-02T15:00:13Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1036.eqiad.wmnet' (T319184)

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1036.eqiad.wmnet with OS bookworm

Change #1016363 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudvirt1036: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016363

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1036.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1036 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404021551_aborrero_294759_cloudvirt1036.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-03T09:09:41Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1037.eqiad.wmnet' (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-03T09:26:28Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1037.eqiad.wmnet' (T319184)

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1037.eqiad.wmnet with OS bookworm

Change #1016720 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudvirt1037: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016720

Change #1016720 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudvirt1037: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016720

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1037.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1037 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404030948_aborrero_441921_cloudvirt1037.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-03T10:25:36Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1038.eqiad.wmnet' (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-03T10:37:06Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1038.eqiad.wmnet' (T319184)

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1038.eqiad.wmnet with OS bookworm

Change #1016743 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudvirt1037: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016743

Change #1016743 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudvirt1037: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016743

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1038.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1038 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404031107_aborrero_454180_cloudvirt1038.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-03T11:34:24Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1039.eqiad.wmnet' (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-03T11:48:20Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1039.eqiad.wmnet' (T319184)

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1039.eqiad.wmnet with OS bookworm

Change #1016751 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudvirt1039: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016751

Change #1016751 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudvirt1039: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016751

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1039.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1039 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404031211_aborrero_464637_cloudvirt1039.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-03T14:09:19Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.openstack.cloudvirt.drain on host 'cloudvirt1040.eqiad.wmnet' (T319184)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-04-03T14:16:11Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.cloudvirt.drain (exit_code=0) on host 'cloudvirt1040.eqiad.wmnet' (T319184)

Cookbook cookbooks.sre.hosts.reimage was started by aborrero@cumin1002 for host cloudvirt1040.eqiad.wmnet with OS bookworm

Change #1016795 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] cloudvirt1040: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016795

Change #1016795 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] cloudvirt1040: move to modern single NIC setup

https://gerrit.wikimedia.org/r/1016795

Cookbook cookbooks.sre.hosts.reimage started by aborrero@cumin1002 for host cloudvirt1040.eqiad.wmnet with OS bookworm completed:

  • cloudvirt1040 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202404031434_aborrero_485212_cloudvirt1040.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB