
Replace pybal with liberica on the PoPs
Closed, Resolved, Public

Description

Replace pybal with liberica/ipvs on the PoPs.

Current status:

  • esams
  • ulsfo
  • eqsin
  • drmrs
  • magru

Details

Related Changes in Gerrit:
Repo               Branch      Lines +/-
operations/puppet  production  +6 -0
operations/puppet  production  +0 -64
operations/puppet  production  +1 -1
operations/puppet  production  +0 -3
operations/puppet  production  +9 -6
operations/puppet  production  +0 -2
operations/puppet  production  +10 -2
operations/puppet  production  +21 -2
operations/puppet  production  +1 -1
operations/puppet  production  +0 -4
operations/puppet  production  +10 -5
operations/puppet  production  +0 -2
operations/puppet  production  +10 -2
operations/puppet  production  +2 -2
operations/puppet  production  +23 -2
operations/puppet  production  +0 -1
operations/puppet  production  +0 -3
operations/puppet  production  +9 -5
operations/puppet  production  +0 -4
operations/puppet  production  +9 -2
operations/puppet  production  +1 -1
operations/puppet  production  +22 -2
operations/puppet  production  +0 -1
operations/puppet  production  +0 -3
operations/puppet  production  +9 -5
operations/puppet  production  +0 -3
operations/puppet  production  +11 -2
operations/puppet  production  +1 -1
operations/puppet  production  +21 -2
operations/puppet  production  +7 -4
operations/puppet  production  +0 -3
operations/puppet  production  +4 -0
operations/puppet  production  +9 -5
operations/puppet  production  +7 -0
operations/puppet  production  +66 -25
operations/puppet  production  +0 -3
operations/puppet  production  +4 -0
operations/puppet  production  +8 -2
operations/puppet  production  +4 -0
operations/puppet  production  +1 -1
operations/puppet  production  +18 -2

Event Timeline

There are a very large number of changes, so older changes are hidden.

Mentioned in SAL (#wikimedia-operations) [2025-03-04T14:16:48Z] <vgutierrez> depooling lvs5004 before reimaging - T384477

Icinga downtime and Alertmanager silence (ID=3c84753d-9da7-4512-8291-9b672fc8b298) set by vgutierrez@cumin1002 for 0:30:00 on 1 host(s) and their services with reason: depooled before reimage

lvs5004.eqsin.wmnet

Change #1124407 merged by Vgutierrez:

[operations/puppet@production] hiera,site: Reimage lvs5004 as liberica

https://gerrit.wikimedia.org/r/1124407

Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs5004.eqsin.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs5004.eqsin.wmnet with OS bookworm completed:

  • lvs5004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202503041453_vgutierrez_2527396_lvs5004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1124459 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Restore lvs5004 BGP priority

https://gerrit.wikimedia.org/r/1124459

Change #1124459 merged by Vgutierrez:

[operations/puppet@production] hiera: Restore lvs5004 BGP priority

https://gerrit.wikimedia.org/r/1124459

Mentioned in SAL (#wikimedia-operations) [2025-03-04T15:41:55Z] <vgutierrez> repooling lvs5004 running liberica - T384477
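The depool and repool SAL entries for lvs5004 bracket its whole maintenance window (reimage, BGP priority restore, repool). A minimal Python sketch, assuming SAL mentions keep the `[timestamp] <nick> message` shape seen in this log, that extracts the UTC timestamp, nick and task ID and computes how long a host was out of rotation. The helper names (`parse_sal`, `depool_window`) are hypothetical and not part of any Wikimedia tooling.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional

# "[2025-03-04T14:16:48Z] <nick> free-form message", as in the SAL mentions above
SAL_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\]\s+<(?P<nick>[^>]+)>\s+(?P<msg>.*)$"
)
# Phabricator task IDs such as T384477, searched only inside the message part
TASK_RE = re.compile(r"\bT\d+\b")


class SalEntry(NamedTuple):
    when: datetime
    nick: str
    message: str
    task: Optional[str]


def parse_sal(line: str) -> SalEntry:
    """Extract timestamp (UTC), IRC nick, message and task ID from one SAL line."""
    m = SAL_RE.search(line)
    if m is None:
        raise ValueError(f"not a SAL line: {line!r}")
    when = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    task = TASK_RE.search(m["msg"])
    return SalEntry(when, m["nick"], m["msg"], task.group(0) if task else None)


def depool_window(depool_line: str, repool_line: str) -> timedelta:
    """Time a host spent depooled, given its depool and repool SAL lines."""
    start, end = parse_sal(depool_line), parse_sal(repool_line)
    if end.when < start.when:
        raise ValueError("repool entry is older than depool entry")
    return end.when - start.when


if __name__ == "__main__":
    depool = ("Mentioned in SAL (#wikimedia-operations) [2025-03-04T14:16:48Z] "
              "<vgutierrez> depooling lvs5004 before reimaging - T384477")
    repool = ("Mentioned in SAL (#wikimedia-operations) [2025-03-04T15:41:55Z] "
              "<vgutierrez> repooling lvs5004 running liberica - T384477")
    print(depool_window(depool, repool))  # 1:25:07
```

The same parser also accepts the cookbook START/END mentions, whose nick is `vgutierrez@cumin1002` and whose task ID appears in parentheses rather than after ` - `.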

Change #1125162 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cumin: Remove lvs-eqsin alias

https://gerrit.wikimedia.org/r/1125162

Change #1125472 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] site,hiera: Reimage lvs6003 as liberica

https://gerrit.wikimedia.org/r/1125472

Change #1125162 merged by Vgutierrez:

[operations/puppet@production] cumin: Remove lvs-eqsin alias

https://gerrit.wikimedia.org/r/1125162

Mentioned in SAL (#wikimedia-operations) [2025-03-12T11:18:27Z] <vgutierrez> reimage lvs6003 as a liberica instance - T384477

Change #1125472 merged by Vgutierrez:

[operations/puppet@production] site,hiera: Reimage lvs6003 as liberica

https://gerrit.wikimedia.org/r/1125472

Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs6003.drmrs.wmnet with OS bookworm

Change #1126972 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Fix NIC names for liberica@drmrs

https://gerrit.wikimedia.org/r/1126972

Change #1126972 merged by Vgutierrez:

[operations/puppet@production] hiera: Fix NIC names for liberica@drmrs

https://gerrit.wikimedia.org/r/1126972

Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs6003.drmrs.wmnet with OS bookworm completed:

  • lvs6003 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202503121142_vgutierrez_3451400_lvs6003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1126974 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] site,hiera: Reimage lvs6002 as liberica

https://gerrit.wikimedia.org/r/1126974

Mentioned in SAL (#wikimedia-operations) [2025-03-12T14:26:08Z] <vgutierrez> depooling lvs6002 before getting reimaged - T384477

Icinga downtime and Alertmanager silence (ID=6160b7b2-7281-4c01-a4ad-0c0ebed8103d) set by vgutierrez@cumin1002 for 0:30:00 on 1 host(s) and their services with reason: depooled before reimage

lvs6002.drmrs.wmnet

Change #1126974 merged by Vgutierrez:

[operations/puppet@production] site,hiera: Reimage lvs6002 as liberica

https://gerrit.wikimedia.org/r/1126974

Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs6002.drmrs.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs6002.drmrs.wmnet with OS bookworm completed:

  • lvs6002 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202503121453_vgutierrez_3555122_lvs6002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1127053 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Restore lvs6002 BGP priority

https://gerrit.wikimedia.org/r/1127053

Change #1127053 merged by Vgutierrez:

[operations/puppet@production] hiera: Restore lvs6002 BGP priority

https://gerrit.wikimedia.org/r/1127053

Change #1127062 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] site,hiera: Reimage lvs6001 as liberica

https://gerrit.wikimedia.org/r/1127062

Mentioned in SAL (#wikimedia-operations) [2025-03-12T16:00:32Z] <vgutierrez@cumin1002> START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6002.drmrs.wmnet} and A:liberica (T384477)

Mentioned in SAL (#wikimedia-operations) [2025-03-12T16:00:50Z] <vgutierrez@cumin1002> END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6002.drmrs.wmnet} and A:liberica (T384477)
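The `config_reloading` cookbook run above targets `P{lvs6002.drmrs.wmnet} and A:liberica`. In Cumin's global grammar, `P{...}` is a PuppetDB backend query (here a single host name), `A:liberica` expands a named alias, and `and` intersects the two host sets, so only hosts matching both are acted on. The toy Python model below illustrates that set logic only: it is not Cumin's parser, it supports nothing beyond `and`, `P{glob}` and `A:name`, and the inventory and alias contents are hypothetical.

```python
import fnmatch
import re
from typing import Dict, Iterable, Set


def select(term: str, inventory: Iterable[str], aliases: Dict[str, Set[str]]) -> Set[str]:
    """Resolve one term: P{glob} matches inventory host names, A:name looks up an alias."""
    term = term.strip()
    backend = re.fullmatch(r"P\{(.+)\}", term)
    if backend:
        # Simplification: shell-style globbing only, no real PuppetDB query syntax
        return {host for host in inventory if fnmatch.fnmatchcase(host, backend.group(1))}
    if term.startswith("A:"):
        return set(aliases[term[2:]])
    raise ValueError(f"unsupported term: {term!r}")


def evaluate(query: str, inventory: Iterable[str], aliases: Dict[str, Set[str]]) -> Set[str]:
    """Intersect the host sets of all `and`-joined terms (no other operators)."""
    hosts = list(inventory)  # materialise once so every term sees the full inventory
    terms = re.split(r"\s+and\s+", query)
    result = select(terms[0], hosts, aliases)
    for term in terms[1:]:
        result &= select(term, hosts, aliases)
    return result


if __name__ == "__main__":
    # Hypothetical state on 2025-03-12: lvs6003 and lvs6002 already run liberica
    inventory = ["lvs6001.drmrs.wmnet", "lvs6002.drmrs.wmnet", "lvs6003.drmrs.wmnet"]
    aliases = {"liberica": {"lvs6002.drmrs.wmnet", "lvs6003.drmrs.wmnet"}}
    print(sorted(evaluate("P{lvs6002.drmrs.wmnet} and A:liberica", inventory, aliases)))
    print(sorted(evaluate("P{lvs600*} and A:liberica", inventory, aliases)))
```

The intersection acts as a guard: even a broad host pattern only reaches hosts that are also in the liberica alias, so a pybal host matched by the pattern would simply be left out.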

Mentioned in SAL (#wikimedia-operations) [2025-03-13T07:41:51Z] <vgutierrez> depool lvs6001 before being reimaged - T384477

Icinga downtime and Alertmanager silence (ID=2d81a5cc-8423-4910-ad45-d18cdfacb12e) set by vgutierrez@cumin1002 for 0:30:00 on 1 host(s) and their services with reason: depooled before reimage

lvs6001.drmrs.wmnet

Change #1127062 merged by Vgutierrez:

[operations/puppet@production] site,hiera: Reimage lvs6001 as liberica

https://gerrit.wikimedia.org/r/1127062

Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs6001.drmrs.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs6001.drmrs.wmnet with OS bookworm executed with errors:

  • lvs6001 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202503130810_vgutierrez_3737799_lvs6001.out
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console lvs6001.drmrs.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs6001.drmrs.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs6001.drmrs.wmnet with OS bookworm completed:

  • lvs6001 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202503130851_vgutierrez_3750779_lvs6001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1127464 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Restore lvs6001 BGP priority

https://gerrit.wikimedia.org/r/1127464

Change #1127464 merged by Vgutierrez:

[operations/puppet@production] hiera: Restore lvs6001 BGP priority

https://gerrit.wikimedia.org/r/1127464

Mentioned in SAL (#wikimedia-operations) [2025-03-13T09:37:20Z] <vgutierrez@cumin1002> START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6001.drmrs.wmnet} and A:liberica (T384477)

Mentioned in SAL (#wikimedia-operations) [2025-03-13T09:37:38Z] <vgutierrez@cumin1002> END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6001.drmrs.wmnet} and A:liberica (T384477)

Change #1127471 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cumin: Update (liberica|lvs)-drmrs aliases

https://gerrit.wikimedia.org/r/1127471

Change #1127471 merged by Vgutierrez:

[operations/puppet@production] cumin: Update (liberica|lvs)-drmrs aliases

https://gerrit.wikimedia.org/r/1127471

Change #1127853 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] site,hiera: Reimage lvs3010 as liberica

https://gerrit.wikimedia.org/r/1127853

Change #1127853 merged by Vgutierrez:

[operations/puppet@production] site,hiera: Reimage lvs3010 as liberica

https://gerrit.wikimedia.org/r/1127853

Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs3010.esams.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs3010.esams.wmnet with OS bookworm executed with errors:

  • lvs3010 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console lvs3010.esams.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs3010.esams.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs3010.esams.wmnet with OS bookworm completed:

  • lvs3010 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202503171102_vgutierrez_454175_lvs3010.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1128382 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] site,hiera: Reimage lvs3009 as liberica

https://gerrit.wikimedia.org/r/1128382

Icinga downtime and Alertmanager silence (ID=00120896-d6ec-4aac-9b71-59479cad308d) set by vgutierrez@cumin1002 for 0:30:00 on 1 host(s) and their services with reason: depooled before reimage

lvs3009.esams.wmnet

Mentioned in SAL (#wikimedia-operations) [2025-03-17T13:08:06Z] <vgutierrez> depooling lvs3009 before being reimaged - T384477

Change #1128382 merged by Vgutierrez:

[operations/puppet@production] site,hiera: Reimage lvs3009 as liberica

https://gerrit.wikimedia.org/r/1128382

Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs3009.esams.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs3009.esams.wmnet with OS bookworm completed:

  • lvs3009 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202503171341_vgutierrez_478372_lvs3009.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1128418 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Restore lvs3009 BGP priority

https://gerrit.wikimedia.org/r/1128418

Change #1128418 merged by Vgutierrez:

[operations/puppet@production] hiera: Restore lvs3009 BGP priority

https://gerrit.wikimedia.org/r/1128418

Mentioned in SAL (#wikimedia-operations) [2025-03-17T14:09:51Z] <vgutierrez> repool lvs3009 running liberica - T384477

Mentioned in SAL (#wikimedia-operations) [2025-03-17T14:09:58Z] <vgutierrez@cumin1002> START - Cookbook sre.loadbalancer.admin config_reloading P{lvs3009.esams.wmnet} and A:liberica (T384477)

Mentioned in SAL (#wikimedia-operations) [2025-03-17T14:10:16Z] <vgutierrez@cumin1002> END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs3009.esams.wmnet} and A:liberica (T384477)

Change #1128421 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] site,hiera: Reimage lvs3008 as liberica

https://gerrit.wikimedia.org/r/1128421

Change #1128421 merged by Vgutierrez:

[operations/puppet@production] site,hiera: Reimage lvs3008 as liberica

https://gerrit.wikimedia.org/r/1128421

Mentioned in SAL (#wikimedia-operations) [2025-03-17T14:22:42Z] <vgutierrez> depooling lvs3008 before being reimaged - T384477

Icinga downtime and Alertmanager silence (ID=84eaa5ca-ad49-419d-9f2f-eb1dda5bf75d) set by vgutierrez@cumin1002 for 0:30:00 on 1 host(s) and their services with reason: depooled before reimage

lvs3008.esams.wmnet

Cookbook cookbooks.sre.hosts.reimage was started by vgutierrez@cumin1002 for host lvs3008.esams.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by vgutierrez@cumin1002 for host lvs3008.esams.wmnet with OS bookworm completed:

  • lvs3008 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202503171452_vgutierrez_491862_lvs3008.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change #1128446 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Restore BGP priority for lvs3008

https://gerrit.wikimedia.org/r/1128446

Change #1128446 merged by Vgutierrez:

[operations/puppet@production] hiera: Restore BGP priority for lvs3008

https://gerrit.wikimedia.org/r/1128446

Change #1128448 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cumin: Update (liberica|lvs)-esams aliases

https://gerrit.wikimedia.org/r/1128448

Change #1128449 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Clean-up lvs::balancer keys for non-core DCs

https://gerrit.wikimedia.org/r/1128449

Change #1128448 merged by Vgutierrez:

[operations/puppet@production] cumin: Update (liberica|lvs)-esams aliases

https://gerrit.wikimedia.org/r/1128448

Mentioned in SAL (#wikimedia-operations) [2025-03-17T15:31:01Z] <vgutierrez> repool lvs3008 running liberica - T384477

Mentioned in SAL (#wikimedia-operations) [2025-03-17T15:31:10Z] <vgutierrez@cumin1002> START - Cookbook sre.loadbalancer.admin config_reloading P{lvs3008.esams.wmnet} and A:liberica (T384477)

Mentioned in SAL (#wikimedia-operations) [2025-03-17T15:31:28Z] <vgutierrez@cumin1002> END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs3008.esams.wmnet} and A:liberica (T384477)

Change #1128452 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hieradata: Use codfw etcd cluster in liberica@(ulsfo|eqsin)

https://gerrit.wikimedia.org/r/1128452

Change #1128449 merged by Vgutierrez:

[operations/puppet@production] hiera: Clean-up lvs::balancer keys for non-core DCs

https://gerrit.wikimedia.org/r/1128449

Change #1128452 merged by Vgutierrez:

[operations/puppet@production] hieradata: Use codfw etcd cluster in liberica@(ulsfo|eqsin)

https://gerrit.wikimedia.org/r/1128452

@Vgutierrez I've hit a small discrepancy in Netbox; I think we just need to clean it up, but wanted to check.

On asw1-b13-drmrs, the cable on port et-0/0/17 was removed in Netbox; however, on the switch itself the port is still enabled (the port configuration is also still enabled in Netbox) and LLDP shows it is still connected to lvs6003.

cmooney@asw1-b13-drmrs> show interfaces terse | match et-0/0/17 
et-0/0/17               up    up
et-0/0/17.0             up    up   eth-switch
{master:0}
cmooney@asw1-b13-drmrs> show lldp neighbors interface et-0/0/17 | match "Address           :"    
        Address           : e4:3d:1a:71:b5:71
cmooney@lvs6003:~$ ip -br link show | grep "e4:3d:1a:71:b5:71"
ens3f1np1        DOWN           e4:3d:1a:71:b5:71 <BROADCAST,MULTICAST>

Have you been in touch with dc-ops about removing this cable on site? If not, I can re-add the cable in Netbox to keep the records up to date, and also disable this switch port, since the LVS side has it disabled. Otherwise, we need to work with dc-ops and remote hands to get the cable removed on site to match Netbox, after which we can disable the port too.
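The cross-check done by hand above (switch port enabled, LLDP neighbor MAC belonging to lvs6003, that NIC down on the host) can be scripted. A minimal Python sketch, assuming the plain-text output formats shown above for Junos `show interfaces terse` and Linux `ip -br link show`; the function names are hypothetical and this is not an existing Netbox or Wikimedia tool.

```python
from typing import List, NamedTuple, Optional, Tuple


class HostLink(NamedTuple):
    name: str
    state: str
    mac: str


def parse_ip_brief_link(output: str) -> List[HostLink]:
    """Parse `ip -br link show` lines: name, operstate, MAC, flags."""
    links = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            links.append(HostLink(parts[0], parts[1], parts[2].lower()))
    return links


def parse_junos_terse(output: str, ifname: str) -> Optional[Tuple[str, str]]:
    """Return (admin, link) status of the physical interface `ifname`, if listed."""
    for line in output.splitlines():
        parts = line.split()
        # Exact name match skips logical units such as et-0/0/17.0
        if len(parts) >= 3 and parts[0] == ifname:
            return parts[1], parts[2]
    return None


def check_port(switch_terse: str, ifname: str, lldp_mac: str, host_ip_brief: str) -> str:
    """Compare the switch port state with the host NIC that LLDP points at."""
    status = parse_junos_terse(switch_terse, ifname)
    if status is None:
        return f"{ifname}: not found on switch"
    admin, link = status
    nic = next((l for l in parse_ip_brief_link(host_ip_brief) if l.mac == lldp_mac.lower()), None)
    if nic is None:
        return f"{ifname}: LLDP neighbor {lldp_mac} not found on host"
    if admin == "up" and nic.state == "DOWN":
        return f"{ifname}: switch port enabled ({admin}/{link}) but host NIC {nic.name} is DOWN"
    return f"{ifname}: consistent ({admin}/{link}, host {nic.name} {nic.state})"


if __name__ == "__main__":
    switch = "et-0/0/17               up    up\net-0/0/17.0             up    up   eth-switch"
    host = "ens3f1np1        DOWN           e4:3d:1a:71:b5:71 <BROADCAST,MULTICAST>"
    print(check_port(switch, "et-0/0/17", "e4:3d:1a:71:b5:71", host))
```

Matching on the LLDP MAC rather than on interface names keeps the check independent of NIC naming, which differs between hosts.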

> Have you been in touch with dc-ops about removing this cable on site?

Nope, I haven't performed any action that would lead to physical changes in any POP related to this task.

No stress. I've tidied it up now in Netbox, adding a cable to reflect the fact it's still connected on site, but removing the switch port configuration. Let's deal with removing the cable in T367731.