Q4:rack/setup/install lvs2011, lvs2012, lvs2013, lvs2014
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of lvs2011, lvs2012, lvs2013, and lvs2014.

Hostname / Racking / Installation Details

Hostnames: lvs2011, lvs2012, lvs2013, lvs2014
Racking Proposal: Same locations as existing: lvs2011 where lvs2007 is, etc...
Networking Setup: Custom LVS stuff
Partitioning/Raid: standard/raid1-2dev
OS Distro: Bullseye
Sub-team Technical Contact: @BBlack

Since these need to be connected to every row, it may be easier to roll this out in stages: rack each new lvs host adjacent to the existing one, install the OS with a single network connection to its own switch, then coordinate with Traffic to decommission the old host and move its cross-row connections to the new one (for example, decommission the row A lvs and move its connections to the new row A lvs). Because this requires coordination between ops-codfw on-site and Traffic, it is up to ops-codfw and Traffic whether to re-use the existing cross-row connections (a staged migration to the new servers) or to run new ones (so that all new hosts are fully online before any old lvs host is taken offline).

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

lvs2011: RACK: A2-U43
  • - receive in system on procurement task T325233 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::traffic
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
lvs2012: RACK: B2-U43
  • - receive in system on procurement task T325233 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::traffic
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
lvs2013: RACK: C2-U44
  • - receive in system on procurement task T325233 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::traffic
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
lvs2014: RACK: D2-U44
  • - receive in system on procurement task T325233 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and role::insetup::traffic
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
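
Since the checklist is identical for every host apart from the rack location, it could be generated instead of pasted by hand. The following is only a minimal sketch: `render_checklist` is a hypothetical helper, not part of any existing tooling, and the host names, rack locations, and step wording are taken from the list above.

```python
# Hypothetical helper: render the per-host setup checklist from a host -> rack map.
RACKS = {
    "lvs2011": "A2-U43",
    "lvs2012": "B2-U43",
    "lvs2013": "C2-U44",
    "lvs2014": "D2-U44",
}

# Step wording copied from the task description.
STEPS = [
    "receive in system on procurement task T325233 & in coupa",
    "rack system with proposed racking plan (see above) & update netbox "
    "(include all system info plus location, state of planned)",
    "add mgmt dns (asset tag and hostname) and production dns entries in netbox, "
    "run cookbook sre.dns.netbox.",
    "network port setup via netbox, run homer from an active cumin host to commit",
    "bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details",
    "firmware update (idrac, bios, network, raid controller)",
    "operations/puppet update - this should include updates to netboot.pp, "
    "and role::insetup::traffic",
    "OS installation & initial puppet run via sre.hosts.reimage cookbook.",
]


def render_checklist(racks, steps):
    """Return one 'host: RACK: location' header per host, followed by its steps."""
    lines = []
    for host, rack in racks.items():
        lines.append(f"{host}: RACK: {rack}")
        lines.extend(f"  - {step}" for step in steps)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_checklist(RACKS, STEPS))
```

Keeping the steps in a single list means a wording fix only has to be made once rather than in four copies.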

Related Objects

Status: Resolved
Assigned: Papaul

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2023-05-08T16:11:39Z] <sukhe@deploy1002> Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2011.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2011.codfw.wmnet with OS bullseye executed with errors:

  • lvs2011 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2011.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2011.codfw.wmnet with OS bullseye executed with errors:

  • lvs2011 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2011.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2011.codfw.wmnet with OS bullseye executed with errors:

  • lvs2011 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2011.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2011.codfw.wmnet with OS bullseye completed:

  • lvs2011 (WARN)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202305081658_sukhe_1270705_lvs2011.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
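
The reimage summaries in this timeline all share one shape: a `• host (STATUS)` header followed by the steps the cookbook reached. When comparing failed attempts it can help to extract the last step reached before the failure message. The sketch below is a hypothetical parser for the text as it appears on this page (it is not part of the cookbook itself), and it assumes the only statuses are the three seen here: PASS, WARN, and FAIL.

```python
import re

# Header line such as "  • lvs2011 (FAIL)"; statuses are those seen in this task.
HEADER = re.compile(r"^\s*•\s*(?P<host>\S+)\s+\((?P<status>PASS|WARN|FAIL)\)\s*$")
# Any other bullet line is treated as a step.
STEP = re.compile(r"^\s*•\s*(?P<step>.+?)\s*$")


def parse_summary(text):
    """Return (host, status, steps) for one pasted cookbook summary."""
    host = status = None
    steps = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header and host is None:
            host, status = header["host"], header["status"]
        elif host is not None:
            step = STEP.match(line)
            if step:
                steps.append(step["step"])
    if host is None:
        raise ValueError("no '• host (STATUS)' header found")
    return host, status, steps


SAMPLE = """\
  • lvs2011 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • The reimage failed, see the cookbook logs for the details
"""

if __name__ == "__main__":
    host, status, steps = parse_summary(SAMPLE)
    # For a FAIL summary the final bullet is the failure notice,
    # so the step before it is the last one that completed.
    print(host, status, "last completed step:", steps[-2])
```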

Change 914871 merged by Ssingh:

[operations/homer/public@master] sites.yaml: add new LVS host lvs2011 (codfw hardware refresh)

https://gerrit.wikimedia.org/r/914871

Mentioned in SAL (#wikimedia-operations) [2023-05-08T17:39:06Z] <sukhe> homer "cr*-codfw*" commit "Gerrit: 914871 add new LVS host lvs2011": T326767

Change 917386 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: remove BGP MED override for lvs2011

https://gerrit.wikimedia.org/r/917386

Change 917386 merged by Ssingh:

[operations/puppet@production] hiera: remove BGP MED override for lvs2011

https://gerrit.wikimedia.org/r/917386

Mentioned in SAL (#wikimedia-operations) [2023-05-08T17:48:15Z] <sukhe> restart pybal on lvs2011 to pick up bgp med change: T326767

Mentioned in SAL (#wikimedia-operations) [2023-05-08T17:51:44Z] <sukhe> set routing-options static route 208.80.153.224/28 [high-traffic1, codfw] next-hop 10.192.0.29: T326767

Mentioned in SAL (#wikimedia-operations) [2023-05-08T18:04:43Z] <sukhe@deploy1002> Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 113m 03s)
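
The duration reported in an unlock entry can be cross-checked against the timestamps of the matching lock and unlock SAL entries. A minimal sketch follows; the function names are illustrative, and it assumes the bracketed timestamps are UTC in the `YYYY-MM-DDTHH:MM:SSZ` form shown in this timeline.

```python
from datetime import datetime, timezone


def parse_sal_timestamp(stamp):
    """Parse a SAL timestamp such as '2023-05-08T16:11:39Z' as UTC."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def lock_duration(locked_at, unlocked_at):
    """Format the time between two SAL timestamps as 'Nm SSs'."""
    delta = parse_sal_timestamp(unlocked_at) - parse_sal_timestamp(locked_at)
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes}m {seconds:02d}s"


if __name__ == "__main__":
    # Lock and unlock entries from 2023-05-08 in this task.
    print(lock_duration("2023-05-08T16:11:39Z", "2023-05-08T18:04:43Z"))
```

For the 2023-05-08 pair this gives 113m 04s, one second more than the logged 113m 03s. A plausible explanation is that the SAL timestamp records when the message was posted rather than the exact moment the lock changed, but that is an assumption.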

Mentioned in SAL (#wikimedia-operations) [2023-05-09T14:08:46Z] <sukhe@deploy1002> Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767

Mentioned in SAL (#wikimedia-operations) [2023-05-09T14:54:32Z] <sukhe@deploy1002> Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 45m 45s)

@Papaul
onboard port 1 cable ID: 12110
onboard port 2 cable ID: 12109
NIC port 1 cable ID: 12108
NIC port 2 cable ID: 12174

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Change 917922 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] lvs2012: commission new LVS host (codfw hardware refresh)

https://gerrit.wikimedia.org/r/917922

Change 917924 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] sites.yaml: add new LVS host lvs2012 (codfw hardware refresh)

https://gerrit.wikimedia.org/r/917924

Change 917926 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: remove BGP MED override for lvs2012

https://gerrit.wikimedia.org/r/917926

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye completed:

  • lvs2012 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202305091657_pt1979_2945216_lvs2012.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
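
The first-Puppet-run log paths in these summaries follow a visible pattern: a 12-digit timestamp, the user, a number, and the host name. The sketch below splits a path into those fields so that attempts can be sorted or grouped. It is a hypothetical helper based only on the paths shown in this task; the meaning of the numeric field is not stated here (it may be a process ID, but that is an assumption), so it is returned under the neutral key `number`.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Pattern inferred from paths in this task, e.g.
# 202305091657_pt1979_2945216_lvs2012.out
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<number>\d+)_(?P<host>[^_.]+)\.out$"
)


def parse_reimage_log(path):
    """Split a reimage log path into start time, user, number, and host."""
    name = PurePosixPath(path).name
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised reimage log name: {name}")
    return {
        "started": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "number": int(match["number"]),  # meaning not confirmed by the source
        "host": match["host"],
    }


if __name__ == "__main__":
    info = parse_reimage_log(
        "/var/log/spicerack/sre/hosts/reimage/202305091657_pt1979_2945216_lvs2012.out"
    )
    print(info["host"], info["user"], info["started"].isoformat())
```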

Mentioned in SAL (#wikimedia-operations) [2023-05-10T15:33:46Z] <sukhe@deploy1002> Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767

Change 917922 merged by Ssingh:

[operations/puppet@production] lvs2012: commission new LVS host (codfw hardware refresh)

https://gerrit.wikimedia.org/r/917922

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202305101606_sukhe_76134_lvs2012.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202305101715_sukhe_88593_lvs2012.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • The reimage failed, see the cookbook logs for the details

Mentioned in SAL (#wikimedia-operations) [2023-05-10T18:45:40Z] <sukhe@deploy1002> Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 191m 53s)

Mentioned in SAL (#wikimedia-operations) [2023-05-12T15:01:20Z] <sukhe@deploy1002> Locking from deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye executed with errors:

  • lvs2012 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2012.codfw.wmnet with OS bullseye completed:

  • lvs2012 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202305121630_sukhe_592262_lvs2012.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 917924 merged by Ssingh:

[operations/homer/public@master] sites.yaml: add new LVS host lvs2012 (codfw hardware refresh)

https://gerrit.wikimedia.org/r/917924

Mentioned in SAL (#wikimedia-operations) [2023-05-12T17:11:50Z] <sukhe> homer "cr*-codfw*" commit "Gerrit: 917924 add new LVS host lvs2012": T326767

Change 917926 merged by Ssingh:

[operations/puppet@production] hiera: remove BGP MED override for lvs2012

https://gerrit.wikimedia.org/r/917926

Mentioned in SAL (#wikimedia-operations) [2023-05-12T17:21:45Z] <sukhe> restart pybal on lvs2012 to pick up bgp med change: T326767

Mentioned in SAL (#wikimedia-operations) [2023-05-12T17:27:43Z] <sukhe> set routing-options static route 208.80.153.240/28 [high-traffic2, codfw] next-hop 10.192.16.140: T326767
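
The two static routes logged in this task (high-traffic1 and high-traffic2) cover neighbouring /28 prefixes with different next hops. A quick way to confirm that the prefixes do not overlap, and to see which addresses each one covers, is Python's standard `ipaddress` module. This is only an illustrative check: the `ROUTES` table and helper names are made up for the example, while the prefixes and next hops are copied from the SAL entries.

```python
import ipaddress

# Prefix and next hop per service class, as logged in SAL for this task.
ROUTES = {
    "high-traffic1": ("208.80.153.224/28", "10.192.0.29"),
    "high-traffic2": ("208.80.153.240/28", "10.192.16.140"),
}


def summarize_prefix(cidr):
    """Return first/last address, size, and enclosing /27 for a prefix."""
    net = ipaddress.ip_network(cidr)
    return {
        "first": str(net.network_address),
        "last": str(net.broadcast_address),
        "size": net.num_addresses,
        "parent": str(net.supernet()),
    }


def routes_overlap(routes):
    """True if any two route prefixes share addresses."""
    nets = [ipaddress.ip_network(prefix) for prefix, _ in routes.values()]
    return any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])


if __name__ == "__main__":
    for name, (prefix, next_hop) in ROUTES.items():
        print(name, next_hop, summarize_prefix(prefix))
    print("overlap:", routes_overlap(ROUTES))
```

Both prefixes are 16-address halves of the same /27, so a typo in the last octet of either route would show up as an overlap or as a different parent network.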

Mentioned in SAL (#wikimedia-operations) [2023-05-12T17:31:55Z] <sukhe@deploy1002> Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in codfw, blocking deploys T326767 (duration: 150m 34s)

Mentioned in SAL (#wikimedia-operations) [2023-06-05T13:36:02Z] <sukhe@deploy1002> Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T326767

Mentioned in SAL (#wikimedia-operations) [2023-06-05T15:18:49Z] <sukhe@deploy1002> Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T326767 (duration: 102m 46s)

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host lvs2013.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host lvs2013.codfw.wmnet with OS bullseye completed:

  • lvs2013 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306051620_pt1979_778698_lvs2013.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Change 927678 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] lvs2013: commission new LVS host (codfw hardware refresh)

https://gerrit.wikimedia.org/r/927678

Change 927678 merged by Ssingh:

[operations/puppet@production] lvs2013: commission new LVS host (codfw hardware refresh)

https://gerrit.wikimedia.org/r/927678

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2013.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2013.codfw.wmnet with OS bullseye completed:

  • lvs2013 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306061435_sukhe_2186539_lvs2013.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 927725 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] sites.yaml: add new LVS host lvs2013 (codfw hardware refresh)

https://gerrit.wikimedia.org/r/927725

Change 927725 merged by Ssingh:

[operations/homer/public@master] sites.yaml: add new LVS host lvs2013 (codfw hardware refresh)

https://gerrit.wikimedia.org/r/927725

Mentioned in SAL (#wikimedia-operations) [2023-06-06T15:26:56Z] <sukhe> homer "cr*-codfw*" commit "Gerrit: 927725 add new LVS host lvs2013" : T326767

Change 927735 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: remove lvs2013's bgp-med override

https://gerrit.wikimedia.org/r/927735

Change 927735 merged by Ssingh:

[operations/puppet@production] hiera: remove lvs2013's bgp-med override

https://gerrit.wikimedia.org/r/927735

Change 928112 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] lvs2014: commission new LVS host (codfw hardware refresh)

https://gerrit.wikimedia.org/r/928112

Change 928113 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/homer/public@master] sites.yaml: add new LVS host lvs2014 (codfw hardware refresh)

https://gerrit.wikimedia.org/r/928113

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host lvs2014.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host lvs2014.codfw.wmnet with OS bullseye executed with errors:

  • lvs2014 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

@Papaul

cable IDs for lvs2014
em1 - 11995
em2 - 11997
nic2 p1 - 11996
nic2 p2 - 11998

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host lvs2014.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host lvs2014.codfw.wmnet with OS bullseye completed:

  • lvs2014 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306081357_pt1979_967034_lvs2014.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Change 928112 merged by Ssingh:

[operations/puppet@production] lvs2014: commission new LVS host (codfw hardware refresh)

https://gerrit.wikimedia.org/r/928112

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2014.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2014.codfw.wmnet with OS bullseye executed with errors:

  • lvs2014 (FAIL)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2014.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2014.codfw.wmnet with OS bullseye executed with errors:

  • lvs2014 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by sukhe@cumin2002 for host lvs2014.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by sukhe@cumin2002 for host lvs2014.codfw.wmnet with OS bullseye completed:

  • lvs2014 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202306091040_sukhe_2294224_lvs2014.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 928113 merged by Ssingh:

[operations/homer/public@master] sites.yaml: add new LVS host lvs2014 (codfw hardware refresh)

https://gerrit.wikimedia.org/r/928113

Change 928818 had a related patch set uploaded (by Ssingh; author: Ssingh):

[operations/puppet@production] hiera: lvs/balancer: unify hiera post hardware refresh (codfw)

https://gerrit.wikimedia.org/r/928818

Change 928818 merged by Ssingh:

[operations/puppet@production] hiera: lvs/balancer: unify hiera post hardware refresh (codfw)

https://gerrit.wikimedia.org/r/928818