Page MenuHomePhabricator

Q2:rack/setup/install logging-sd200[567]
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of logging-sd200[567]

Hostname / Racking / Installation Details

Hostnames: logging-sd200[567]
Racking Proposal: Should not share a rack with other logging-sd host(s)
Networking Setup: # of Connections:1/2 - Speed:1G/10G. - VLAN:Private/Public/Other(Specify) :
OS Distro: Bookworm
Boot Method: Legacy BIOS or UEFI. Please note UEFI must have partman updates applied in advance of setup and is currently in pilot program: https://wikitech.wikimedia.org/wiki/UEFI_Boot
Sub-team Technical Contact: @colewhite

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

logging-sd2005
  • Receive in system on procurement task T404785 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
logging-sd2006
  • Receive in system on procurement task T404785 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
logging-sd2007
  • Receive in system on procurement task T404785 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook

Event Timeline

RobH mentioned this in Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH unsubscribed.

@colewhite,

Please update the site.pp file with the insetup role for your team (detailed on https://wikitech.wikimedia.org/wiki/SRE/Dc-operations) and add the new servers to preseed.yml for partition info.

If possible, please reference this task number in your patch set, so it is clear when complete. Once complete, just un-assign yourself (leaving no assignee) for this task and once the hardware arrives on-site engineerss will claim this task for racking and setup. Please don't re-subscribe me to this task unless there is a direct question for me.

Thank you!

RobH added a parent task: Unknown Object (Task).Oct 8 2025, 11:33 PM

Change #1199062 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] site: add logging-sd hosts insetup

https://gerrit.wikimedia.org/r/1199062

Change #1199062 merged by Cwhite:

[operations/puppet@production] site: initial setup for new logging-sd hosts

https://gerrit.wikimedia.org/r/1199062

Change #1212164 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] installserver: fix partmain typo

https://gerrit.wikimedia.org/r/1212164

Change #1212164 merged by Filippo Giunchedi:

[operations/puppet@production] installserver: fix partmain typo

https://gerrit.wikimedia.org/r/1212164

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host logging-sd2006.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host logging-sd2005.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host logging-sd2007.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host logging-sd2006.codfw.wmnet with OS bookworm completed:

  • logging-sd2006 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512101659_jhancock_3295799_logging-sd2006.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host logging-sd2007.codfw.wmnet with OS bookworm completed:

  • logging-sd2007 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512101717_jhancock_3318763_logging-sd2007.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host logging-sd2005.codfw.wmnet with OS bookworm completed:

  • logging-sd2005 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512101748_jhancock_3318276_logging-sd2005.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • Failed to run the sre.puppet.sync-netbox-hiera cookbook, run it manually
Jhancock.wm updated the task description. (Show Details)

@colewhite these are complete