
Q2:rack/setup/install ganeti-jumbo200[1-3]
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of ganeti-jumbo200[1-3].

Hostname / Racking / Installation Details

Hostnames: ganeti-jumbo200[1-3]
Racking Proposal: Different rack per host
Networking Setup: 1 connection, 10G, private VLAN
OS Distro: Trixie
Boot Method: UEFI
Sub-team Technical Contact: EU hours: Ben Tullis (@BTullis, btullis on IRC); US hours: Brian King (@bking, inflatador on IRC)
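The bracket notation in the hostname line is shorthand for a numeric range: ganeti-jumbo200[1-3] covers ganeti-jumbo2001, ganeti-jumbo2002 and ganeti-jumbo2003. As a minimal sketch (Python, standard library only), the hypothetical helper expand_hostrange below expands a single [start-end] group. It is not the host-selection parser of any Wikimedia tool; libraries such as ClusterShell's NodeSet accept much richer patterns.

```python
import re


def expand_hostrange(pattern: str) -> list[str]:
    """Expand one numeric bracket range, e.g. 'ganeti-jumbo200[1-3]'.

    Hypothetical helper: only a single '[start-end]' group is supported.
    Patterns without brackets are returned unchanged as a one-item list.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no range: the pattern is already a single hostname
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep zero padding such as [01-03]
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    for host in expand_hostrange("ganeti-jumbo200[1-3]"):
        print(host)
```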

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

ganeti-jumbo2001
  • Receive in system on procurement task T404777 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the "Provision a server's network attributes" Netbox script. Note that you must run the DNS and provision cookbooks immediately after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo; this should include updates to preseed.yaml and to site.pp, with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
ganeti-jumbo2002
  • Receive in system on procurement task T404777 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the "Provision a server's network attributes" Netbox script. Note that you must run the DNS and provision cookbooks immediately after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo; this should include updates to preseed.yaml and to site.pp, with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
ganeti-jumbo2003
  • Receive in system on procurement task T404777 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the "Provision a server's network attributes" Netbox script. Note that you must run the DNS and provision cookbooks immediately after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo; this should include updates to preseed.yaml and to site.pp, with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
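Because every host gets an identical copy of the checklist, the per-host lists could be rendered from one shared template instead of being pasted by hand. The sketch below is illustrative only: render_checklists and CHECKLIST_STEPS are hypothetical names, the output mimics this task's "  • step" bullet layout, and some step wording is shortened from the checklist.

```python
# Hypothetical template: step wording is taken (and in places shortened)
# from this task's per-host checklist; the order matters, since the DNS and
# provision cookbooks must follow the Netbox script immediately.
CHECKLIST_STEPS = (
    "Receive in system on procurement task T404777 & in Coupa",
    "Rack system with proposed racking plan (see above) & update Netbox",
    "Run the \"Provision a server's network attributes\" Netbox script",
    "Immediately run the sre.dns.netbox cookbook",
    "Immediately run the sre.hosts.provision cookbook",
    "Run the sre.hardware.upgrade-firmware cookbook",
    "Update the operations/puppet repo (preseed.yaml and site.pp)",
    "Run the sre.hosts.reimage cookbook",
)


def render_checklists(hosts: list[str], steps: tuple[str, ...] = CHECKLIST_STEPS) -> str:
    """Return one checklist per host: the hostname, then one bullet per step."""
    lines = []
    for host in hosts:
        lines.append(host)
        lines.extend(f"  \u2022 {step}" for step in steps)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_checklists(["ganeti-jumbo2001", "ganeti-jumbo2002", "ganeti-jumbo2003"]))
```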

Event Timeline

RobH mentioned this in Unknown Object (Task).
RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH unsubscribed.

@bking,

Please update the site.pp file with the insetup role for your team (detailed on https://wikitech.wikimedia.org/wiki/SRE/Dc-operations) and add the new servers to preseed.yaml for partitioning information.

If possible, please reference this task number in your patch set so it is clear when the work is complete. Once complete, just unassign yourself from this task (leaving no assignee); once the hardware arrives on-site, engineers will claim this task for racking and setup. Please don't re-subscribe me to this task unless there is a direct question for me.

Thank you!

Change #1196935 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] site.pp: Add ganeti-jumbo hosts

https://gerrit.wikimedia.org/r/1196935

Change #1196952 had a related patch set uploaded (by Bking; author: Bking):

[operations/puppet@production] ganeti-jumbo: Add hosts and partman recipe

https://gerrit.wikimedia.org/r/1196952

Change #1196935 abandoned by Bking:

[operations/puppet@production] site.pp: Add ganeti-jumbo hosts

Reason:

superseded by 1196952

https://gerrit.wikimedia.org/r/1196935

Change #1196952 merged by Bking:

[operations/puppet@production] ganeti-jumbo: Add hosts and partman recipe

https://gerrit.wikimedia.org/r/1196952

bking changed the task status from Open to In Progress (Oct 27 2025, 1:56 PM).

Hello DC Ops,

I've added the hosts and partman recipe to Puppet as requested. Please note that the partman recipe is untested, so if the reimage fails more than once, please ping me on IRC (inflatador) and/or assign this task to me. The same goes for T405966, as the hosts in that ticket are identical to these.

Thanks for your help!

bking removed bking as the assignee of this task (Nov 20 2025, 2:47 PM).

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host ganeti-jumbo2001.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host ganeti-jumbo2002.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host ganeti-jumbo2003.codfw.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host ganeti-jumbo2002.codfw.wmnet with OS trixie completed:

  • ganeti-jumbo2002 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512102154_jhancock_3400827_ganeti-jumbo2002.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host ganeti-jumbo2003.codfw.wmnet with OS trixie completed:

  • ganeti-jumbo2003 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512102158_jhancock_3400851_ganeti-jumbo2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host ganeti-jumbo2001.codfw.wmnet with OS trixie completed:

  • ganeti-jumbo2001 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202512102202_jhancock_3400817_ganeti-jumbo2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
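Each reimage report above opens with a per-host summary line such as "ganeti-jumbo2002 (PASS)". A minimal sketch of confirming that every host in the task passed could scan pasted report text for those lines. The function names are hypothetical, no Spicerack API is used, and the regular expression assumes the bullet layout shown here; statuses other than PASS are only assumed to be upper-case words in parentheses.

```python
import re

# Matches the per-host summary line of a pasted reimage report, e.g.
# "  • ganeti-jumbo2002 (PASS)". Detail lines such as
# "    • Host up (Debian installer)" do not match, because the status must be
# a single upper-case word directly after the hostname.
SUMMARY_RE = re.compile(r"^\s*\u2022\s+(\S+)\s+\(([A-Z]+)\)\s*$")


def reimage_statuses(report: str) -> dict[str, str]:
    """Map each host named in the report text to its summary status word."""
    statuses = {}
    for line in report.splitlines():
        match = SUMMARY_RE.match(line)
        if match:
            host, status = match.groups()
            statuses[host] = status
    return statuses


def all_passed(report: str, expected_hosts: list[str]) -> bool:
    """True only if every expected host appears in the report with PASS."""
    statuses = reimage_statuses(report)
    return all(statuses.get(host) == "PASS" for host in expected_hosts)


if __name__ == "__main__":
    sample = (
        "  \u2022 ganeti-jumbo2002 (PASS)\n"
        "    \u2022 Host up (Debian installer)\n"
        "  \u2022 ganeti-jumbo2003 (PASS)\n"
    )
    print(reimage_statuses(sample))
```

If a host is missing from the pasted text, all_passed returns False, so an incomplete paste cannot be mistaken for a full set of passing reimages.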
Jhancock.wm updated the task description.

@bking, these are ready. I didn't run into any issues with the reimage, so the partman recipe is now tested.