
Q2:rack/setup/install db2249
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of db2249

Hostname / Racking / Installation Details

This section should list the racking restrictions for these hosts, for example whether they shouldn't share a rack/row with one another or with any existing hosts. It should also list the other details below.

Hostnames: db2249
Racking Proposal: If possible rack it in row B, anywhere
Networking Setup: # of Connections: 1 - Speed: 1G - VLAN: Private
OS Distro: Bookworm
Boot Method: UEFI
Sub-team Technical Contact: @Marostegui

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

db2249
  • Receive in system on procurement task T405272 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the "Provision a server's network attributes" Netbox script. Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook

Event Timeline

Jhancock.wm mentioned this in Unknown Object (Task).

The patch was done before this task was created, but I'm linking it here for clarity: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1197750

Yes, I forgot to mention that while making this one. Thank you so much for getting it done early!

RobH triaged this task as Medium priority. Oct 23 2025, 7:45 PM
RobH shifted this object from the S1 Public space to the Restricted Space space.
RobH removed a project: procurement.
RobH unsubscribed.
RobH shifted this object from the Restricted Space space to the S1 Public space. Nov 4 2025, 3:35 PM
RobH removed a project: procurement.

Change #1202051 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2249: Make a note about 1P testing host

https://gerrit.wikimedia.org/r/1202051

Change #1202051 merged by Marostegui:

[operations/puppet@production] db2249: Make a note about 1P testing host

https://gerrit.wikimedia.org/r/1202051

@elukey
I hit an error running the provisioning script on this one. Could you take a look at it when you have time? I'm not sure what I missed. It is a new chassis type, but I'm not sure what changed.

BIOS firmware version: BIOS Date: 10/30/2025 Ver 2.8
Retrieving BIOS settings (first round).
Retrieving updated BIOS settings...
Setting up BootMode and basic BIOS settings.
BIOS: QuietBoot is set to True, while we want False
BIOS: ConsoleRedirection is not present in the current settings.
Exception raised while executing cookbook sre.hosts.provision:
Traceback (most recent call last):
File "/srv/deployment/spicerack/cookbooks/sre/hosts/provision.py", line 580, in _found_diffs_bios_attributes
  if not bios_attributes[key] == value:
         ~~~~~~~~~~~~~~~^^^^^
KeyError: 'ConsoleRedirection'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 265, in _run
  raw_ret = runner.run()
            ^^^^^^^^^^^^
File "/srv/deployment/spicerack/cookbooks/sre/hosts/provision.py", line 360, in run
  self._config_host()
File "/srv/deployment/spicerack/cookbooks/sre/hosts/provision.py", line 504, in _config_host
  should_patch = self._found_diffs_bios_attributes(bios_attributes)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/srv/deployment/spicerack/cookbooks/sre/hosts/provision.py", line 589, in _found_diffs_bios_attributes
  raise RuntimeError(
RuntimeError: Error while checking BIOS attribute ConsoleRedirection
Released lock for key /spicerack/locks/cookbooks/sre.hosts.provision:db2249: {'concurrency': 1, 'created': '2025-12-19 21:03:07.519643', 'owner': 'jhancock@cumin1003 [3199481]', 'ttl': 1800}
END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2249.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART

Just to double check: this is being provisioned with UEFI, right?

Thanks - I just checked that the UEFI partman recipe is assigned to it.

Change #1220311 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/cookbooks@master] sre.hosts.provision: make some Supermicro checks dynamic

https://gerrit.wikimedia.org/r/1220311

@Jhancock.wm Hi! I filed https://gerrit.wikimedia.org/r/1220311 to hopefully avoid this in the future; the provision cookbook should be smarter once the patch is merged. While testing I found another issue, namely that in this Supermicro model the "FQDN" option is not settable among the BMC settings. It is not straightforward to make a dynamic check in that case, so we have to keep an allow-list for the moment (any time an FQDN-related error pops up in provisioning when setting the BMC network values, we'll have to add the model to the allow-list in the Python code).

I've run provisioning for the host using test-cookbook, so it should be ready to go now!
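
For readers following the traceback above: the `KeyError` comes from indexing `bios_attributes[key]` for an attribute that this chassis does not report. The two ideas described in the comment (a dynamic check that skips attributes the BIOS does not expose, and an allow-list of models whose BMC FQDN cannot be set) might look roughly like the sketch below. All names here (`find_bios_diffs`, `should_set_bmc_fqdn`, `FQDN_NOT_SETTABLE_MODELS`, the placeholder model string) are assumptions made for illustration; this is not the code from the actual patch.

```python
from __future__ import annotations

# Placeholder entry: the real model name is not stated in this task.
FQDN_NOT_SETTABLE_MODELS = frozenset({"EXAMPLE-MODEL"})


def find_bios_diffs(current: dict, wanted: dict) -> tuple[dict, list]:
    """Compare wanted BIOS attributes against the current ones.

    Returns (diffs, missing): `diffs` maps each attribute that exists but
    differs to its wanted value; `missing` lists attributes the BIOS does
    not report, which are skipped instead of raising KeyError.
    """
    diffs = {}
    missing = []
    for key, value in wanted.items():
        if key not in current:
            missing.append(key)
            continue
        if current[key] != value:
            diffs[key] = value
    return diffs, missing


def should_set_bmc_fqdn(model: str) -> bool:
    """Return False for models known not to expose a settable FQDN."""
    return model not in FQDN_NOT_SETTABLE_MODELS


# Example shaped like the log above; the wanted ConsoleRedirection value
# is an assumption, since the log only says the attribute is absent.
diffs, missing = find_bios_diffs(
    {"QuietBoot": True},
    {"QuietBoot": False, "ConsoleRedirection": True},
)
```

With this shape, a missing attribute becomes something the caller can log or report rather than a hard failure, while the allow-list stays a manual list that has to be extended whenever a new model shows the FQDN problem.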

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host db2249.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host db2249.codfw.wmnet with OS bookworm executed with errors:

  • db2249 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console db2249.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host db2249.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host db2249.codfw.wmnet with OS bookworm executed with errors:

  • db2249 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console db2249.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host db2249.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host db2249.codfw.wmnet with OS bookworm completed:

  • db2249 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Unable to downtime the new host on Icinga/Alertmanager, the sre.hosts.downtime cookbook returned 99
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202601052115_jhancock_2443966_db2249.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Jhancock.wm updated the task description.

@Marostegui this is completed.

Change #1220311 merged by Elukey:

[operations/cookbooks@master] sre.hosts.provision: make some Supermicro checks dynamic

https://gerrit.wikimedia.org/r/1220311

Marostegui mentioned this in Unknown Object (Task). Jan 8 2026, 5:29 AM