Q1:rack/setup/install dbprov2007
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of dbprov2007.

Hostname / Racking / Installation Details

Hostnames: dbprov2007
Racking Proposal: Ideally as far apart as possible from dbprov2004, dbprov2005, and dbprov2006, which are at C4, B4, and A4 (a different row; if that is not possible, a different rack).
Networking Setup: # of Connections: 1 x 10G - VLAN: Private
RAID Setup: Create 2 logical disks: the first with the HDs in RAID 6, where the OS will be installed, and the second with the SSDs in RAID 0. Partman recipe and/or desired RAID level: db.cfg https://gerrit.wikimedia.org/r/c/operations/puppet/+/1171245 (this should only set up the HDs; the SSDs will be set up after Puppet has run, so don't worry about that).
OS Distro: Bookworm
Boot Method:
Sub-team Technical Contact: @jcrespo
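
The racking preference above (a different row first, otherwise at least a different rack) can be sketched as a small selection function. This is only an illustration: it assumes rack names are a row letter followed by a rack number, and the candidate lists are hypothetical, not taken from Netbox.

```python
from typing import List, Optional

def pick_rack(candidates: List[str], occupied: List[str]) -> Optional[str]:
    """Pick a rack: prefer a row with no sibling host, else any other rack."""
    # Assumption: the first character of a rack name is its row (e.g. "C4" -> row "C").
    occupied_rows = {rack[0] for rack in occupied}
    # First choice: a rack in a row that holds none of the existing hosts.
    for rack in candidates:
        if rack[0] not in occupied_rows:
            return rack
    # Fallback: same row is acceptable, but not the same rack.
    for rack in candidates:
        if rack not in occupied:
            return rack
    return None

# Racks of dbprov2004, dbprov2005, and dbprov2006, as listed in the racking proposal.
occupied = ["C4", "B4", "A4"]
# Hypothetical free racks, for illustration only.
print(pick_rack(["A4", "B7", "D2"], occupied))
print(pick_rack(["A4", "B7"], occupied))
```

With the first hypothetical list the function picks the rack in the unused row; with the second, where every row is already in use, it falls back to a rack that does not already hold one of the existing hosts.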

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

dbprov2007
  • Receive in system on procurement task T399039 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook

Related Objects

Status: Resolved
Assigned: Jhancock.wm

Event Timeline

RobH mentioned this in Unknown Object (Task). Jul 24 2025, 6:32 PM
RobH added a parent task: Unknown Object (Task).

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host dbprov2007.codfw.wmnet with OS bookworm

Note to self: I configured the wrong port on the switch. I need to delete it and redo it; should be quick.

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host dbprov2007.codfw.wmnet with OS bookworm executed with errors:

  • dbprov2007 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console dbprov2007.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host dbprov2007.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host dbprov2007.codfw.wmnet with OS bookworm executed with errors:

  • dbprov2007 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console dbprov2007.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

@Papaul this one went to the wrong Puppet server again. Can you delete it so I can try again later?

[8/10, retrying in 640.00s] Attempt to run 'spicerack.puppet.PuppetServer.wait_for_csr' raised: The puppet server has no CSR for dbprov2007.codfw.wmnet
[9/10, retrying in 1280.00s] Attempt to run 'spicerack.puppet.PuppetServer.wait_for_csr' raised: The puppet server has no CSR for dbprov2007.codfw.wmnet
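
The two delays in the log above (640 s, then 1280 s) are consistent with a retry delay that doubles after each failed attempt. The sketch below reproduces that pattern; the 5-second base delay and the function name are assumptions inferred from those two log lines, not taken from the spicerack source.

```python
from typing import List

def backoff_delays(tries: int = 10, base_delay: float = 5.0) -> List[float]:
    """Delay before each retry when the delay doubles after every failure."""
    # Assumption: attempt n waits base_delay * 2**(n - 1) seconds.
    # The final attempt has no retry after it, so there are tries - 1 delays.
    return [base_delay * 2 ** (attempt - 1) for attempt in range(1, tries)]

delays = backoff_delays()
for attempt, delay in enumerate(delays, start=1):
    print(f"[{attempt}/10, retrying in {delay:.2f}s]")
```

Under those assumptions the nine waits add up to 2555 seconds, roughly 43 minutes, before the last attempt gives up on the missing CSR.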

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin1003 for host dbprov2007.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin1003 for host dbprov2007.codfw.wmnet with OS bookworm completed:

  • dbprov2007 (WARN)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202508061325_jhancock_851571_dbprov2007.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is not optimal, downtime not removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
Jhancock.wm updated the task description.

@jcrespo this is complete