Q4:rack/setup/install apus-fe2003
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of apus-fe2003

Hostname / Racking / Installation Details

Hostnames: apus-fe2003
Racking Proposal: If possible avoid racks containing moss-fe* nodes (C2,D2)
Networking Setup: 10G production network
OS Distro: Bookworm
Sub-team Technical Contact: @MatthewVernon

Per host setup checklist

apus-fe2003
  • Receive in system on procurement task T388240 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the "Provision a server's network attributes" Netbox script. Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook

Event Timeline

Jhancock.wm updated the task description.
Jhancock.wm updated the task description.
Jhancock.wm mentioned this in Unknown Object (Task). Mar 31 2025, 4:06 PM

@MatthewVernon I need more clarification on which VLAN this server should go on. I don't have any other server examples, and the only other apus IPs I can find in Netbox are in the LVS IP range. Is there a specific VLAN ID you want to use, or is it a generic private or public IP? I want to be sure I know what to do before I start throwing monkey wrenches =)

@Jhancock.wm it should be networked like moss-fe2001 and moss-fe2002, please (apus-* are the new names; moss-* will gradually get cycled out).

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host apus-fe2003.codfw.wmnet with OS bookworm

Hit an error with the RAID during the OS install. No specific error was given. Will come back to this later.

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host apus-fe2003.codfw.wmnet with OS bookworm executed with errors:

  • apus-fe2003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console apus-fe2003.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jhancock@cumin2002 for host apus-fe2003.codfw.wmnet with OS bookworm

    Error while setting up RAID

    An unexpected error occurred while setting up a preseeded RAID
    configuration.

    Check /var/log/syslog or see virtual console 4 for the details.

@Papaul I can't figure this one out. Not sure why this one is failing with the generic error message.

Cookbook cookbooks.sre.hosts.reimage started by jhancock@cumin2002 for host apus-fe2003.codfw.wmnet with OS bookworm executed with errors:

  • apus-fe2003 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console apus-fe2003.codfw.wmnet" to get a root shell, but depending on the failure this may not work.

@Jhancock.wm that error looks to me like the server is missing an entry in partman. Have you checked if the server has a partman recipe?
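
For illustration only, here is a minimal Python sketch of how a newly named host can fall through a pattern-based recipe lookup. The pattern table, the recipe file names, and the `find_recipe` helper are all hypothetical; they do not reproduce the real preseed.yaml layout or the logic used by the install server.

```python
from fnmatch import fnmatch

# Hypothetical pattern -> recipe table. The real entries live in the
# operations/puppet repo and will look different.
RECIPES = {
    "moss-fe*": "example-raid-recipe.cfg",
    "moss-be*": "example-storage-recipe.cfg",
}


def find_recipe(hostname, recipes):
    """Return the recipe of the first pattern matching hostname, else None."""
    for pattern, recipe in recipes.items():
        if fnmatch(hostname, pattern):
            return recipe
    return None


# An existing moss-fe host matches, but the renamed apus-fe host does not.
print(find_recipe("moss-fe2001", RECIPES))
print(find_recipe("apus-fe2003", RECIPES))

# Adding a pattern for the new naming scheme closes the gap.
RECIPES["apus-fe*"] = "example-raid-recipe.cfg"
print(find_recipe("apus-fe2003", RECIPES))
```

Under these assumptions, a host renamed from moss-* to apus-* gets no recipe until a matching pattern is added, which is the kind of gap the question above is probing for.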

Change #1133849 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] install-server: also run configure_swift_disks for apus-*

https://gerrit.wikimedia.org/r/1133849

Change #1133849 merged by MVernon:

[operations/puppet@production] install-server: also run configure_swift_disks for apus-*

https://gerrit.wikimedia.org/r/1133849

Cookbook cookbooks.sre.hosts.reimage was started by mvernon@cumin2002 for host apus-fe2003.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by mvernon@cumin2002 for host apus-fe2003.codfw.wmnet with OS bookworm completed:

  • apus-fe2003 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Add puppet_version metadata (7) to Debian installer
    • Checked BIOS boot parameters are back to normal
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202504031032_mvernon_3273068_apus-fe2003.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
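
A small Python sketch, offered as one possible way to read these reports, compares the step list of a failed run with the start of the passing run to show where the failed runs stopped. The step strings are copied from the reports in this task; the `first_missing_step` helper is hypothetical and not part of any cookbook.

```python
# Steps reported by the failed reimage runs, in order.
FAILED_STEPS = [
    "Removed from Puppet and PuppetDB if present and deleted any certificates",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via IPMI",
    "Host up (Debian installer)",
    "Add puppet_version metadata (7) to Debian installer",
    "Checked BIOS boot parameters are back to normal",
]

# First steps of the passing run (truncated for brevity).
PASSED_STEPS = FAILED_STEPS + [
    "Host up (new fresh bookworm OS)",
    "Generated Puppet certificate",
]


def first_missing_step(failed, passed):
    """Return the first step of the passing run that the failed run lacks."""
    for index, step in enumerate(passed):
        if index >= len(failed) or failed[index] != step:
            return step
    return None


print(first_missing_step(FAILED_STEPS, PASSED_STEPS))
# prints: Host up (new fresh bookworm OS)
```

The failed runs never reached a booted OS, which is consistent with the RAID error shown by the Debian installer.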
MatthewVernon updated the task description.

OK, this is fixed, sorry about that (I'd done most of the necessary preseed changes, but had missed one).

All good! Thank you for your help!

Change #1134208 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] Add apus-fe2003 to hiera and conftool

https://gerrit.wikimedia.org/r/1134208

Change #1134208 merged by MVernon:

[operations/puppet@production] Add apus-fe2003 to hiera and conftool

https://gerrit.wikimedia.org/r/1134208