Q3:rack/setup/install bast1004
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of bast1004.

Hostname / Racking / Installation Details

This section should list the racking restrictions for these hosts, such as whether they shouldn't share a rack/row with one another or with any existing hosts. It should also list the other details below.

Hostnames: bast1004.wikimedia.org
Racking Proposal: Any rack is fine
Networking Setup: Public IP. These hosts don't really need 10G (but since they have a 10G NIC, and if we have a free 10G port, it won't hurt of course)
OS Distro: Trixie
Boot Method: UEFI.
Sub-team Technical Contact: Moritz or, if he is not around, anyone else from SRE IF

bast1004
  • Receive in system on procurement task T412563 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the "Provision a server's network attributes" Netbox script. Note that you must run the DNS and Provision cookbooks after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Run the sre.hardware.upgrade-firmware cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook
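
The checklist above is order-sensitive: the two "Immediately run" cookbooks have to follow the Netbox script directly. As a rough illustration only, the sequence can be written down as data. The `Step` type, its field names, and the helper functions below are hypothetical and are not part of Spicerack or any Wikimedia tooling; only the step names come from the checklist.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One checklist item (hypothetical model, for illustration only)."""
    name: str
    kind: str  # "manual", "netbox-script", or "cookbook"
    immediately_after_previous: bool = False


# Order copied from the checklist above.
CHECKLIST: List[Step] = [
    Step("Receive in system (procurement task, Coupa)", "manual"),
    Step("Rack system and update Netbox", "manual"),
    Step("Provision a server's network attributes", "netbox-script"),
    Step("sre.dns.netbox", "cookbook", immediately_after_previous=True),
    Step("sre.hosts.provision", "cookbook", immediately_after_previous=True),
    Step("sre.hardware.upgrade-firmware", "cookbook"),
    Step("Update operations/puppet (preseed.yaml, site.pp)", "manual"),
    Step("sre.hosts.reimage", "cookbook"),
]


def next_step(done: int) -> Optional[Step]:
    """Return the next pending step, given how many steps are already done."""
    return CHECKLIST[done] if done < len(CHECKLIST) else None


def cookbooks_in_order() -> List[str]:
    """List only the cookbook names, in the order they should be run."""
    return [step.name for step in CHECKLIST if step.kind == "cookbook"]


if __name__ == "__main__":
    for number, step in enumerate(CHECKLIST, 1):
        marker = " (run immediately)" if step.immediately_after_previous else ""
        print(f"{number}. [{step.kind}] {step.name}{marker}")
```

Keeping the "immediately" constraint as a flag on the step, rather than in free text, makes it possible to check the order mechanically before starting work on a host.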

Event Timeline

Jhancock.wm mentioned this in Unknown Object (Task). Feb 2 2026, 9:55 PM
Jclark-ctr updated Other Assignee, added: Jclark-ctr.
Jclark-ctr added subscribers: Andrew, Jclark-ctr.

@Andrew would you be able to help with adding the server to site.pp and updating preseed.yaml for EFI booting?

Jclark-ctr updated Other Assignee, added: Andrew; removed: Jclark-ctr.

Change #1236765 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add site.pp entry for bast1004

https://gerrit.wikimedia.org/r/1236765

Change #1236767 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Configure UEFI partman config for bast1004

https://gerrit.wikimedia.org/r/1236767

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host bast1004.eqiad.wmnet with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host bast1004.eqiad.wmnet with OS trixie executed with errors:

  • bast1004 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console bast1004.eqiad.wmnet" to get a root shell, but depending on the failure this may not work.

Change #1236765 merged by Andrew Bogott:

[operations/puppet@production] Add site.pp and preseed entry for bast1004

https://gerrit.wikimedia.org/r/1236765

Change #1236767 abandoned by Muehlenhoff:

[operations/puppet@production] Configure UEFI partman config for bast1004

Reason:

Duplicated

https://gerrit.wikimedia.org/r/1236767

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host bast1004.wikimedia.org with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host bast1004.wikimedia.org with OS trixie executed with errors:

  • bast1004 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console bast1004.wikimedia.org" to get a root shell, but depending on the failure this may not work.

Cookbook cookbooks.sre.hosts.reimage was started by jclark@cumin1003 for host bast1004.wikimedia.org with OS trixie

Cookbook cookbooks.sre.hosts.reimage started by jclark@cumin1003 for host bast1004.wikimedia.org with OS trixie completed:

  • bast1004 (PASS)
    • Downtimed on Icinga/Alertmanager
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • Host rebooted via Redfish
    • Host up (Debian installer)
    • Host up (new fresh trixie OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • Removed previous downtime on Alertmanager (old OS)
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202602041724_jclark_2648072_bast1004.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
Jclark-ctr updated the task description.
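
The reimage cookbook summaries earlier in this timeline share one shape: a `<host> (PASS|FAIL)` bullet followed by indented step bullets. To compare a failed run with the successful one, a small parser can help. This is a minimal sketch that assumes the summary is available as plain text exactly as pasted in this task; `ReimageSummary` and `parse_summary` are hypothetical names, not part of the cookbook or Spicerack.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReimageSummary:
    """Parsed form of one pasted cookbook summary (hypothetical helper type)."""
    host: str
    status: str  # "PASS" or "FAIL"
    steps: List[str] = field(default_factory=list)


# "  • bast1004 (PASS)" -> host and status
HOST_LINE = re.compile(r"^\s*•\s*(\S+)\s+\((PASS|FAIL)\)\s*$")
# "    • Some step text" -> step text
STEP_LINE = re.compile(r"^\s*•\s*(.+?)\s*$")


def parse_summary(text: str) -> ReimageSummary:
    """Extract the host, overall status, and step list from a summary."""
    summary = None
    for line in text.splitlines():
        host_match = HOST_LINE.match(line)
        if host_match:
            summary = ReimageSummary(host_match.group(1), host_match.group(2))
            continue
        step_match = STEP_LINE.match(line)
        if step_match and summary is not None:
            summary.steps.append(step_match.group(1))
    if summary is None:
        raise ValueError("no '<host> (PASS|FAIL)' line found")
    return summary


# Summary of the second failed run, copied from this task.
FAILED = """\
  • bast1004 (FAIL)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced UEFI HTTP Boot for next reboot
    • The reimage failed, see the cookbook logs for the details. You can also try typing "sudo install-console bast1004.wikimedia.org" to get a root shell, but depending on the failure this may not work.
"""

if __name__ == "__main__":
    result = parse_summary(FAILED)
    print(result.host, result.status, len(result.steps))
```

Parsing both a failed and the passing summary this way shows that the failed runs in this task stopped right after "Forced UEFI HTTP Boot for next reboot", while the passing run continued with "Host rebooted via Redfish".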

Change #1237916 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Make bast1004 a bastion

https://gerrit.wikimedia.org/r/1237916

Change #1237916 merged by Muehlenhoff:

[operations/puppet@production] Make bast1004 a bastion

https://gerrit.wikimedia.org/r/1237916