Q3:(Need By: TBD) rack/setup/install netmon1003
Closed, ResolvedPublic

Description

This task will track the racking, setup, and OS installation of netmon1003.

Hostname / Racking / Installation Details

Hostnames: netmon1003
Racking Proposal: Next available space in eqiad
Networking/Subnet/VLAN/IP: 1G public
Partitioning/Raid: sw raid1
OS Distro: Bullseye (default unless otherwise specified)

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

netmon1003:
  • - receive in system on procurement task T297150 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp and site.pp, using role(insetup), or role(insetup::nofirm) for cp systems.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
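
The two cookbook steps in the checklist above can be sketched as command lines. This is a minimal illustration, not the documented invocation: the `--os` option, the free-text message argument to `sre.dns.netbox`, and the use of the short hostname are assumptions not stated in this task, and the helper names `dns_command` and `reimage_command` are hypothetical. The sketch only builds and prints the commands; it does not run them.

```python
import shlex

HOST = "netmon1003"
OS_NAME = "bullseye"


def dns_command(message: str) -> list[str]:
    # Assumption: sre.dns.netbox accepts a free-text message as its
    # single positional argument. Check the cookbook's help output.
    return ["cookbook", "sre.dns.netbox", message]


def reimage_command(host: str, os_name: str) -> list[str]:
    # Assumption: the target OS is selected with an --os option and the
    # host is given by its short hostname.
    return ["cookbook", "sre.hosts.reimage", "--os", os_name, host]


if __name__ == "__main__":
    # Print the commands in checklist order without executing anything.
    for argv in (dns_command(f"Add {HOST}"), reimage_command(HOST, OS_NAME)):
        print(shlex.join(argv))
```

Keeping each command as an argument list rather than a single string avoids quoting mistakes if the commands are later passed to `subprocess.run`.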

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
RobH mentioned this in Unknown Object (Task).
RobH added a subscriber: herron.
RobH unsubscribed.

netmon1003 B1 U32 Port30 Cableid 23000067

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host netmon1003.wikimedia.org with OS bullseye

Change 793557 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Add netmon1003 to site.pp

https://gerrit.wikimedia.org/r/793557

Change 793557 merged by Papaul:

[operations/puppet@production] Add netmon1003 to site.pp

https://gerrit.wikimedia.org/r/793557

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host netmon1003.wikimedia.org with OS bullseye executed with errors:

  • netmon1003 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host netmon1003.wikimedia.org with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host netmon1003.wikimedia.org with OS bullseye completed:

  • netmon1003 (WARN)
    • Downtimed on Icinga/Alertmanager
    • Unable to disable Puppet, the host may have been unreachable
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202205200027_pt1979_1494549_netmon1003.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Papaul updated the task description.
Papaul subscribed.

@Jclark-ctr complete

Happy Monday @Jclark-ctr, @Papaul!

Is this host ready? I am checking in case there are any bits pending before we tinker with the host as the task is still open :-) thank you for all your help!

@lmata Happy Monday to you as well. The host is ready.

fgiunchedi subscribed.

Thank you @Papaul! Resolving, we'll be following up in T309074: Put netmon1003 in service