
Q3:(Need By: TBD) rack/setup/install ganeti2029.codfw.wmnet, ganeti2030.codfw.wmnet
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of ganeti2029.codfw.wmnet and ganeti2030.codfw.wmnet.

Hostname / Racking / Installation Details

Hostnames: ganeti2029.codfw.wmnet, ganeti2030.codfw.wmnet
Racking Proposal: These are replacing two servers in row A, so we need them in row A as well
Networking/Subnet/VLAN/IP: 10G, same VLAN/IP setup as existing Ganeti servers
Partitioning/Raid: partman/custom/ganeti-raid5.cfg
OS Distro: Buster

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

ganeti2029: A2 U28 xe-2/0/27
  • - receive in system on procurement task <enter task # here> & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - Enable "Virtualization technology" under "System BIOS" -> "Processor Settings"
  • - operations/puppet update - this should include updates to netboot.pp and site.pp; use role(insetup), or role(insetup::nofirm) for cp systems.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
ganeti2030: A2 U29 xe-2/0/28
  • - receive in system on procurement task <enter task # here> & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - Enable "Virtualization technology" under "System BIOS" -> "Processor Settings"
  • - operations/puppet update - this should include updates to netboot.pp and site.pp; use role(insetup), or role(insetup::nofirm) for cp systems.
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
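
The two checklists above differ only in hostname, rack unit, and switch port, so they could be generated from one template instead of being copied by hand. The following is a minimal sketch of that idea, not part of any existing tooling: the `Host` class, `STEPS` list, and `render_checklist` function are hypothetical names, and the step wording is abbreviated from the template above.

```python
from dataclasses import dataclass

# Checklist steps, abbreviated from the task template (wording is illustrative).
STEPS = [
    "receive in system on procurement task & in coupa",
    "rack system with proposed racking plan & update netbox",
    "add mgmt dns and production dns entries in netbox, run cookbook sre.dns.netbox",
    "network port setup via netbox, run homer to commit",
    "bios/drac/serial setup/testing",
    "firmware update (idrac, bios, network, raid controller)",
    'Enable "Virtualization technology" under "System BIOS" -> "Processor Settings"',
    "operations/puppet update (netboot.pp, site.pp)",
    "OS installation & initial puppet run via sre.hosts.reimage cookbook",
]


@dataclass(frozen=True)
class Host:
    """Hypothetical record holding the per-host values from the checklist header."""
    name: str
    rack: str
    unit: str
    port: str


def render_checklist(host: Host, steps=STEPS) -> str:
    """Return the checklist text for one host: a header line plus one bullet per step."""
    lines = [f"{host.name}: {host.rack} {host.unit} {host.port}"]
    lines += [f"  • - {step}" for step in steps]
    return "\n".join(lines)


# Values taken from the two checklist headers in this task.
HOSTS = [
    Host("ganeti2029", "A2", "U28", "xe-2/0/27"),
    Host("ganeti2030", "A2", "U29", "xe-2/0/28"),
]
```

Calling `print(render_checklist(h))` for each entry in `HOSTS` would produce one checklist per host in the same layout as above.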

Event Timeline

RobH added a parent task: Unknown Object (Task).

Change 759263 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Simplify Ganeti partman setup for codfw servers

https://gerrit.wikimedia.org/r/759263

Change 759263 merged by Muehlenhoff:

[operations/puppet@production] Simplify Ganeti partman setup for codfw servers

https://gerrit.wikimedia.org/r/759263

Change 759265 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Add ganeti2029 and ganeti2030 to site.pp

https://gerrit.wikimedia.org/r/759265

Change 759265 merged by Papaul:

[operations/puppet@production] Add ganeti2029 and ganeti2030 to site.pp

https://gerrit.wikimedia.org/r/759265

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host ganeti2029.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host ganeti2029.codfw.wmnet with OS buster executed with errors:

  • ganeti2029 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host ganeti2029.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host ganeti2029.codfw.wmnet with OS buster completed:

  • ganeti2029 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202071622_pt1979_2117607_ganeti2029.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
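
The log path reported for the first Puppet run appears to encode when the run started, who started it, a numeric ID, and the short hostname. The sketch below splits such a filename into those parts. The field meanings are an assumption inferred from the paths in this task (the numeric field may be a process ID), and `ReimageLog` and `parse_reimage_log` are hypothetical names, not part of Spicerack.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class ReimageLog(NamedTuple):
    """Fields assumed to be encoded in a reimage log filename."""
    started: datetime  # assumed: YYYYMMDDHHMM start time
    user: str          # assumed: user who ran the cookbook
    run_id: int        # assumed: numeric ID, possibly a PID
    host: str          # short hostname


def parse_reimage_log(path: str) -> ReimageLog:
    """Split '<stamp>_<user>_<id>_<host>.out' into its four parts."""
    stem = PurePosixPath(path).stem  # filename without the '.out' suffix
    stamp, user, run_id, host = stem.split("_", 3)
    return ReimageLog(
        started=datetime.strptime(stamp, "%Y%m%d%H%M"),
        user=user,
        run_id=int(run_id),
        host=host,
    )


log = parse_reimage_log(
    "/var/log/spicerack/sre/hosts/reimage/202202071622_pt1979_2117607_ganeti2029.out"
)
```

Under these assumptions, `log.host` is `ganeti2029` and `log.started` is 2022-02-07 16:22, which is consistent with the order of events in this timeline.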

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host ganeti2030.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host ganeti2030.codfw.wmnet with OS buster completed:

  • ganeti2030 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202071653_pt1979_2122586_ganeti2030.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
Papaul updated the task description.
Papaul added a subscriber: MoritzMuehlenhoff.

@MoritzMuehlenhoff this is complete

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mc2038.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mc2038.codfw.wmnet with OS buster completed:

  • mc2038 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202202081616_pt1979_2290948_mc2038.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host mc2040.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host mc2040.codfw.wmnet with OS buster executed with errors:

  • mc2040 (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Change 765201 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Make ganeti2029/ganeti2030 Ganeti nodes

https://gerrit.wikimedia.org/r/765201

Change 765201 merged by Muehlenhoff:

[operations/puppet@production] Make ganeti2029/ganeti2030 Ganeti nodes

https://gerrit.wikimedia.org/r/765201

Change 766065 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add ganeti2029 as new node in codfw

https://gerrit.wikimedia.org/r/766065

Change 766065 merged by Muehlenhoff:

[operations/puppet@production] Add ganeti2029 as new node in codfw

https://gerrit.wikimedia.org/r/766065

Mentioned in SAL (#wikimedia-operations) [2022-02-25T10:41:00Z] <moritzm> enabled virtualisation in BIOS for ganeti2029 T298998

Mentioned in SAL (#wikimedia-operations) [2022-02-25T11:04:07Z] <moritzm> added ganeti2029 to codfw Ganeti cluster T298998