(Need By: TBD) rack/setup/install ganeti202[56]
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of ganeti202[56]
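
The `ganeti202[56]` shorthand used throughout this task reads as a shell-style character class: one hostname per character inside the brackets. A minimal sketch of that expansion follows; `expand_hosts` is a hypothetical helper written for illustration, not part of any tooling referenced in this task.

```python
import itertools
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand each [chars] group in `pattern` into one name per character."""
    # re.split with a capture group alternates literal text and bracket
    # contents: "ganeti202[56]" -> ["ganeti202", "56", ""]
    parts = re.split(r"\[([^\]]+)\]", pattern)
    # Even indexes are literals; odd indexes hold the bracketed characters.
    choices = [[part] if i % 2 == 0 else list(part) for i, part in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand_hosts("ganeti202[56]"))  # ['ganeti2025', 'ganeti2026']
```

Ranges such as `[5-6]` are deliberately not handled; the task only uses the plain character-class form.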

Hostname / Racking / Installation Details

Hostnames: ganeti202[56]
Racking Proposal: Ideally, each host shares a rack and row with as few other ganeti hosts as possible. Current Ganeti host breakdown in codfw: A5:4, B1:2, B5:2, C1:2, C5:2, C6:2, D1:1, D3:1, D5:1, D8:1. Avoid any rack that already has 2 or more hosts; ideally each new host shares its rack with no other ganeti host, or with just 1. Rows A and B are preferred if capacity allows.
Networking/Subnet/VLAN/IP: Single 1G connection for production, but with a more complex networking setup (see ganeti2024)
Partitioning/Raid: no hw raid, partman/custom/ganeti-raid5.cfg
OS Distro: Stretch
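
The racking constraint above (skip racks that already hold 2 or more ganeti hosts, prefer rows A and B) can be sketched as a small filter over the breakdown string. This is only an illustration: `parse_breakdown` and `acceptable_racks` are hypothetical names, and racks with zero ganeti hosts do not appear in the breakdown, so the sketch can only rank the racks that are listed.

```python
import re

# Breakdown copied from the racking proposal in this task.
BREAKDOWN = "A5:4, B1:2, B5:2, C1:2, C5:2, C6:2, D1:1, D3:1, D5:1, D8:1"


def parse_breakdown(text: str) -> dict[str, int]:
    """Turn 'A5:4, B1:2, ...' into {'A5': 4, 'B1': 2, ...}."""
    return {rack: int(count) for rack, count in re.findall(r"([A-D]\d+):(\d+)", text)}


def acceptable_racks(counts: dict[str, int], max_existing: int = 1) -> list[str]:
    """Listed racks with at most `max_existing` ganeti hosts, best first."""
    candidates = [rack for rack, n in counts.items() if n <= max_existing]
    # Sort key: rows A/B first, then fewest existing hosts, then rack name.
    return sorted(candidates, key=lambda r: (r[0] not in "AB", counts[r], r))


print(acceptable_racks(parse_breakdown(BREAKDOWN)))
```

With the numbers given, every listed rack in rows A and B already holds 2 or more hosts, so only row D racks from the list pass the filter; an empty rack in row A or B would still be a better fit if one exists.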

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

ganeti2025: rack D1 U8 ge-1/0/20

  • - receive in system on procurement task T279174 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup); cp systems use role(insetup::nofirm) instead.
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

ganeti2026: rack D6 U7 ge-6/0/6

  • - receive in system on procurement task T279174 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup); cp systems use role(insetup::nofirm) instead.
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Related Objects

Status     Subtype     Assigned     Task
Resolved               Papaul

Event Timeline

RobH added a parent task: Unknown Object (Task). May 11 2021, 6:45 PM

@MoritzMuehlenhoff,

You approved the quote/spec for this, but we didn't get updated racking details on the procurement request T279174, so we'll need to confirm them here. I filled out the racking details with information generated from past experience on these, but I would appreciate a reality check on the details in the task description. Once they are all correct, please reassign this task to @Papaul.

Thanks!

RobH mentioned this in Unknown Object (Task). May 11 2021, 6:47 PM
RobH reassigned this task from Papaul to Jclark-ctr.
RobH reassigned this task from Jclark-ctr to Papaul.
RobH added a subscriber: MoritzMuehlenhoff.
RobH added a subscriber: Jclark-ctr.
RobH unsubscribed.
RobH subscribed.
RobH unsubscribed.
This comment was removed by Papaul.

Change 700678 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] DHCP, Site.pp: Add ganeti202[56] to site.pp and it's MAC address

https://gerrit.wikimedia.org/r/700678

Change 700678 merged by Papaul:

[operations/puppet@production] DHCP, Site.pp: Add ganeti202[56] to site.pp and it's MAC address

https://gerrit.wikimedia.org/r/700678

Script wmf-auto-reimage was launched by pt1979 on cumin2002.codfw.wmnet for hosts:

ganeti2025.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202106212103_pt1979_2224285_ganeti2025_codfw_wmnet.log.

Completed auto-reimage of hosts:

['ganeti2025.codfw.wmnet']

Of which those FAILED:

['ganeti2025.codfw.wmnet']

Change 700720 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Add ganeti202[56] to partman

https://gerrit.wikimedia.org/r/700720

Change 700720 merged by Papaul:

[operations/puppet@production] Add ganeti202[56] to partman

https://gerrit.wikimedia.org/r/700720

Script wmf-auto-reimage was launched by pt1979 on cumin2002.codfw.wmnet for hosts:

ganeti2025.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202106220017_pt1979_2246312_ganeti2025_codfw_wmnet.log.

Completed auto-reimage of hosts:

['ganeti2025.codfw.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by pt1979 on cumin2002.codfw.wmnet for hosts:

ganeti2026.codfw.wmnet

The log can be found in /var/log/wmf-auto-reimage/202106220046_pt1979_2250535_ganeti2026_codfw_wmnet.log.

Completed auto-reimage of hosts:

['ganeti2026.codfw.wmnet']

and were ALL successful.
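
The reimage log paths quoted above appear to follow a fixed naming scheme: a 12-digit timestamp, the launching user, a numeric identifier, and the host FQDN with dots replaced by underscores. A sketch of parsing one such path is below. The field meanings are inferred from the three paths in this task, not from wmf-auto-reimage documentation; in particular, treating the number as a run identifier (`run_id`) and reading the timestamp as `YYYYMMDDHHMM` are assumptions, and `parse_log_path` is a hypothetical helper.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Assumed layout: <YYYYMMDDHHMM>_<user>_<number>_<host with "_" for ".">.log
LOG_NAME = re.compile(r"(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<num>\d+)_(?P<host>.+)\.log")


def parse_log_path(path: str) -> dict:
    """Split a reimage log path into its (assumed) component fields."""
    match = LOG_NAME.fullmatch(PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"unrecognised log name: {path}")
    return {
        # Naive datetime; the time zone is not recorded in the file name.
        "started": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "run_id": int(match["num"]),
        # Underscores are mapped back to dots to recover the FQDN.
        "host": match["host"].replace("_", "."),
    }


info = parse_log_path(
    "/var/log/wmf-auto-reimage/202106212103_pt1979_2224285_ganeti2025_codfw_wmnet.log"
)
print(info["user"], info["host"], info["started"].isoformat())
```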

Papaul updated the task description.

@MoritzMuehlenhoff this is ready for service.

Change 732911 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Set ganeti2025/2026 to insetup

https://gerrit.wikimedia.org/r/732911

Change 732911 merged by Muehlenhoff:

[operations/puppet@production] Set ganeti2025/2026 to insetup

https://gerrit.wikimedia.org/r/732911

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2025.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2025.codfw.wmnet with OS buster completed:

  • ganeti2025 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202110220835_jmm_201725_ganeti2025.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin2002 for host ganeti2026.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin2002 for host ganeti2026.codfw.wmnet with OS buster completed:

  • ganeti2026 (PASS)
    • Downtimed on Icinga
    • Disabled Puppet
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202110221003_jmm_213070_ganeti2026.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 745519 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Enable ganeti2025 as ganeti server

https://gerrit.wikimedia.org/r/745519

Change 745519 merged by Muehlenhoff:

[operations/puppet@production] Enable ganeti2025 as ganeti server

https://gerrit.wikimedia.org/r/745519

Mentioned in SAL (#wikimedia-operations) [2021-12-15T12:08:26Z] <moritzm> added ganeti2025 to codfw ganeti cluster T282603

Mentioned in SAL (#wikimedia-operations) [2022-01-19T11:15:43Z] <moritzm> add ganeti2026 to Ganeti codfw cluster T282603

Mentioned in SAL (#wikimedia-operations) [2022-01-19T11:35:07Z] <moritzm> rebalance ganeti group D in codfw after adding ganeti2026 T282603