Q2:(Need By: TBD) rack/setup/install ganeti202[78].codfw.wmnet
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of ganeti202[78].codfw.wmnet
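The bracket notation in the hostname stands for two hosts. A minimal sketch of expanding it is shown here; `expand_hosts` is a hypothetical helper (not part of any WMF tooling), and it only handles a single bracket group of individual characters, not ranges such as `[1-3]`.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one [..] character set, e.g. 'ganeti202[78]' into two names."""
    match = re.search(r"\[([^\]]+)\]", pattern)
    if match is None:
        # No bracket group: the pattern already names a single host.
        return [pattern]
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{char}{suffix}" for char in match.group(1)]


hosts = expand_hosts("ganeti202[78].codfw.wmnet")
print(hosts)  # ['ganeti2027.codfw.wmnet', 'ganeti2028.codfw.wmnet']
```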

Hostname / Racking / Installation Details

Hostnames: ganeti202[78].codfw.wmnet
Racking Proposal: Please add these two in row A, in two different racks other than A5. If row A is too full, place them in row B instead, in racks other than B1 or B5.
Networking/Subnet/VLAN/IP: 10G, same VLAN/IP setup as existing Ganeti servers
Partitioning/Raid: partman/custom/ganeti-raid5.cfg
OS Distro: Buster
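The racking proposal above can be read as a small set of constraints. The sketch below checks a candidate pair of racks against them. It assumes rack names of the form `A3` (row letter followed by a number) and assumes both hosts go in the same row. `racks_satisfy_proposal` is a hypothetical helper, and it cannot tell whether row A is actually "too full", so it accepts either row.

```python
# Racks ruled out by the proposal, keyed by row.
EXCLUDED_RACKS = {"A": {"A5"}, "B": {"B1", "B5"}}


def racks_satisfy_proposal(racks: list[str]) -> bool:
    """Return True if two rack names (one per host) fit the proposal."""
    if len(racks) != 2 or any(not rack for rack in racks):
        return False
    if racks[0] == racks[1]:
        # The two hosts must be in different racks.
        return False
    rows = {rack[0] for rack in racks}
    if len(rows) != 1:
        # Assumption: both hosts share a row (A preferred, B as fallback).
        return False
    row = rows.pop()
    if row not in EXCLUDED_RACKS:
        return False
    return not any(rack in EXCLUDED_RACKS[row] for rack in racks)


print(racks_satisfy_proposal(["A3", "A7"]))  # True
print(racks_satisfy_proposal(["A5", "A7"]))  # False
```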

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

ganeti2027.codfw.wmnet:

  • - receive in system on procurement task T291973 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

ganeti2028.codfw.wmnet:

  • - receive in system on procurement task T291973 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.
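The resolution rule above (every step checked for every host) could be tracked with a structure like the following. This is only an illustrative sketch: the step labels are hypothetical short names for the nine checklist items, and nothing here reflects how Phabricator stores checkbox state.

```python
# Hypothetical short labels for the nine per-host checklist steps.
CHECKLIST_STEPS = (
    "receive", "rack", "bios_drac_serial", "dns", "network_port",
    "firmware", "puppet", "os_install", "netbox_staged",
)


def remaining_steps(done: set[str]) -> list[str]:
    """List the checklist steps not yet completed, in checklist order."""
    return [step for step in CHECKLIST_STEPS if step not in done]


def task_can_be_resolved(progress: dict[str, set[str]]) -> bool:
    """True only if at least one host is tracked and none has steps left."""
    return bool(progress) and all(
        not remaining_steps(done) for done in progress.values()
    )


progress = {
    "ganeti2027.codfw.wmnet": set(CHECKLIST_STEPS),
    "ganeti2028.codfw.wmnet": set(CHECKLIST_STEPS) - {"netbox_staged"},
}
print(task_can_be_resolved(progress))  # False
print(remaining_steps(progress["ganeti2028.codfw.wmnet"]))  # ['netbox_staged']
```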

Related Objects

Status: Resolved
Assigned: Papaul

Event Timeline

RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH edited subscribers, added: MoritzMuehlenhoff; removed: RobH.
wiki_willy renamed this task from (Need By: TBD) rack/setup/install ganeti202[78].codfw.wmnet to Q2:(Need By: TBD) rack/setup/install ganeti202[78].codfw.wmnet. Oct 22 2021, 9:46 PM
RobH added a parent task: Unknown Object (Task). Oct 25 2021, 6:57 PM

@MoritzMuehlenhoff @RobH the 2 ganeti nodes are we racking them in a 10G rack or 1G?

"Networking/Subnet/VLAN/IP: 10G, same VLAN/IP setup as existing Ganeti servers"

If there's sufficient space, 10G please.

Change 740679 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Add ganeti200[18] to site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/740679

Change 740679 merged by Papaul:

[operations/puppet@production] Add ganeti200[78] to site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/740679

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host ganeti2027.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host ganeti2027.codfw.wmnet with OS buster completed:

  • ganeti2027 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202111222107_pt1979_1523286_ganeti2027.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host ganeti2028.codfw.wmnet with OS buster

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host ganeti2028.codfw.wmnet with OS buster completed:

  • ganeti2028 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh buster OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202111222150_pt1979_1528874_ganeti2028.out
    • Checked BIOS boot parameters are back to normal
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> staged
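The first-Puppet-run log paths reported by the two reimage runs appear to follow a `<timestamp>_<user>_<number>_<host>.out` pattern. The sketch below splits such a path into fields. The field layout is inferred only from the two paths in this task and is an assumption, as is the `%Y%m%d%H%M` timestamp format; the meaning of the numeric third field is not stated here, so it is kept as an opaque string. `parse_reimage_log` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_reimage_log(path: str) -> dict:
    """Split a reimage log path into its (assumed) underscore-separated fields."""
    stem = PurePosixPath(path).stem  # file name without the '.out' suffix
    stamp, user, number, host = stem.split("_", 3)
    return {
        # Naive datetime: the time zone is not stated in the path.
        "started": datetime.strptime(stamp, "%Y%m%d%H%M"),
        "user": user,
        "number": number,  # meaning not documented in this task
        "host": host,
    }


info = parse_reimage_log(
    "/var/log/spicerack/sre/hosts/reimage/"
    "202111222150_pt1979_1528874_ganeti2028.out"
)
print(info["host"], info["user"], info["started"].isoformat())
```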

Change 745198 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Make ganeti2027 a Ganeti node

https://gerrit.wikimedia.org/r/745198

Change 745198 merged by Muehlenhoff:

[operations/puppet@production] Make ganeti2027 a Ganeti node

https://gerrit.wikimedia.org/r/745198

I can't connect to the serial console of ganeti2027 with our management password, but ganeti2028 works. Does 2027 maybe still use the factory default?

Mentioned in SAL (#wikimedia-operations) [2021-12-09T11:38:22Z] <moritzm> added ganeti2027 to ganeti codfw cluster T294139

Mentioned in SAL (#wikimedia-operations) [2021-12-16T09:46:18Z] <moritzm> added ganeti2028 to ganeti codfw cluster T294139
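The SAL mentions above share a regular shape: a bracketed UTC timestamp, a nick in angle brackets, then the log message. A minimal sketch for pulling those fields out of such a line follows. The regular expression is derived only from the two lines in this task, so treat it as an assumption about the format; `parse_sal_mention` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# "[<timestamp>] <nick> message", as seen in the two mentions above.
SAL_LINE = re.compile(r"\[(?P<ts>[^\]]+)\]\s+<(?P<nick>[^>]+)>\s+(?P<msg>.*)")


def parse_sal_mention(line: str) -> dict:
    """Extract timestamp, nick, message, and task IDs from a SAL mention."""
    match = SAL_LINE.search(line)
    if match is None:
        raise ValueError(f"not a SAL mention: {line!r}")
    when = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "when": when.replace(tzinfo=timezone.utc),  # trailing 'Z' means UTC
        "nick": match["nick"],
        "message": match["msg"],
        "tasks": re.findall(r"\bT\d+\b", match["msg"]),
    }


entry = parse_sal_mention(
    "Mentioned in SAL (#wikimedia-operations) [2021-12-16T09:46:18Z] "
    "<moritzm> added ganeti2028 to ganeti codfw cluster T294139"
)
print(entry["nick"], entry["tasks"], entry["when"].isoformat())
```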