Q2:(Need By: TBD) rack/setup/install kubernetes1022
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of kubernetes1022.

Hostname / Racking / Installation Details

hostname: kubernetes1022
Racking: Try to distribute evenly in a 1G rack alongside the existing cluster, which is currently placed as follows: A3:2, A5:2, A6:1, B3:2, B5:1, B6:1, C3:2, C5:2, D3:3, D5:1. With that distribution, avoid rack D3 (it already has 3 hosts) and, if possible, any rack with 2. Ideally it's solo in a rack, but rack space availability makes that unlikely.
Networking details: single 1G connection to private1 vlan
partitioning: standard raid1-2dev
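
The racking preference above can be sketched as a small selection over the per-rack host counts. This is only an illustrative sketch, not a tool used by the task: the function names `parse_distribution` and `preferred_racks` are hypothetical, and it only ranks racks that already hold cluster members (an empty rack with free space would not appear in the list at all).

```python
import re


def parse_distribution(text):
    """Parse a string like 'A3:2, A5:2' into {'A3': 2, 'A5': 2}."""
    counts = {}
    for rack, hosts in re.findall(r"([A-D]\d):(\d+)", text):
        counts[rack] = int(hosts)
    return counts


def preferred_racks(counts):
    """Return the racks holding the fewest cluster hosts, sorted by name."""
    lowest = min(counts.values())
    return sorted(rack for rack, hosts in counts.items() if hosts == lowest)


# Distribution copied from the racking note in the task description.
current = "A3:2, A5:2, A6:1, B3:2, B5:1, B6:1, C3:2, C5:2, D3:3, D5:1"
counts = parse_distribution(current)
candidates = preferred_racks(counts)
print(candidates)
```

With the distribution given in the description, the candidates are the four racks that currently hold a single host, which excludes D3 and every rack with 2.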

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

kubernetes1022:

  • - receive in system on procurement task T292001 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • [x] - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup); cp systems use role(insetup::nofirm) instead.
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Related Objects

Status    Subtype    Assigned     Task
Resolved             Cmjohnson

Event Timeline

RobH mentioned this in Unknown Object (Task).
RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
RobH unsubscribed.

added Servers to netbox

Jclark-ctr subscribed.

kubernetes1022 B6 U35 cableid#3964 Port34

Change 747871 had a related patch set uploaded (by Cmjohnson; author: Cmjohnson):

[operations/puppet@production] Adding kubernetes1022 to site.pp and netboot.cfg setup role

https://gerrit.wikimedia.org/r/747871

Change 747871 merged by Cmjohnson:

[operations/puppet@production] Adding kubernetes1022 to site.pp and netboot.cfg setup role

https://gerrit.wikimedia.org/r/747871

Cookbook cookbooks.sre.hosts.reimage started by cmjohnson@cumin1001 for host kubernetes1022.eqiad.wmnet with OS bullseye completed:

kubernetes1022 (PASS)
Removed from Puppet and PuppetDB if present
Deleted any existing Puppet certificate
Removed from Debmonitor if present
Forced PXE for next reboot
Host rebooted via IPMI
Host up (Debian installer)
Host up (new fresh bullseye OS)
Generated Puppet certificate
Signed new Puppet certificate
Run Puppet in NOOP mode to populate exported resources in PuppetDB
Found Nagios_host resource for this host in PuppetDB
Downtimed the new host on Icinga
First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202112161605_cmjohnson_11778_kubernetes1022.out
Checked BIOS boot parameters are back to normal
Rebooted
Automatic Puppet run was successful
Forced a re-check of all Icinga services for the host
Icinga status is optimal
Icinga downtime removed
Updated Netbox data from PuppetDB
Updated Netbox status planned -> staged

Cmjohnson updated the task description.

ready to turnover