
Q3:rack/setup/install cloudlb200[23]-dev
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of cloudlb200[23]-dev.

Hostname / Racking / Installation Details

Hostnames: cloudlb200[23]-dev
Racking Proposal: WMCS Racks
Networking Setup: 10G
Partitioning/Raid: HW Raid: N
OS Distro: Bullseye
Sub-team Technical Contact: @aborrero
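The title and Hostnames field use a compact bracket notation: cloudlb200[23]-dev stands for the two hosts cloudlb2002-dev and cloudlb2003-dev that get their own checklists below. A minimal Python sketch of that expansion, assuming the bracket is read like a shell glob character class (each character inside it is one alternative). `expand_hostnames` is a hypothetical helper, not part of any Wikimedia tooling; range-oriented tools such as ClusterShell, which cumin builds on, would write the same pair as cloudlb200[2-3]-dev.

```python
import re


def expand_hostnames(pattern: str) -> list[str]:
    """Expand one bracket group, treating each character inside it as an
    alternative (like a shell glob character class).

    Hypothetical helper: it does not handle ranges, commas, or more than
    one bracket group.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        # No single well-formed bracket group: return the pattern unchanged.
        return [pattern]
    prefix, choices, suffix = match.groups()
    return [f"{prefix}{choice}{suffix}" for choice in choices]
```

For example, `expand_hostnames("cloudlb200[23]-dev")` yields `["cloudlb2002-dev", "cloudlb2003-dev"]`.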

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

cloudlb2002-dev: Rack: B1- U33 - Port xe-0/0/31
  • - receive in system on procurement task T328965 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::wmcs
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
cloudlb2003-dev: Rack: B1- U32 - Port xe-1/0/32
  • - receive in system on procurement task T328965 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::wmcs
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
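Each host carries the same eight-step checklist, and progress is tracked by ticking boxes per host. A minimal Python sketch of that bookkeeping, assuming the short labels below paraphrase the eight checklist items in order; `CHECKLIST` and `remaining_steps` are hypothetical names, not part of Phabricator or any Wikimedia tooling.

```python
# Short labels paraphrasing the eight per-host checklist items, in order.
CHECKLIST = [
    "receive in system",
    "rack system & update netbox",
    "mgmt and production dns",
    "network port setup",
    "bios/drac/serial setup",
    "firmware update",
    "operations/puppet update",
    "OS installation & initial puppet run",
]


def remaining_steps(done: set[str]) -> list[str]:
    """Return the unfinished steps, preserving checklist order."""
    unknown = done - set(CHECKLIST)
    if unknown:
        raise ValueError(f"not checklist steps: {sorted(unknown)}")
    return [step for step in CHECKLIST if step not in done]
```

For example, `remaining_steps(set(CHECKLIST[:6]))` returns only the Puppet update and the OS installation. Iterating over the ordered list, rather than computing a set difference, keeps the output in checklist sequence.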

Event Timeline

RobH mentioned this in Unknown Object (Task). Feb 16 2023, 5:54 PM
RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-codfw board.
RobH unsubscribed.

I've added these in Netbox.

I was confused at first; they are actually 10G links, so xe-0/0/31 and xe-0/0/32.
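Junos names physical interfaces as type-fpc/pic/port, and the prefix encodes the media type: ge- is Gigabit Ethernet and xe- is 10-Gigabit Ethernet, which is why these 10G links are xe- ports. A minimal Python sketch that splits such a name and maps the prefix to a nominal speed; `parse_port`, `JunosPort` and the two-entry speed table are illustrative assumptions covering only the prefixes mentioned here.

```python
import re
from typing import NamedTuple

# Nominal speeds for the two Junos media prefixes discussed here (a subset).
SPEEDS = {"ge": "1G", "xe": "10G"}


class JunosPort(NamedTuple):
    media: str  # media-type prefix, e.g. "xe"
    fpc: int    # Flexible PIC Concentrator slot
    pic: int    # Physical Interface Card slot
    port: int   # port number on the PIC

    @property
    def speed(self) -> str:
        return SPEEDS.get(self.media, "unknown")


def parse_port(name: str) -> JunosPort:
    """Split a Junos physical interface name such as 'xe-0/0/31'."""
    match = re.fullmatch(r"([a-z]+)-(\d+)/(\d+)/(\d+)", name)
    if match is None:
        raise ValueError(f"not a type-fpc/pic/port name: {name!r}")
    media, fpc, pic, port = match.groups()
    return JunosPort(media, int(fpc), int(pic), int(port))
```

For example, `parse_port("xe-0/0/31")` gives FPC 0, PIC 0, port 31, with `.speed` equal to `"10G"`.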

@Papaul can you add the cable label IDs when you get a moment?

https://netbox.wikimedia.org/dcim/cables/6214/

https://netbox.wikimedia.org/dcim/cables/6213/

Thanks.

@cmooney I don't do cable IDs for servers in codfw.

@cmooney if you are working on this task, please don't forget to check the boxes in the description so I can keep track. Thank you.

To give an update: I believe all but the last two items have been completed for both hosts. I have:

  • Run the provision script to add the switch connections to the hosts and assign IPs.
  • Run the sre.hosts.provision cookbook, after which both iDRACs are reachable.
  • Run Homer against the switch to provision the ports, which are now up/up and ready to go.

@Papaul I didn't tick the boxes in the task description, as I'm not 100% sure I haven't missed anything there; this is just to keep you up to date. I also didn't run the firmware upgrade cookbook.

If the iDRAC is reachable and I already see entries for the mgmt and production IPs in Netbox, that means you already ran the sre.dns.netbox cookbook, so we can check the third box as well. Thanks.

Change 896382 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Add cloudlb200[23] to site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/896382

Change 896382 merged by Papaul:

[operations/puppet@production] Add cloudlb200[23] to site.pp and netboot.cfg

https://gerrit.wikimedia.org/r/896382

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye executed with errors:

  • cloudlb2002-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye executed with errors:

  • cloudlb2002-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye executed with errors:

  • cloudlb2002-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye executed with errors:

  • cloudlb2002-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye executed with errors:

  • cloudlb2002-dev (FAIL)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • The reimage failed, see the cookbook logs for the details

Cookbook cookbooks.sre.hosts.reimage was started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host cloudlb2003-dev.codfw.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by cmooney@cumin1001 for host cloudlb2002-dev.codfw.wmnet with OS bullseye completed:

  • cloudlb2002-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303101813_cmooney_3619631_cloudlb2002-dev.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
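Comparing each FAIL summary with the PASS summary above shows how far each failed attempt got: four of the five failed runs stop right after "Host rebooted via IPMI" (the host never reported "Host up (Debian installer)"), while one stopped before "Forced PXE for next reboot". The comparison says nothing about why; the cookbook logs referenced in each summary hold the details. A minimal Python sketch of the comparison, assuming the step lines are copied verbatim from the summaries; `first_missing_step` is a hypothetical helper, and `PASS_STEPS` holds only the first eight steps of the successful run.

```python
from typing import Optional, Sequence

# The first eight steps of the successful cloudlb2002-dev run above, in order.
PASS_STEPS = (
    "Removed from Puppet and PuppetDB if present",
    "Deleted any existing Puppet certificate",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via IPMI",
    "Host up (Debian installer)",
    "Host up (new fresh bullseye OS)",
    "Generated Puppet certificate",
)


def first_missing_step(
    failed_steps: Sequence[str], reference: Sequence[str] = PASS_STEPS
) -> Optional[str]:
    """Return the first reference step absent from a failed run's summary.

    Returns None if the failed run reported every step in the reference.
    """
    for expected in reference:
        if expected not in failed_steps:
            return expected
    return None


# Step lines of the first two failed cloudlb2002-dev runs, from the timeline.
FIRST_FAIL = [
    "Removed from Puppet and PuppetDB if present",
    "Deleted any existing Puppet certificate",
    "Removed from Debmonitor if present",
    "Forced PXE for next reboot",
    "Host rebooted via IPMI",
    "The reimage failed, see the cookbook logs for the details",
]
SECOND_FAIL = [
    "Removed from Puppet and PuppetDB if present",
    "Deleted any existing Puppet certificate",
    "Removed from Debmonitor if present",
    "The reimage failed, see the cookbook logs for the details",
]
```

Here `first_missing_step(FIRST_FAIL)` returns "Host up (Debian installer)" and `first_missing_step(SECOND_FAIL)` returns "Forced PXE for next reboot".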

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host cloudlb2003-dev.codfw.wmnet with OS bullseye completed:

  • cloudlb2003-dev (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202303101822_pt1979_812642_cloudlb2003-dev.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
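The timeline above interleaves "was started" bot messages with PASS/FAIL summaries, so attempts per host are easiest to read off with a small tally. A minimal Python sketch, assuming the message wording and the "• host (PASS|FAIL)" summary lines shown above; `summarize` and both regular expressions are hypothetical, written only against this text.

```python
import re
from collections import Counter

# "Cookbook ... reimage was started by user@cumin for host <name>.<domain> ..."
START = re.compile(r"reimage was started by \S+ for host ([^.\s]+)")
# Summary line such as "  • cloudlb2002-dev (FAIL)"
RESULT = re.compile(r"^\s*•\s*(\S+) \((PASS|FAIL)\)$")


def summarize(timeline: str) -> dict[str, Counter]:
    """Count started runs and PASS/FAIL results per short hostname."""
    stats: dict[str, Counter] = {}
    for line in timeline.splitlines():
        if (m := START.search(line)):
            stats.setdefault(m.group(1), Counter())["started"] += 1
        elif (m := RESULT.match(line)):
            stats.setdefault(m.group(1), Counter())[m.group(2)] += 1
    return stats
```

Applied to the timeline in this task, the tally shows nine starts for cloudlb2002-dev against five FAIL and one PASS summaries, so some started runs posted no summary here; cloudlb2003-dev has one start and one PASS.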

Papaul updated the task description.

@aborrero this is done