
Q3:rack/setup/install ms-be107[2-5]
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of ms-be107[2-5]

Hostname / Racking / Installation Details

Hostnames: ms-be107[2-5].eqiad.wmnet
Racking Proposal: 1 host per row
Networking Setup: 10G private VLAN, like the other ms-be* nodes
Partitioning/Raid: JBOD, please. Unlike previous ms-be* nodes, we now want everything non-RAID (cf. T308677)
OS Distro: Bullseye
Sub-team Technical Contact: @MatthewVernon
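
The bracket notation in ms-be107[2-5] is shorthand for four consecutive hosts. A minimal Python sketch of that expansion follows, assuming a single numeric range in brackets. `expand_hosts` is a hypothetical helper written for illustration and is not part of any existing tooling.

```python
import re


def expand_hosts(pattern: str) -> list[str]:
    """Expand one numeric bracket range, e.g. 'ms-be107[2-5].eqiad.wmnet'."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No bracket range: the pattern already names a single host.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep any zero padding used in the range bounds
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


if __name__ == "__main__":
    for host in expand_hosts("ms-be107[2-5].eqiad.wmnet"):
        print(host)
```

Run against the Hostnames entry above, this yields the four fully qualified names from ms-be1072.eqiad.wmnet through ms-be1075.eqiad.wmnet.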

Per host setup checklist

ms-be1072:
  • - receive in system on procurement task T325212 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
ms-be1073:
  • - receive in system on procurement task T325212 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
ms-be1074:
  • - receive in system on procurement task T325212 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.
ms-be1075:
  • - receive in system on procurement task T325212 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer from an active cumin host to commit
  • - bios/drac/serial setup/testing, see Lifecycle Steps & Automatic BIOS setup details
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to netboot.pp, and site.pp role::insetup::data_persistence
  • - OS installation & initial puppet run via sre.hosts.reimage cookbook.

Event Timeline

RobH mentioned this in Unknown Object (Task).
RobH added a parent task: Unknown Object (Task).
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
RobH unsubscribed.

@MatthewVernon Are there any racks we need to avoid? Due to the weight of these servers and space availability, it would be easier to rack 2 in row E and 2 in row F. I understand that's not preferred, but can these share racks with other ms-be hosts?

ms-be1072: A4 U27, cableid 20220021, port 42
ms-be1073: B4 U10, cableid 5018, port 12
ms-be1074: E3 U5, cableid 20220227, port 5
ms-be1075: F3 U1, cableid 20220272, port 1
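
The final placement can be checked against the "1 host per row" racking proposal. The sketch below is illustrative only: it assumes the row is the leading letter of the rack name (so A4 is in row A), and `hosts_per_row` and `crowded_rows` are hypothetical helpers, not existing tooling.

```python
from collections import Counter

# Rack locations as reported in the comment above: (host, rack, unit).
RACKING = [
    ("ms-be1072", "A4", 27),
    ("ms-be1073", "B4", 10),
    ("ms-be1074", "E3", 5),
    ("ms-be1075", "F3", 1),
]


def hosts_per_row(racking):
    """Count hosts per row; the row is assumed to be the rack's first letter."""
    return Counter(rack[0] for _host, rack, _unit in racking)


def crowded_rows(racking, limit=1):
    """Return the rows holding more than `limit` hosts, sorted by name."""
    counts = hosts_per_row(racking)
    return sorted(row for row, count in counts.items() if count > limit)


if __name__ == "__main__":
    print(dict(hosts_per_row(RACKING)))
    print(crowded_rows(RACKING))
```

Under that assumption the four hosts land in rows A, B, E and F, so no row holds more than one of them.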

Change 894009 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] install_server: use newer partman setup for new ms backends

https://gerrit.wikimedia.org/r/894009

Change 894009 merged by MVernon:

[operations/puppet@production] install_server: use newer partman setup for new ms backends

https://gerrit.wikimedia.org/r/894009

Change 895141 had a related patch set uploaded (by MVernon; author: MVernon):

[operations/puppet@production] hiera: use a regex to specify new-style storage hosts

https://gerrit.wikimedia.org/r/895141

Change 895141 merged by MVernon:

[operations/puppet@production] hiera: use a regex to specify new-style storage hosts

https://gerrit.wikimedia.org/r/895141
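
Selecting hosts by regex, as the change title describes, avoids listing every new backend by name. The sketch below only illustrates the idea: the pattern is hypothetical and is not the one from the merged patch, and the cutoff at ms-be1072 is an assumption based on this task describing these hosts as the first non-RAID ones.

```python
import re

# Hypothetical pattern, NOT taken from change 895141: matches eqiad backends
# numbered 1072 through 1099, assumed here to be the new-style hosts.
NEW_STYLE_HOSTS = re.compile(r"^ms-be10(7[2-9]|[89]\d)\.eqiad\.wmnet$")


def is_new_style(fqdn: str) -> bool:
    """Return True if the FQDN matches the illustrative new-style pattern."""
    return NEW_STYLE_HOSTS.match(fqdn) is not None


if __name__ == "__main__":
    for fqdn in ("ms-be1071.eqiad.wmnet", "ms-be1072.eqiad.wmnet"):
        print(fqdn, is_new_style(fqdn))
```

A pattern like this would need widening once host numbers pass the range it covers, which is the usual trade-off against an explicit host list.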

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host ms-be1072.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host ms-be1073.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host ms-be1074.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host ms-be1072.eqiad.wmnet with OS bullseye completed:

  • ms-be1072 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304190029_pt1979_2389940_ms-be1072.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage was started by pt1979@cumin2002 for host ms-be1075.eqiad.wmnet with OS bullseye

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host ms-be1074.eqiad.wmnet with OS bullseye completed:

  • ms-be1074 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304190039_pt1979_2396435_ms-be1074.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host ms-be1073.eqiad.wmnet with OS bullseye completed:

  • ms-be1073 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304190037_pt1979_2394809_ms-be1073.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully

Cookbook cookbooks.sre.hosts.reimage started by pt1979@cumin2002 for host ms-be1075.eqiad.wmnet with OS bullseye completed:

  • ms-be1075 (PASS)
    • Removed from Puppet and PuppetDB if present
    • Deleted any existing Puppet certificate
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via IPMI
    • Host up (Debian installer)
    • Host up (new fresh bullseye OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202304190122_pt1979_2427942_ms-be1075.out
    • Checked BIOS boot parameters are back to normal
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB
    • Updated Netbox status planned -> active
    • The sre.puppet.sync-netbox-hiera cookbook was run successfully
    • Cleared switch DHCP cache and MAC table for the host IP and MAC (row E/F)
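
The four reimage log file names above share one layout, which can be split into fields for sorting or auditing. This is a rough sketch under assumptions: the first field is read as a YYYYMMDDHHMM timestamp, and the meaning of the third, numeric field is not stated in this task, so it is kept as an opaque `run_id`. `parse_reimage_log` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import PurePosixPath

LOG_DIR = "/var/log/spicerack/sre/hosts/reimage/"
LOG_PATHS = [
    LOG_DIR + "202304190029_pt1979_2389940_ms-be1072.out",
    LOG_DIR + "202304190039_pt1979_2396435_ms-be1074.out",
    LOG_DIR + "202304190037_pt1979_2394809_ms-be1073.out",
    LOG_DIR + "202304190122_pt1979_2427942_ms-be1075.out",
]


def parse_reimage_log(path: str) -> dict:
    """Split '<stamp>_<user>_<number>_<host>.out' into named fields."""
    stem = PurePosixPath(path).stem  # file name without the '.out' suffix
    stamp, user, run_id, host = stem.split("_", 3)
    return {
        # Assumption: the leading field is a YYYYMMDDHHMM timestamp.
        "stamp": datetime.strptime(stamp, "%Y%m%d%H%M"),
        "user": user,
        "run_id": int(run_id),  # meaning not documented in this task
        "host": host,
    }


if __name__ == "__main__":
    runs = sorted((parse_reimage_log(p) for p in LOG_PATHS), key=lambda r: r["stamp"])
    for run in runs:
        print(run["stamp"].isoformat(), run["user"], run["host"])
```

Sorting on the parsed timestamp gives the hosts in numeric order, 1072 through 1075, even though the completion notices appear in a different order in the timeline.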
Papaul subscribed.

This is complete

MatthewVernon mentioned this in Unknown Object (Task). Jul 13 2023, 2:57 PM
MatthewVernon mentioned this in Unknown Object (Task).