(Need By: TBD) rack/setup/install cp501[3-6]
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of cp501[3-6]

Hostname / Racking / Installation Details

Hostnames: cp5013, cp5014, cp5015, cp5016
Racking Proposal: Same split as rest of cluster: odd-numbered hosts in rack 603 and even-numbered hosts in rack 604
Networking/Subnet/VLAN/IP: Onboard 1GbE disabled, use 1x 10G port from add-in card connected to top-of-rack switch, to vlan private1-eqsin (520)
Partitioning/Raid: Partman has a recipe for this config already, but we will have to add a new line to netboot.cfg for the two different kinds of hardware now in eqsin
OS Distro: Buster
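
The hostname range and the odd/even racking split above can be sketched as a short Python snippet. This is only an illustration: the helper names `expand_hosts` and `rack_for` are hypothetical and are not part of netbox, cumin, or any other tooling mentioned in this task.

```python
import re

def expand_hosts(pattern: str) -> list:
    """Expand a single one-digit bracket range such as 'cp501[3-6]'."""
    match = re.fullmatch(r"(.*)\[(\d)-(\d)\](.*)", pattern)
    if match is None:
        # No bracket range present: treat the pattern as a single hostname.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]

def rack_for(hostname: str) -> int:
    """Odd-numbered hosts go to rack 603, even-numbered hosts to rack 604."""
    number = int(re.search(r"(\d+)$", hostname).group(1))
    return 603 if number % 2 == 1 else 604

hosts = expand_hosts("cp501[3-6]")
plan = {host: rack_for(host) for host in hosts}
for host, rack in plan.items():
    print(f"{host}: rack {rack}")
```

With this split, cp5013 and cp5015 land in rack 603, and cp5014 and cp5016 land in rack 604.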

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

cp5013:

  • - receive in system on procurement task T275801 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup_noferm). https://gerrit.wikimedia.org/r/683425
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

cp5014:

  • - receive in system on procurement task T275801 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup_noferm). https://gerrit.wikimedia.org/r/683425
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

cp5015:

  • - receive in system on procurement task T275801 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup_noferm). https://gerrit.wikimedia.org/r/683425
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

cp5016:

  • - receive in system on procurement task T275801 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup_noferm). https://gerrit.wikimedia.org/r/683425
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.
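
The resolution rule above (every checklist step done on every host) can be modelled as a small Python sketch. The step labels are shortened paraphrases of the checklist items, and `remaining_steps` and `task_resolvable` are hypothetical helper names used only for illustration; Phabricator does not expose the checkboxes this way.

```python
# Shortened labels for the nine per-host checklist steps, in order.
STEPS = [
    "receive system (procurement task, coupa)",
    "rack system and update netbox",
    "bios/drac/serial setup and testing",
    "mgmt and production dns via netbox",
    "network port setup via netbox and homer",
    "firmware update (idrac, bios)",
    "operations/puppet update",
    "OS installation and initial puppet run",
    "netbox state set to staged",
]

def remaining_steps(done: set) -> list:
    """Return the steps not yet completed for one host, in checklist order."""
    return [step for step in STEPS if step not in done]

def task_resolvable(progress: dict) -> bool:
    """The task can be resolved only when no host has steps remaining."""
    return all(not remaining_steps(done) for done in progress.values())

# Hypothetical snapshot: three hosts finished, cp5016 not yet set to staged.
progress = {host: set(STEPS) for host in ("cp5013", "cp5014", "cp5015")}
progress["cp5016"] = set(STEPS[:-1])

print(task_resolvable(progress))            # False while cp5016 has a step left
print(remaining_steps(progress["cp5016"]))  # lists the outstanding step
```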

Related Objects

Status      Assigned
Open        None
Resolved    BBlack
Resolved    RobH

Event Timeline

RobH mentioned this in Unknown Object (Task). Mar 22 2021, 8:43 PM
RobH added a parent task: Unknown Object (Task).
RobH added a parent task: Unknown Object (Task). Mar 30 2021, 6:12 PM

Change 683026 had a related patch set uploaded (by BBlack; author: BBlack):

[operations/puppet@production] Puppetize cp501[3456]

https://gerrit.wikimedia.org/r/683026

Change 683425 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] cp501[3-6] base install params

https://gerrit.wikimedia.org/r/683425

Change 683425 merged by RobH:

[operations/puppet@production] cp501[3-6] base install params

https://gerrit.wikimedia.org/r/683425

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['cp5013.eqsin.wmnet', 'cp5014.eqsin.wmnet', 'cp5015.eqsin.wmnet', 'cp5016.eqsin.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202104282035_robh_16679.log.

Completed auto-reimage of hosts:

['cp5014.eqsin.wmnet', 'cp5015.eqsin.wmnet', 'cp5013.eqsin.wmnet', 'cp5016.eqsin.wmnet']

Of which those FAILED:

['cp5014.eqsin.wmnet', 'cp5015.eqsin.wmnet', 'cp5013.eqsin.wmnet', 'cp5016.eqsin.wmnet']

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

cp5013.eqsin.wmnet

The log can be found in /var/log/wmf-auto-reimage/202104282121_robh_24148_cp5013_eqsin_wmnet.log.

Completed auto-reimage of hosts:

['cp5013.eqsin.wmnet']

Of which those FAILED:

['cp5013.eqsin.wmnet']

Change 683438 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] updating cp501[3456] role

https://gerrit.wikimedia.org/r/683438

Change 683438 merged by RobH:

[operations/puppet@production] updating cp501[3456] role

https://gerrit.wikimedia.org/r/683438

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

cp5013.eqsin.wmnet

The log can be found in /var/log/wmf-auto-reimage/202104282152_robh_30911_cp5013_eqsin_wmnet.log.

Completed auto-reimage of hosts:

['cp5013.eqsin.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['cp5014.eqsin.wmnet', 'cp5015.eqsin.wmnet', 'cp5016.eqsin.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202104282308_robh_14085.log.

Completed auto-reimage of hosts:

['cp5014.eqsin.wmnet', 'cp5015.eqsin.wmnet', 'cp5016.eqsin.wmnet']

and were ALL successful.
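
The four reimage runs reported above can be tallied per host with a short Python sketch. The host lists are copied from the script output in this timeline. Reading the leading digits of each log file name as a `YYYYMMDDHHMM` timestamp is an assumption, and `summarize` is a hypothetical helper, not part of wmf-auto-reimage.

```python
from datetime import datetime

# (log-name timestamp, hosts attempted, hosts reported as FAILED)
RUNS = [
    ("202104282035",
     ["cp5013", "cp5014", "cp5015", "cp5016"],
     ["cp5013", "cp5014", "cp5015", "cp5016"]),
    ("202104282121", ["cp5013"], ["cp5013"]),
    ("202104282152", ["cp5013"], []),
    ("202104282308", ["cp5014", "cp5015", "cp5016"], []),
]

def summarize(runs):
    """Count attempts per host and record the outcome of each host's latest run."""
    attempts = {}
    succeeded = {}
    # Process runs in chronological order (timestamp format is assumed).
    ordered = sorted(runs, key=lambda run: datetime.strptime(run[0], "%Y%m%d%H%M"))
    for _stamp, hosts, failed in ordered:
        for host in hosts:
            attempts[host] = attempts.get(host, 0) + 1
            succeeded[host] = host not in failed
    return attempts, succeeded

attempts, succeeded = summarize(RUNS)
for host in sorted(attempts):
    status = "ok" if succeeded[host] else "FAILED"
    print(f"{host}: {attempts[host]} attempts, latest run {status}")
```

Under these assumptions cp5013 took three attempts and the other hosts two each, with every host's latest run successful.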

RobH updated the task description.

These are all set to staged with the insetup_noferm role applied.

Note - https://gerrit.wikimedia.org/r/c/operations/puppet/+/683026 has the production roles and config, but we'll need to reimage the hosts into those roles rather than just applying the change, in order to get the nvme storage and partman set up consistently.

Change 683026 merged by BBlack:

[operations/puppet@production] Puppetize cp501[3456]

https://gerrit.wikimedia.org/r/683026

Change 691170 had a related patch set uploaded (by BBlack; author: BBlack):

[operations/puppet@production] Add missing cache::nodes for cp501[3456]

https://gerrit.wikimedia.org/r/691170

Change 691170 merged by BBlack:

[operations/puppet@production] Add missing cache::nodes for cp501[3456]

https://gerrit.wikimedia.org/r/691170