
(Need By: TBD) rack/setup/install an-druid100[345]
Closed, Resolved, Public

Description

This task will track the racking, setup, and OS installation of an-druid100[345]

Hostname / Racking / Installation Details

Hostnames: an-druid100[345]
Racking Proposal: Avoid sharing racks with an-druid100[12] (racks A5 & C3), as it's the same cluster. If possible, also avoid racking with druid100[4-8] (A6, B6, D6, B3, D5) so the pools could technically be combined, but this is not as important as avoiding the an-druid racks A5 and C3.
Networking/Subnet/VLAN/IP: 1G, analytics vlan
Partitioning/Raid: match an-druid100[12]
OS Distro: buster
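The bracket shorthand used for hostnames in this task (an-druid100[345], druid100[4-8]) can be expanded mechanically. The following is a minimal sketch, assuming the brackets hold either single characters or simple `a-b` ranges; the helper name `expand_hosts` is hypothetical and is not part of any tooling mentioned in this task.

```python
import re


def expand_hosts(pattern):
    """Expand one bracketed set, e.g. 'an-druid100[345]' or 'druid100[4-8]'."""
    match = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", pattern)
    if match is None:
        # No brackets: the pattern is already a single hostname.
        return [pattern]
    prefix, chars, suffix = match.groups()
    members = []
    i = 0
    while i < len(chars):
        # A dash between two characters denotes an inclusive range.
        if i + 2 < len(chars) and chars[i + 1] == "-":
            start, end = ord(chars[i]), ord(chars[i + 2])
            members.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            members.append(chars[i])
            i += 1
    return [prefix + m + suffix for m in members]


if __name__ == "__main__":
    print(expand_hosts("an-druid100[345]"))
    print(expand_hosts("druid100[4-8]"))
```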

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

an-druid1003:

  • - receive in system on procurement task T271146 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - bios/idrac firmware updated
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

an-druid1004:

  • - receive in system on procurement task T271146 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - bios/idrac firmware updated
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

an-druid1005:

  • - receive in system on procurement task T271146 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - bios/idrac firmware updated
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH added a parent task: Unknown Object (Task).
RobH mentioned this in Unknown Object (Task). Feb 8 2021, 5:55 PM
elukey subscribed.
elukey unsubscribed.
elukey subscribed.
Jclark-ctr updated the task description.
Jclark-ctr subscribed.

Racked and cabled the hosts. Handing over to Chris for configuration.
an-druid1003 A3 U15 P14 ID1858
an-druid1004 B1 U13 P24 ID3550
an-druid1005 D3 U21 P20 ID3685
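The rack locations reported above can be checked against the racking proposal in the task description. This is a small sketch, assuming the first field after each hostname is the rack; the "hard" and "soft" labels and the function name `check_placements` are illustrative and do not come from the task.

```python
# Racks holding an-druid100[12] (same cluster): must not be shared.
CLUSTER_RACKS = {"A5", "C3"}
# Racks holding druid100[4-8]: avoid if possible, lower priority.
PREFERRED_AVOID = {"A6", "B6", "D6", "B3", "D5"}

# Rack assignments as reported in the racking comment.
PLACEMENTS = {
    "an-druid1003": "A3",
    "an-druid1004": "B1",
    "an-druid1005": "D3",
}


def check_placements(placements):
    """Return (host, rack, severity) tuples for racks the proposal says to avoid."""
    problems = []
    for host, rack in placements.items():
        if rack in CLUSTER_RACKS:
            problems.append((host, rack, "hard"))
        elif rack in PREFERRED_AVOID:
            problems.append((host, rack, "soft"))
    return problems


if __name__ == "__main__":
    issues = check_placements(PLACEMENTS)
    print(issues if issues else "no conflicts with the racking proposal")
```

With the racks reported here (A3, B1, D3), neither set of racks to avoid is touched.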

@razzi these nodes will replace druid100[1-3] (that are out of warranty), so once done we'll have to create a task to swap the hosts :)

These just need the on-site setup. Planning on doing this tomorrow

@Cmjohnson these need to be in the Analytics VLAN (double checking since I see "internal VLAN" in the task description)

Cmjohnson updated the task description.
Cmjohnson added subscribers: RobH, Cmjohnson.

onsite work completed, assigning to @RobH for installs

The new nodes are in the private VLAN; we'd need them in the Analytics one (as described above) :)

Change 669969 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] an-druid100[345] updates

https://gerrit.wikimedia.org/r/669969

Change 669969 merged by RobH:
[operations/puppet@production] an-druid100[345] updates

https://gerrit.wikimedia.org/r/669969

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

an-druid1003.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202103082025_robh_6601_an-druid1003_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['an-druid1003.eqiad.wmnet']

and were ALL successful.
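The reimage log names in this task appear to follow a regular shape. Below is a rough sketch for splitting one apart, assuming the layout is timestamp (YYYYMMDDHHMM), user, a numeric identifier, and the FQDN with dots replaced by underscores. That layout is inferred only from the file names quoted in this task, not from wmf-auto-reimage documentation, and `parse_reimage_log_name` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed layout: <YYYYMMDDHHMM>_<user>_<number>_<fqdn with '_' for '.'>.log
LOG_NAME = re.compile(
    r"(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<num>\d+)_(?P<host>.+)\.log"
)


def parse_reimage_log_name(name):
    """Split a log file name into its assumed fields."""
    match = LOG_NAME.fullmatch(name)
    if match is None:
        raise ValueError("unexpected log name: " + name)
    return {
        "started": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        # Meaning of this number is not stated in the task (possibly a PID).
        "number": int(match["num"]),
        "host": match["host"].replace("_", "."),
    }


if __name__ == "__main__":
    info = parse_reimage_log_name(
        "202103082025_robh_6601_an-druid1003_eqiad_wmnet.log"
    )
    print(info["started"].isoformat(), info["user"], info["host"])
```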

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

an-druid1005.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202103082104_robh_13718_an-druid1005_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['an-druid1005.eqiad.wmnet']

Of which those FAILED:

['an-druid1005.eqiad.wmnet']

Change 669995 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] updating an-druid1005

https://gerrit.wikimedia.org/r/669995

Change 669995 merged by RobH:
[operations/puppet@production] updating an-druid1005

https://gerrit.wikimedia.org/r/669995

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

an-druid1005.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202103082218_robh_26961_an-druid1005_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['an-druid1005.eqiad.wmnet']

and were ALL successful.

RobH updated the task description.

an-druid1004.mgmt.eqiad.wmnet is not responsive. Please double check the mgmt cable and settings. Once it's online, I can continue with imaging.

@RobH try it now, the cable was unplugged.

Change 670591 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] an-druid1004 mac update

https://gerrit.wikimedia.org/r/670591

Change 670591 merged by RobH:
[operations/puppet@production] an-druid1004 mac update

https://gerrit.wikimedia.org/r/670591

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

an-druid1004.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202103102253_robh_2203_an-druid1004_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['an-druid1004.eqiad.wmnet']

and were ALL successful.

RobH updated the task description.