(Need By: ASAP) rack/setup/install ms-be106[0-3]
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of ms-be106[0-3].

Getting at least one host installed and ready for service has a "need by" of ASAP.

Hostname / Racking / Installation Details

These hosts replace ms-be[1019-1026] and also count as an expansion. The footprint is reduced because we are using the R740xd2.

Hostnames: ms-be106[0-3]
Racking Proposal: One host per row. Preferably do not share racks with existing ms-be hosts (those we are not refreshing), but if that can't be avoided, sharing is fine too.
Networking/Subnet/VLAN/IP: 10G private vlan
Partitioning/Raid: same partman recipe as the other ms-be hosts
OS Distro: Stretch

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

ms-be1060:

  • - receive in system on procurement task T264140 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

ms-be1061:

  • - receive in system on procurement task T264140 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

ms-be1062:

  • - receive in system on procurement task T264140 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

ms-be1063:

  • - receive in system on procurement task T264140 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.
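
The copy-and-paste step above can also be scripted. The following is only a minimal sketch, assuming the hostname pattern uses a single-digit bracket range as in ms-be106[0-3]; `expand_hosts` and `render_checklist` are hypothetical helper names, not part of any existing tooling, and the "- " item prefix simply mirrors the list as shown in this task.

```python
import re
from typing import List

# Checklist steps copied from the task description above.
STEPS = [
    "receive in system on procurement task T264140 & in coupa",
    "rack system with proposed racking plan (see above) & update netbox",
    "bios/drac/serial setup/testing",
    "mgmt dns entries added for both asset tag and hostname",
    "network port setup (description, enable, vlan)",
    "production dns entries added",
    "operations/puppet update (install_server at minimum)",
    "OS installation",
    "puppet accept/initial run (with role:spare)",
    "host state in netbox set to staged",
]


def expand_hosts(pattern: str) -> List[str]:
    """Expand a single-digit bracket range such as 'ms-be106[0-3]' (assumed format)."""
    match = re.fullmatch(r"(.*)\[(\d)-(\d)\](.*)", pattern)
    if match is None:
        # No bracket range found: treat the pattern as a single hostname.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


def render_checklist(host: str) -> str:
    """Render one per-host checklist block as plain text."""
    lines = [f"{host}:", ""]
    lines += [f"  - {step}" for step in STEPS]
    return "\n".join(lines)


if __name__ == "__main__":
    for host in expand_hosts("ms-be106[0-3]"):
        print(render_checklist(host))
        print()
```

Generating the blocks from one list keeps the four checklists identical, so a step edited in one place cannot drift between hosts.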

Event Timeline

RobH added a parent task: Unknown Object (Task). Oct 8 2020, 9:41 PM
wiki_willy renamed this task from (Need By: TBD) rack/setup/install ms-be106[0-3] to (Need By: ASAP) rack/setup/install ms-be106[0-3]. Nov 2 2020, 11:03 PM

Change 641817 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Add dhcp file and site.pp entry for new ms-be106[0-3]

https://gerrit.wikimedia.org/r/641817

Change 641817 merged by Cmjohnson:
[operations/puppet@production] Add dhcp file and site.pp entry for new ms-be106[0-3]

https://gerrit.wikimedia.org/r/641817

Cmjohnson updated the task description.
Cmjohnson added subscribers: RobH, Cmjohnson.

@RobH These are ready for you. The RAID still needs to be set up, but everything else is done.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

ms-be1060.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011192137_robh_20532_ms-be1060_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['ms-be1060.eqiad.wmnet']

Of which those FAILED:

['ms-be1060.eqiad.wmnet']

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

ms-be1060.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011192137_robh_20736_ms-be1060_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['ms-be1060.eqiad.wmnet']

Of which those FAILED:

['ms-be1060.eqiad.wmnet']

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

ms-be1060.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011192231_robh_5976_ms-be1060_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['ms-be1060.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['ms-be1061.eqiad.wmnet', 'ms-be1062.eqiad.wmnet', 'ms-be1063.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202011192304_robh_562.log.

Completed auto-reimage of hosts:

['ms-be1063.eqiad.wmnet', 'ms-be1062.eqiad.wmnet']

Of which those FAILED:

['ms-be1061.eqiad.wmnet']

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

ms-be1061.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202011192336_robh_12645_ms-be1061_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['ms-be1061.eqiad.wmnet']

Of which those FAILED:

['ms-be1061.eqiad.wmnet']

Completed auto-reimage of hosts:

['ms-be1061.eqiad.wmnet']

Of which those FAILED:

['ms-be1061.eqiad.wmnet']

23:59:09 | ms-be1061.eqiad.wmnet | Unable to run wmf-auto-reimage-host: could not convert string to float: "Warning: Permanently added the ECDSA host key for IP address '2620:0:861:102:10:64:16:144' to the list of known hosts.\n1605830322"

Yet the host is otherwise up and running puppet, so I am not sure what is causing that error.
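
The message itself suggests one possible explanation: the value handed to `float()` contains an SSH known-hosts warning in front of what looks like an epoch timestamp. The sketch below only reproduces that parsing failure from the quoted string and shows one tolerant way to read it. It assumes the remote command prints the timestamp on its last line; `parse_epoch` is a hypothetical helper, not the actual wmf-auto-reimage code, and whether this is the real cause has not been verified.

```python
# Output as quoted in the error above: an SSH warning line, then the epoch value.
raw = (
    "Warning: Permanently added the ECDSA host key for IP address "
    "'2620:0:861:102:10:64:16:144' to the list of known hosts.\n1605830322"
)


def parse_epoch(output: str) -> float:
    """Hypothetical helper: parse only the last non-empty line as a float."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output to parse")
    return float(lines[-1])


try:
    float(raw)  # fails: the warning text is not numeric
except ValueError as exc:
    print(f"float(raw) failed: {exc}")

print(parse_epoch(raw))  # parses only the trailing timestamp line
```

If that is what happened, a retry after the host key is already in known_hosts would not hit the warning. Lowering SSH verbosity (for example `-o LogLevel=ERROR`) is another way to keep such warnings out of the parsed output.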

RobH updated the task description.