This task will track the racking, setup, and OS installation of ml-serve100[1-4].
Please note these are GPU-capable chassis/servers ordered via T266482, with the GPU cards being ordered on T266516 and a future task. Only 1 GPU was ordered initially, as the new AMD Radeon 5700 XT has NOT yet been test fitted or power tested in the chassis. Once a single GPU has passed testing, we can order the remaining GPUs.
We were conservative with this ordering cadence (one card to test first) because GPU cards are non-returnable once opened, and even cards returned unopened incur a restocking fee.
== Hostname / Racking / Installation Details ==
**Hostnames:** ml-serve100[1-4]
**Racking Proposal:** All 4 hosts will be in the same cluster, so they must be in different racks at minimum. Row diversity is preferred, but it is understood that if a single row is at capacity, 2 hosts may need to share a row.
**Networking/Subnet/VLAN/IP:** 1G, internal1 VLAN (systems have 1G/10G NICs; only 1G is needed at this time)
**Partitioning/Raid:** match an-worker[1096-1101]
**OS Distro:** Buster
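The racking proposal above can be checked mechanically before the hosts are racked. The following is only a minimal sketch: the `check_racking` helper and the row/rack values in `example` are hypothetical illustrations and are not taken from netbox or from this task.

```python
from collections import Counter


def check_racking(placement):
    """Check a proposed placement against the racking proposal.

    placement maps hostname -> (row, rack).
    Returns (ok, notes): ok is False if any rack holds more than one host;
    hosts sharing a row only produce a note, since row diversity is a preference.
    """
    notes = []

    # Hard requirement: no two hosts in the same rack.
    racks = Counter(placement.values())
    shared_racks = sorted(loc for loc, count in racks.items() if count > 1)
    for row, rack in shared_racks:
        notes.append(f"rack {row}{rack} holds more than one host")

    # Soft preference: spread hosts across rows where capacity allows.
    rows = Counter(row for row, _ in placement.values())
    for row, count in sorted(rows.items()):
        if count > 1:
            notes.append(f"row {row} holds {count} hosts (row diversity is preferred)")

    return (not shared_racks, notes)


# Hypothetical placement, for illustration only.
example = {
    "ml-serve1001": ("A", 3),
    "ml-serve1002": ("B", 5),
    "ml-serve1003": ("C", 2),
    "ml-serve1004": ("C", 7),
}
ok, notes = check_racking(example)
print(ok, notes)
```

In this example the placement passes the rack requirement, and the shared row is reported as a note rather than a failure, matching the allowance for a row that is at capacity.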
== Per host setup checklist ==
Each host should have its own setup checklist copied and pasted into the list below.
ml-serve1001:
[X] - receive in system on #procurement task T266482 & in coupa
[X] - receive in GPU card on #procurement task T266516 & in coupa
[x] - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
[x] - bios/drac/serial setup/testing
[x] - mgmt dns entries added for both asset tag and hostname
[x] - network port setup (description, enable, vlan)
** end on-site specific steps
[x] - production dns entries added
[x] - operations/puppet update (install_server at minimum, other files if possible)
[x] - OS installation
[x] - puppet accept/initial run (with role:spare)
[x] - host state in netbox set to staged
ml-serve1002:
[X] - receive in system on #procurement task T266482 & in coupa
[x] - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
[x] - bios/drac/serial setup/testing
[x] - mgmt dns entries added for both asset tag and hostname
[x] - network port setup (description, enable, vlan)
** end on-site specific steps
[x] - production dns entries added
[x] - operations/puppet update (install_server at minimum, other files if possible)
[x] - OS installation
[x] - puppet accept/initial run (with role:spare)
[x] - host state in netbox set to staged
ml-serve1003:
[X] - receive in system on #procurement task T266482 & in coupa
[x] - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
[x] - bios/drac/serial setup/testing
[x] - mgmt dns entries added for both asset tag and hostname
[x] - network port setup (description, enable, vlan)
** end on-site specific steps
[x] - production dns entries added
[x] - operations/puppet update (install_server at minimum, other files if possible)
[x] - OS installation
[x] - puppet accept/initial run (with role:spare)
[x] - host state in netbox set to staged
ml-serve1004:
[X] - receive in system on #procurement task T266482 & in coupa
[x] - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
[x] - bios/drac/serial setup/testing
[x] - mgmt dns entries added for both asset tag and hostname
[x] - network port setup (description, enable, vlan)
** end on-site specific steps
[x] - production dns entries added
[x] - operations/puppet update (install_server at minimum, other files if possible)
[] - OS installation
[] - puppet accept/initial run (with role:spare)
[] - host state in netbox set to staged
**Once the system(s) above have had all checkbox steps completed, this task can be resolved.**
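As a convenience, the resolution condition above can be checked by scanning the checklist text for unchecked boxes. This is a rough sketch that assumes the checklist is available as plain text in the layout used in this task (a `hostname:` line followed by `[x] - step` or `[] - step` lines); the `pending_steps` helper is hypothetical and is not part of any Phabricator tooling.

```python
import re


def pending_steps(text):
    """Return {hostname: [unchecked step, ...]} for hosts with open steps."""
    pending = {}
    host = None
    for raw in text.splitlines():
        line = raw.strip()
        # A line such as "ml-serve1004:" starts a new per-host checklist.
        if re.fullmatch(r"[\w-]+:", line):
            host = line[:-1]
            pending[host] = []
        # "[] - step" or "[ ] - step" is an unchecked box.
        elif host and (m := re.match(r"\[\s*\]\s*-\s*(.+)", line)):
            pending[host].append(m.group(1))
    return {h: steps for h, steps in pending.items() if steps}


# Abbreviated sample in the same layout as the checklist above.
sample = """\
ml-serve1003:
[x] - OS installation
[x] - host state in netbox set to staged
ml-serve1004:
[x] - production dns entries added
[] - OS installation
[] - host state in netbox set to staged
"""
print(pending_steps(sample))
```

An empty result means every box is checked and the task can be resolved.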