
(Need By: TBD) rack/setup/install cloudvirt104[0-6].eqiad.wmnet
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of cloudvirt104[0-6].eqiad.wmnet

Hostname / Racking / Installation Details

Hostnames: cloudvirt104[0-6].eqiad.wmnet
Racking Proposal: Row D
Networking/Subnet/VLAN/IP: two 10Gb network connections per server, as with cloudvirt103X
Partitioning/Raid: RAID10 for system drives. Hardware RAID is not required
OS Distro: Buster
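The bracket notation in the hostnames is a compact numeric range covering seven hosts. A minimal Python sketch of how such a pattern expands, assuming only a single [start-end] range per pattern; expand_hostrange is a hypothetical helper for illustration, not part of any WMF tooling (tools such as cumin have their own, richer host-query syntax).

```python
import re
from typing import List


def expand_hostrange(pattern: str) -> List[str]:
    """Expand one numeric bracket range, e.g. 'cloudvirt104[0-6].eqiad.wmnet'.

    Hypothetical helper: supports a single [start-end] range and keeps zero padding.
    """
    match = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if match is None:
        return [pattern]  # no range present: the pattern names a single host
    start, end = match.group(1), match.group(2)
    width = len(start)  # preserve leading zeros, e.g. [01-03] -> 01, 02, 03
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    for host in expand_hostrange("cloudvirt104[0-6].eqiad.wmnet"):
        print(host)
```

Running it prints cloudvirt1040.eqiad.wmnet through cloudvirt1046.eqiad.wmnet, one per line, which is the set of hosts each needing its own checklist below.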

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

cloudvirt1040.eqiad.wmnet: - THIS IS CURRENTLY THE SEED SERVER WITH THE INTEL NIC

  • - receive in system on procurement task T271236 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - SWAP INTEL NIC FOR BROADCOM
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

cloudvirt1041.eqiad.wmnet:

  • - receive in system on procurement task T271236 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

cloudvirt1042.eqiad.wmnet:

  • - receive in system on procurement task T271236 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

cloudvirt1043.eqiad.wmnet:

  • - receive in system on procurement task T271236 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

cloudvirt1044.eqiad.wmnet:

  • - receive in system on procurement task T271236 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

cloudvirt1045.eqiad.wmnet:

  • - receive in system on procurement task T271236 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

cloudvirt1046.eqiad.wmnet:

  • - this was supposed to be the seed server, but a standard-order host was racked instead (evident because it does not have the Intel NIC)
  • - receive in system on procurement task T271236 & in coupa (see note above)
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
    • end on-site specific steps
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup) or cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Related Objects

Status | Subtype | Assigned | Task
Resolved | | RobH |

Event Timeline

RobH created this task.
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
RobH moved this task from Backlog to Racking / Decom on the cloud-services-team (Hardware) board.
RobH added a parent task: Unknown Object (Task).
RobH mentioned this in Unknown Object (Task).
RobH unsubscribed.

| Name | Rack | Position | Switch ports | Cable IDs |
| cloudvirt1040 | D5 | 22 | 22, 34 | 5358, 5359 |
| cloudvirt1041 | D5 | 23 | 24, 32 | 5360, 5361 |
| cloudvirt1042 | D5 | 24 | 25, 33 | 5362, 5363 |
| cloudvirt1043 | D5 | 25 | 26, 30 | 5364, 5365 |
| cloudvirt1044 | D5 | 26 | 27, 31 | 5366, 5367 |
| cloudvirt1045 | D5 | 27 | 28, 29 | 5368, 5369 |
| cloudvirt1046 | D5 | 28 | 4, 46 | 5370, 5371 |
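A quick sanity check of racking data like the table above can catch a double-booked rack position, switch port or cable ID before the hosts are handed over for install. A minimal Python sketch, with the rows transcribed by hand from the table and assuming all ports sit on the same switch (cloudsw1-d5-eqiad); find_conflicts and single_homed are hypothetical helpers, not a description of how Netbox validates anything.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (host, rack, position, switch ports, cable IDs), transcribed from the racking table.
# Assumption: every port listed is on the same switch, so port numbers must be unique.
Row = Tuple[str, str, int, Tuple[int, int], Tuple[int, int]]

RACKING: List[Row] = [
    ("cloudvirt1040", "D5", 22, (22, 34), (5358, 5359)),
    ("cloudvirt1041", "D5", 23, (24, 32), (5360, 5361)),
    ("cloudvirt1042", "D5", 24, (25, 33), (5362, 5363)),
    ("cloudvirt1043", "D5", 25, (26, 30), (5364, 5365)),
    ("cloudvirt1044", "D5", 26, (27, 31), (5366, 5367)),
    ("cloudvirt1045", "D5", 27, (28, 29), (5368, 5369)),
    ("cloudvirt1046", "D5", 28, (4, 46), (5370, 5371)),
]


def find_conflicts(rows: List[Row]) -> Dict[str, List[object]]:
    """Report rack positions, switch ports or cable IDs assigned more than once."""
    positions = Counter((rack, pos) for _, rack, pos, _, _ in rows)
    ports = Counter(port for row in rows for port in row[3])
    cables = Counter(cable for row in rows for cable in row[4])
    return {
        "positions": [key for key, n in positions.items() if n > 1],
        "ports": [key for key, n in ports.items() if n > 1],
        "cables": [key for key, n in cables.items() if n > 1],
    }


def single_homed(rows: List[Row]) -> List[str]:
    """Hosts whose two 10Gb connections were recorded on the same switch port."""
    return [host for host, _, _, (first, second), _ in rows if first == second]


if __name__ == "__main__":
    print(find_conflicts(RACKING))
    print(single_homed(RACKING))
```

For the seven rows above, every list comes back empty: no position, port or cable ID is reused, and each host has two distinct ports.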

Cmjohnson added subscribers: RobH, Cmjohnson.

@RobH assigning this to you; 1040-1045 are ready for installs. I set up both ports in Netbox. Since we're waiting on a new NIC for 1046, please reassign to John afterwards (assuming there are no issues that need me to get involved). Thanks!

When I went to install these, it turned out that cloudvirt1040 is the one with the Intel NICs, which means it was the seed server. The racking checklist was not followed, as cloudvirt1046 should have been the seed server. Hopefully the items were not received incorrectly in Coupa, since the packing-slip service tags for the 6 purchased hosts need to have been entered into Coupa for the normal PO.

When the NIC arrives, it needs to go into cloudvirt1040.

Change 676455 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] cloudvirt104[0-6] setup info

https://gerrit.wikimedia.org/r/676455

Change 676455 merged by RobH:

[operations/puppet@production] cloudvirt104[0-6] setup info

https://gerrit.wikimedia.org/r/676455

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1041.eqiad.wmnet', 'cloudvirt1042.eqiad.wmnet', 'cloudvirt1043.eqiad.wmnet', 'cloudvirt1044.eqiad.wmnet', 'cloudvirt1045.eqiad.wmnet', 'cloudvirt1046.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202104012020_robh_12283.log.

The entire batch failed to reach the DHCP server: the network has NOT been set up on cloudsw1-d5-eqiad for any of these hosts.

It appears it was set up in Netbox, but since this switch is not tied to Netbox, was it forgotten in the manual update?

I've manually added port descriptions and the primary interface to vlan-cloud-hosts1-eqiad.

The secondary interface is not yet added to anything, as the command isn't just the usual set interface-range command and I don't recall what it is. Its port description is set to the hostname, though.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1041.eqiad.wmnet', 'cloudvirt1042.eqiad.wmnet', 'cloudvirt1043.eqiad.wmnet', 'cloudvirt1044.eqiad.wmnet', 'cloudvirt1045.eqiad.wmnet', 'cloudvirt1046.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202104082131_robh_9770.log.

Completed auto-reimage of hosts:

['cloudvirt1041.eqiad.wmnet', 'cloudvirt1042.eqiad.wmnet', 'cloudvirt1043.eqiad.wmnet', 'cloudvirt1044.eqiad.wmnet', 'cloudvirt1045.eqiad.wmnet', 'cloudvirt1046.eqiad.wmnet']

and were ALL successful.

I've emailed our Dell rep to determine where the NIC is for the seed server, cloudvirt1040. Once I have that info, I'll reassign this back to John for followup and installation of the NIC.

Please don't forget to update the switches by running Homer when you update Netbox; otherwise there are outstanding changes and they alert. I pushed the one below.

Changes for 1 devices: ['cloudsw1-d5-eqiad.mgmt.eqiad.wmnet']

[edit interfaces xe-0/0/26]
-   description "cloudvirt1043 {#5364}";
+   description "cloudvirt1043 {#0011}";
[edit interfaces xe-0/0/33]
-   description "cloudvirt1042 {#5363}";
+   description "cloudvirt1042 {#0010}";
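The pending diff above only touches port descriptions, and in both cases it is the cable ID embedded in the description that changes (#5364 to #0011 on xe-0/0/26, #5363 to #0010 on xe-0/0/33), not the hostname. A minimal Python sketch for pulling such changes out of a Junos-style diff, assuming descriptions always follow the "<host> {#<cable id>}" shape seen here; description_changes is a hypothetical parser, not part of Homer.

```python
import re
from typing import Dict, Optional, Tuple

# The diff exactly as Homer reported it for cloudsw1-d5-eqiad.
DIFF = """\
[edit interfaces xe-0/0/26]
-   description "cloudvirt1043 {#5364}";
+   description "cloudvirt1043 {#0011}";
[edit interfaces xe-0/0/33]
-   description "cloudvirt1042 {#5363}";
+   description "cloudvirt1042 {#0010}";
"""

SECTION = re.compile(r"^\[edit interfaces (\S+)\]$")
# Assumed description format: "<hostname> {#<cable id>}"
DESC = re.compile(r'^([+-])\s+description "(\S+) \{#(\d+)\}";$')


def description_changes(diff: str) -> Dict[str, Dict[str, Tuple[str, str]]]:
    """Map each interface to its removed ('-') and added ('+') (host, cable ID) pair."""
    changes: Dict[str, Dict[str, Tuple[str, str]]] = {}
    interface: Optional[str] = None
    for line in diff.splitlines():
        section = SECTION.match(line)
        if section:
            interface = section.group(1)
            changes[interface] = {}
            continue
        desc = DESC.match(line)
        if desc and interface is not None:
            sign, host, cable = desc.groups()
            changes[interface][sign] = (host, cable)
    return changes


if __name__ == "__main__":
    for iface, change in description_changes(DIFF).items():
        host, old = change["-"]
        _, new = change["+"]
        print(f"{iface}: {host} cable #{old} -> #{new}")
```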

RobH removed RobH as the assignee of this task. (Apr 15 2021, 7:18 PM)

Dell forgot to send out the NIC for the seed server; the order was updated two days ago and it is supposedly going to arrive soon.

I've asked them to send me the tracking info ASAP, so I can open an inbound shipment ticket and prevent further delays.

@RobH Swapped the NIC card; handing back over for imaging.

Change 681212 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] cloudvirt1040 mac update

https://gerrit.wikimedia.org/r/681212

Change 681212 merged by RobH:

[operations/puppet@production] cloudvirt1040 mac update

https://gerrit.wikimedia.org/r/681212

RobH updated the task description.
RobH removed subscribers: Cmjohnson, Jclark-ctr.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

cloudvirt1040.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202104192237_robh_5536_cloudvirt1040_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['cloudvirt1040.eqiad.wmnet']

and were ALL successful.

RobH updated the task description.

Change 681413 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Make cloudvirt1040-46 into hypervisors

https://gerrit.wikimedia.org/r/681413

Change 681413 merged by Andrew Bogott:

[operations/puppet@production] Make cloudvirt1040-46 into hypervisors

https://gerrit.wikimedia.org/r/681413

Change 681415 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Add host hiera for new cloudvirts

https://gerrit.wikimedia.org/r/681415

Change 681415 merged by Andrew Bogott:

[operations/puppet@production] Add host hiera for new cloudvirts

https://gerrit.wikimedia.org/r/681415

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1040.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202104281341_andrew_23063.log.

Completed auto-reimage of hosts:

['cloudvirt1040.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by andrew on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1041.eqiad.wmnet', 'cloudvirt1042.eqiad.wmnet', 'cloudvirt1043.eqiad.wmnet', 'cloudvirt1044.eqiad.wmnet', 'cloudvirt1045.eqiad.wmnet', 'cloudvirt1046.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202104281603_andrew_19325.log.

Completed auto-reimage of hosts:

['cloudvirt1042.eqiad.wmnet', 'cloudvirt1041.eqiad.wmnet', 'cloudvirt1045.eqiad.wmnet', 'cloudvirt1044.eqiad.wmnet', 'cloudvirt1043.eqiad.wmnet', 'cloudvirt1046.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-cloud) [2021-04-28T18:11:08Z] <andrewbogott> adding cloudvirt1040, 1041 and 1042 to the 'ceph' host aggregate -- T275081

These servers are now installed and running. 1040, 1041 and 1042 are now active in the 'ceph' host aggregate; the others are being held back as spares for now.

Thank you all!