rack/setup/install cloudvirt10[25-30].eqiad.wmnet
Closed, Resolved (Public)

Description

This task will track the racking, setup, and installation of 6 new cloudvirt hosts for eqiad.

Please note that 2 of these hosts are replacing labvirt1010 and labvirt1011, which are due back to Farnam in December 2018.

Racking Proposal: Cloudvirts are restricted to row B with the other cloudvirts. These are 1G hosts: although they have combined 1G/10G NICs, they will use 1G for now.
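For reference, the bracket range in the task title, cloudvirt10[25-30].eqiad.wmnet, covers the six hosts listed below. A minimal sketch of that expansion, assuming a simple single-range notation; `expand_host_range` is a hypothetical helper written for illustration, not part of any tool used in this task:

```python
import re


def expand_host_range(pattern: str) -> list[str]:
    """Expand 'prefix[START-END]suffix' into individual hostnames.

    Hypothetical helper: handles exactly one numeric range and keeps
    the zero-padding width of the START value.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No bracket range present: treat the pattern as a single host.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


hosts = expand_host_range("cloudvirt10[25-30].eqiad.wmnet")
for host in hosts:
    print(host)
```

The list runs from cloudvirt1025.eqiad.wmnet through cloudvirt1030.eqiad.wmnet, matching the per-host checklists that follow.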

cloudvirt1025:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Jessie, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt1026:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Jessie, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt1027:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Jessie, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt1028:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Jessie, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt1029:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Stretch, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt1030:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Stretch, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

Event Timeline

RobH triaged this task as High priority. Nov 15 2018, 6:12 PM
RobH created this task.
RobH updated the task description.
Andrew updated the task description. Nov 15 2018, 6:45 PM
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board. Dec 12 2018, 11:32 PM

@RobH, I'm always happy for you to image these things, but if you wind up with too much to do, @aborrero has offered to do the OS installs.

Change 480786 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns cloudvirt1025-30

https://gerrit.wikimedia.org/r/480786

I know that these say 10G, but all 4 NICs are standard RJ45. Granted, 2 say 10G and 2 say 1G, which is kind of confusing. I plugged the Ethernet cables into the 1G ports, which are the 3rd and 4th options in device settings.

Change 480786 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns cloudvirt1025-30

https://gerrit.wikimedia.org/r/480786

Cmjohnson updated the task description. Dec 19 2018, 6:47 PM

Change 480812 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding dhcpd/netboot.cfg entries cloudvirt1025-30

https://gerrit.wikimedia.org/r/480812

@RobH these are ready for installs. I added the MAC addresses and netboot.cfg entries, but I did not merge the changes; please review.

I used the MAC address for NIC 1-3 since that is the first 1G NIC on the server. The PXE boot order has been set to that as well.

Cmjohnson reassigned this task from Cmjohnson to RobH. Dec 19 2018, 7:00 PM
Cmjohnson added a subscriber: Cmjohnson.

@RobH also, the 2nd ethernet port was placed in cloud-virt-instance-trunk

Change 480812 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Adding dhcpd/netboot.cfg entries cloudvirt1025-30

https://gerrit.wikimedia.org/r/480812

Change 480947 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] site.pp: add role for cloudvirt1030

https://gerrit.wikimedia.org/r/480947

Change 480949 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] wmnet: introduce new cloudvirt10XX.eqiad.wmnet FQDNs (25-30)

https://gerrit.wikimedia.org/r/480949

Change 480949 merged by Arturo Borrero Gonzalez:
[operations/dns@master] wmnet: introduce new cloudvirt10XX.eqiad.wmnet FQDNs (25-30)

https://gerrit.wikimedia.org/r/480949

Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1030.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201812201208_aborrero_22804.log.

Completed auto-reimage of hosts:

['cloudvirt1030.eqiad.wmnet']

and were ALL successful.

Change 480947 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] site.pp: add role for cloudvirt1030

https://gerrit.wikimedia.org/r/480947

Change 480953 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hiera: introduce key instance_dev for cloudvirt1030

https://gerrit.wikimedia.org/r/480953

Change 480953 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hiera: introduce key instance_dev for cloudvirt1030

https://gerrit.wikimedia.org/r/480953

aborrero updated the task description. Dec 20 2018, 12:48 PM
aborrero updated the task description.

Mentioned in SAL (#wikimedia-operations) [2018-12-20T12:53:25Z] <arturo> T209616 installing cloudvirt1030, icinga downtime for 1 day

Summary of what I did today:

  • added production FQDNs to all new servers
  • tried imaging cloudvirt1030.eqiad.wmnet with Stretch. Succeeded. I only had to press 'yes' a couple of times in the installer.
  • our puppet service codebase (openstack) doesn't handle a fresh stretch install well.
  • we are discussing further steps in T212302: CloudVPS: upgrade: jessie -> stretch & mitaka -> newton

Change 480989 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirts: install cloudvirt1025 as jessie

https://gerrit.wikimedia.org/r/480989

Change 480989 merged by Andrew Bogott:
[operations/puppet@production] cloudvirts: install cloudvirt1025 as jessie

https://gerrit.wikimedia.org/r/480989

RobH reassigned this task from RobH to Andrew. Dec 20 2018, 4:46 PM

Change 480996 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] new cloudvirts: add initial hiera config

https://gerrit.wikimedia.org/r/480996

Change 480997 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Make cloudvirt1025 a nova compute node

https://gerrit.wikimedia.org/r/480997

Change 480996 merged by Andrew Bogott:
[operations/puppet@production] new cloudvirts: add initial hiera config

https://gerrit.wikimedia.org/r/480996

Change 480997 merged by Andrew Bogott:
[operations/puppet@production] Make cloudvirt1025 a nova compute node

https://gerrit.wikimedia.org/r/480997

Change 481001 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1025: use eth3 rather than (default) eth1 for VM communication

https://gerrit.wikimedia.org/r/481001

Change 481001 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1025: use eth3 rather than (default) eth1 for VM communication

https://gerrit.wikimedia.org/r/481001

Change 481009 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] specify eth3 for neutron for cloudvirt1025

https://gerrit.wikimedia.org/r/481009

Change 481009 merged by Andrew Bogott:
[operations/puppet@production] specify eth3 for neutron for cloudvirt1025

https://gerrit.wikimedia.org/r/481009

Andrew updated the task description. Dec 20 2018, 6:17 PM

cloudvirt1025 is working properly. The others are stuck in limbo while Arturo and I figure out what to do about stretch vs. jessie.

RobH removed a subscriber: RobH.

Change 482022 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvirt1029: introduce it to the openstack eqiad1 deployment

https://gerrit.wikimedia.org/r/482022

Change 482022 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvirt1029: introduce it to the openstack eqiad1 deployment

https://gerrit.wikimedia.org/r/482022

Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1029.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901031404_aborrero_97811.log.

Mentioned in SAL (#wikimedia-operations) [2019-01-03T14:05:27Z] <arturo> T209616 reimage cloudvirt1029 as debian stretch

aborrero updated the task description. Jan 3 2019, 2:07 PM

Completed auto-reimage of hosts:

['cloudvirt1029.eqiad.wmnet']

and were ALL successful.

Change 482025 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvirt1029: introduce hiera overrides for new iface names

https://gerrit.wikimedia.org/r/482025

Change 482025 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvirt1029: introduce hiera overrides for new iface names

https://gerrit.wikimedia.org/r/482025

aborrero updated the task description. Jan 3 2019, 2:31 PM

Change 482033 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Prepare new/empty cloudvirts for Stretch/Mitaka

https://gerrit.wikimedia.org/r/482033

Change 482033 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Prepare new/empty cloudvirts for Stretch/Mitaka

https://gerrit.wikimedia.org/r/482033

Change 482052 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: add roles for new cloudvirts

https://gerrit.wikimedia.org/r/482052

Change 482052 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: add roles for new cloudvirts

https://gerrit.wikimedia.org/r/482052

Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1026.eqiad.wmnet', 'cloudvirt1027.eqiad.wmnet', 'cloudvirt1028.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901031600_aborrero_125531.log.

aborrero added a comment. Edited Jan 3 2019, 4:27 PM

For the record, cloudvirt1027.eqiad.wmnet reports being a Dell PowerEdge R640 (instead of R630), and so does cloudvirt1026.

aborrero updated the task description. Jan 3 2019, 4:36 PM

Completed auto-reimage of hosts:

['cloudvirt1027.eqiad.wmnet', 'cloudvirt1028.eqiad.wmnet']

Of which those FAILED:

['cloudvirt1027.eqiad.wmnet', 'cloudvirt1028.eqiad.wmnet']
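The host lists in the reimage messages above are written as Python list literals, so they can be compared mechanically. A rough sketch, assuming only the three lists as posted in this task; `summarize_reimage` is a hypothetical helper and says nothing about how wmf-auto-reimage itself decides success or failure:

```python
import ast


def summarize_reimage(launched_line, completed_line, failed_line=None):
    """Compare the host lists posted by the reimage script.

    Hypothetical helper: each argument is the text of one list line
    copied from the task, e.g. "['host1', 'host2']".
    """
    launched = ast.literal_eval(launched_line)
    completed = ast.literal_eval(completed_line)
    failed = ast.literal_eval(failed_line) if failed_line else []
    return {
        # Reported as completed and not listed as failed.
        "succeeded": [h for h in completed if h not in failed],
        "failed": failed,
        # Launched but absent from the completion message.
        "unreported": [h for h in launched if h not in completed],
    }


summary = summarize_reimage(
    "['cloudvirt1026.eqiad.wmnet', 'cloudvirt1027.eqiad.wmnet', 'cloudvirt1028.eqiad.wmnet']",
    "['cloudvirt1027.eqiad.wmnet', 'cloudvirt1028.eqiad.wmnet']",
    "['cloudvirt1027.eqiad.wmnet', 'cloudvirt1028.eqiad.wmnet']",
)
print(summary)
```

With the lists from this run, cloudvirt1027 and cloudvirt1028 land under "failed", and cloudvirt1026 lands under "unreported" because it was launched but does not appear in the completion message.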

Change 482079 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Stretch cloudvirts: remove Stretch installer overrides

https://gerrit.wikimedia.org/r/482079

Change 482079 merged by Andrew Bogott:
[operations/puppet@production] Stretch cloudvirts: remove Stretch installer overrides

https://gerrit.wikimedia.org/r/482079

Change 482143 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Turn on alerting and add cloudvirt1027 and 1028 to the scheduler pool

https://gerrit.wikimedia.org/r/482143

Change 482143 merged by Andrew Bogott:
[operations/puppet@production] Turn on alerting and add cloudvirt1027 and 1028 to the scheduler pool

https://gerrit.wikimedia.org/r/482143

Andrew closed this task as Resolved. Jan 3 2019, 8:44 PM
Andrew updated the task description.

thanks all!