
rack/setup/install cloudvirt10[25-30].eqiad.wmnet
Closed, Resolved (Public)

Description

This task will track the racking, setup, and installation of 6 new cloudvirt hosts for eqiad.
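For reference, the bracket notation in the task title can be expanded into the six individual hostnames. This is a minimal sketch; `expand_host_range` is a hypothetical helper written for this note, not part of any existing tooling, and it assumes a single `[low-high]` range per pattern.

```python
import re


def expand_host_range(pattern: str) -> list[str]:
    """Expand one 'prefix[low-high]suffix' range into full hostnames."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # No bracket range present: treat the input as one literal hostname.
        return [pattern]
    prefix, low, high, suffix = match.groups()
    width = len(low)  # keep zero padding if the range uses it
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(low), int(high) + 1)
    ]


hosts = expand_host_range("cloudvirt10[25-30].eqiad.wmnet")
for host in hosts:
    print(host)
```

The same list is what the per-host checklists below walk through, one block per host.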

Please note that 2 of these hosts are replacing labvirt1010 and labvirt1011, which are due back to Farnam in December 2018.

Racking Proposal: Cloudvirts are restricted to row B with the other cloudvirts. These are 1G hosts: although they have combined 1G/10G NICs, they will use 1G for now.

cloudvirt1025:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Jessie, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt1026:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Jessie, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt1027:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Jessie, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt1028:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Jessie, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt1029:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Stretch, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt1030:

  • - receive in system on procurement task T201352
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan) for both primary 1G and secondary 1G interfaces, as cloudvirts use both!
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation -- Debian Stretch, use cloudvirt1023 as a template
  • - puppet accept/initial run
  • - handoff for service implementation

Event Timeline

RobH triaged this task as High priority. Nov 15 2018, 6:12 PM
RobH created this task.
RobH updated the task description.

@RobH, I'm always happy for you to image these things, but if you wind up with too much to do, @aborrero has offered to do the OS installs.

Change 480786 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns cloudvirt1025-30

https://gerrit.wikimedia.org/r/480786

I know that these say 10G, but all 4 NICs are standard RJ45. Granted, 2 say 10G and 2 say 1G, which is kind of confusing. I plugged the Ethernet cable into the 1G ports, which are the 3rd and 4th options in device settings.

Change 480786 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns cloudvirt1025-30

https://gerrit.wikimedia.org/r/480786

Change 480812 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding dhcpd/netboot.cfg entries cloudvirt1025-30

https://gerrit.wikimedia.org/r/480812

@RobH these are ready for installs. I added the MAC addresses and the netboot.cfg entries, but I did not merge the changes; please review.

I used the MAC address for nic 1-3, since that is the first 1G NIC on the server. The PXE boot order has been set to that as well.
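As an illustration of what a per-host DHCP entry keyed on that MAC address looks like, here is a minimal sketch that renders a standard ISC dhcpd `host` stanza. The helper name, the MAC address (taken from the documentation range) and the exact layout are assumptions; the real entries are in change 480812 and may be structured differently.

```python
def dhcpd_host_stanza(fqdn: str, mac: str) -> str:
    """Render an ISC dhcpd host declaration for one server (illustrative only)."""
    short_name = fqdn.split(".", 1)[0]
    return (
        f"host {short_name} {{\n"
        f"    hardware ethernet {mac.lower()};\n"
        f"    fixed-address {fqdn};\n"
        f"}}\n"
    )


# Placeholder MAC address, NOT the real one for this host.
print(dhcpd_host_stanza("cloudvirt1025.eqiad.wmnet", "00:00:5E:00:53:01"))
```

The point is only that the installer finds the host through the MAC of the NIC it PXE boots from, so the entry has to carry the MAC of the first 1G port rather than one of the 10G ports.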

@RobH also, the 2nd ethernet port was placed in cloud-virt-instance-trunk

Change 480812 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Adding dhcpd/netboot.cfg entries cloudvirt1025-30

https://gerrit.wikimedia.org/r/480812

Change 480947 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] site.pp: add role for cloudvirt1030

https://gerrit.wikimedia.org/r/480947

Change 480949 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] wmnet: introduce new cloudvirt10XX.eqiad.wmnet FQDNs (25-30)

https://gerrit.wikimedia.org/r/480949

Change 480949 merged by Arturo Borrero Gonzalez:
[operations/dns@master] wmnet: introduce new cloudvirt10XX.eqiad.wmnet FQDNs (25-30)

https://gerrit.wikimedia.org/r/480949

Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1030.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201812201208_aborrero_22804.log.
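The log file names in this task appear to follow a `<timestamp>_<user>_<number>.log` pattern. A small sketch for pulling those parts out, assuming the timestamp is `YYYYMMDDHHMM`; the meaning of the trailing number is not stated here, so it is kept as an opaque `suffix`. `parse_reimage_log` is a hypothetical helper, not part of wmf-auto-reimage.

```python
import re
from datetime import datetime

# Assumed layout: 12-digit timestamp, user name, numeric suffix.
LOG_NAME = re.compile(r"(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<suffix>\d+)\.log")


def parse_reimage_log(path: str) -> dict:
    """Split a reimage log path into start time, user and numeric suffix."""
    name = path.rsplit("/", 1)[-1]
    match = LOG_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected log file name: {name!r}")
    return {
        "started": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "suffix": int(match["suffix"]),
    }


info = parse_reimage_log(
    "/var/log/wmf-auto-reimage/201812201208_aborrero_22804.log"
)
print(info["started"].isoformat(), info["user"], info["suffix"])
```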

Completed auto-reimage of hosts:

['cloudvirt1030.eqiad.wmnet']

and were ALL successful.

Change 480947 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] site.pp: add role for cloudvirt1030

https://gerrit.wikimedia.org/r/480947

Change 480953 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hiera: introduce key instance_dev for cloudvirt1030

https://gerrit.wikimedia.org/r/480953

Change 480953 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hiera: introduce key instance_dev for cloudvirt1030

https://gerrit.wikimedia.org/r/480953

aborrero updated the task description.

Mentioned in SAL (#wikimedia-operations) [2018-12-20T12:53:25Z] <arturo> T209616 installing cloudvirt1030, icinga downtime for 1 day

Summary of what I did today:

  • added production FQDNs to all new servers
  • tried imaging cloudvirt1030.eqiad.wmnet with Stretch. Succeeded. I only had to press 'yes' a couple of times in the installer.
  • our puppet service codebase (openstack) doesn't yet handle a fresh Stretch install well.
  • we are discussing further steps in T212302: CloudVPS: upgrade: jessie -> stretch & mitaka -> newton

Change 480989 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirts: install cloudvirt1025 as jessie

https://gerrit.wikimedia.org/r/480989

Change 480989 merged by Andrew Bogott:
[operations/puppet@production] cloudvirts: install cloudvirt1025 as jessie

https://gerrit.wikimedia.org/r/480989

Change 480996 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] new cloudvirts: add initial hiera config

https://gerrit.wikimedia.org/r/480996

Change 480997 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Make cloudvirt1025 a nova compute node

https://gerrit.wikimedia.org/r/480997

Change 480996 merged by Andrew Bogott:
[operations/puppet@production] new cloudvirts: add initial hiera config

https://gerrit.wikimedia.org/r/480996

Change 480997 merged by Andrew Bogott:
[operations/puppet@production] Make cloudvirt1025 a nova compute node

https://gerrit.wikimedia.org/r/480997

Change 481001 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1025: use eth3 rather than (default) eth1 for VM communication

https://gerrit.wikimedia.org/r/481001

Change 481001 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1025: use eth3 rather than (default) eth1 for VM communication

https://gerrit.wikimedia.org/r/481001

Change 481009 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] specify eth3 for neutron for cloudvirt1025

https://gerrit.wikimedia.org/r/481009

Change 481009 merged by Andrew Bogott:
[operations/puppet@production] specify eth3 for neutron for cloudvirt1025

https://gerrit.wikimedia.org/r/481009

cloudvirt1025 is working properly. The others are stuck in limbo while Arturo and I figure out what to do about stretch vs. jessie.

Change 482022 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvirt1029: introduce it to the openstack eqiad1 deployment

https://gerrit.wikimedia.org/r/482022

Change 482022 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvirt1029: introduce it to the openstack eqiad1 deployment

https://gerrit.wikimedia.org/r/482022

Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1029.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901031404_aborrero_97811.log.

Mentioned in SAL (#wikimedia-operations) [2019-01-03T14:05:27Z] <arturo> T209616 reimage cloudvirt1029 as debian stretch

Completed auto-reimage of hosts:

['cloudvirt1029.eqiad.wmnet']

and were ALL successful.

Change 482025 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvirt1029: introduce hiera overrides for new iface names

https://gerrit.wikimedia.org/r/482025

Change 482025 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvirt1029: introduce hiera overrides for new iface names

https://gerrit.wikimedia.org/r/482025

Change 482033 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Prepare new/empty cloudvirts for Stretch/Mitaka

https://gerrit.wikimedia.org/r/482033

Change 482033 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Prepare new/empty cloudvirts for Stretch/Mitaka

https://gerrit.wikimedia.org/r/482033

Change 482052 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: add roles for new cloudvirts

https://gerrit.wikimedia.org/r/482052

Change 482052 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: add roles for new cloudvirts

https://gerrit.wikimedia.org/r/482052

Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts:

['cloudvirt1026.eqiad.wmnet', 'cloudvirt1027.eqiad.wmnet', 'cloudvirt1028.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901031600_aborrero_125531.log.

For the record, cloudvirt1027.eqiad.wmnet reports being a Dell PowerEdge R640 (instead of an R630), and so does cloudvirt1026.

Completed auto-reimage of hosts:

['cloudvirt1027.eqiad.wmnet', 'cloudvirt1028.eqiad.wmnet']

Of which those FAILED:

['cloudvirt1027.eqiad.wmnet', 'cloudvirt1028.eqiad.wmnet']
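Comparing the launch message with the completion message for this run shows that one host is not mentioned in the result at all. A minimal sketch of that comparison, using the host lists exactly as posted above; `summarize_reimage` is a hypothetical helper, and the report itself does not say what happened to the unlisted host.

```python
def summarize_reimage(launched, completed, failed):
    """Classify launched hosts by how the completion report mentions them."""
    return {
        "succeeded": [h for h in completed if h not in failed],
        "failed": [h for h in completed if h in failed],
        # Launched but absent from the completion report.
        "unreported": [h for h in launched if h not in completed],
    }


launched = [
    "cloudvirt1026.eqiad.wmnet",
    "cloudvirt1027.eqiad.wmnet",
    "cloudvirt1028.eqiad.wmnet",
]
completed = ["cloudvirt1027.eqiad.wmnet", "cloudvirt1028.eqiad.wmnet"]
failed = ["cloudvirt1027.eqiad.wmnet", "cloudvirt1028.eqiad.wmnet"]

summary = summarize_reimage(launched, completed, failed)
for status, hosts in summary.items():
    print(status, hosts)
```

Here cloudvirt1026 ends up under `unreported`, so its state has to be checked separately.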

Change 482079 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Stretch cloudvirts: remove Stretch installer overrides

https://gerrit.wikimedia.org/r/482079

Change 482079 merged by Andrew Bogott:
[operations/puppet@production] Stretch cloudvirts: remove Stretch installer overrides

https://gerrit.wikimedia.org/r/482079

Change 482143 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Turn on alerting and add cloudvirt1027 and 1028 to the scheduler pool

https://gerrit.wikimedia.org/r/482143

Change 482143 merged by Andrew Bogott:
[operations/puppet@production] Turn on alerting and add cloudvirt1027 and 1028 to the scheduler pool

https://gerrit.wikimedia.org/r/482143

Andrew updated the task description.

thanks all!