
rack/setup/install labtestvirt2003
Closed, Resolved, Public

Description

This task will track the racking and setup of the new labtestvirt2003 server for codfw. It was ordered on T163030.

Racking Proposal: This host has to reside in the labs-hosts1-b-codfw subnet, so it has to go in row B. There are labs-hosts1 subnets reserved for the other rows, but they are not in use yet, because we don't yet support the cloud/labs virt hosts across rows. So rack this in row B, in whichever rack has 1Gbps networking and the most space and power available (i.e., not in the frack or 10G racks).
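
The rack-selection rule above can be sketched as a filter-then-rank step. This is a minimal sketch, assuming a hypothetical list of racks whose fields (row, uplink speed, frack flag, free rack units, free power) stand in for what racktables would show; the rack names and numbers are made up, and ranking by free space first with power as the tie-breaker is an assumption, not part of the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rack:
    name: str              # hypothetical rack label
    row: str               # codfw row letter
    uplink_gbps: int       # 1 for 1Gbps racks, 10 for 10G racks
    is_frack: bool         # fundraising (frack) racks are excluded
    free_units: int        # free rack units (space)
    free_power_watts: int  # remaining power budget


def pick_rack(racks: List[Rack], row: str = "B") -> Optional[Rack]:
    """Return the eligible rack with the most free space (then power), or None."""
    eligible = [
        r for r in racks
        if r.row == row and r.uplink_gbps == 1 and not r.is_frack
    ]
    if not eligible:
        return None
    # Assumed ranking: free space first, free power as the tie-breaker.
    return max(eligible, key=lambda r: (r.free_units, r.free_power_watts))


# Made-up inventory for illustration only.
racks = [
    Rack("B1", "B", 1, False, 10, 2000),
    Rack("B3", "B", 10, False, 20, 3000),  # 10G rack: skipped
    Rack("B5", "B", 1, True, 30, 4000),    # frack: skipped
    Rack("B7", "B", 1, False, 10, 3500),
    Rack("A2", "A", 1, False, 40, 5000),   # wrong row: skipped
]
chosen = pick_rack(racks)
print(chosen.name if chosen else "no eligible rack")  # B7
```

B1 and B7 tie on free space here, so the assumed power tie-breaker picks B7.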

labtestvirt2003:

  • receive in system on procurement task T163030
  • rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • production dns entries added
  • network port setup (description, enable, vlan)
  • operations/puppet update (install_server at minimum, other files if possible)
  • OS installation
  • puppet/salt accept/initial run
  • handoff for service implementation
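
The two DNS steps in the checklist come down to a handful of names per host. A minimal sketch, assuming the management records live under a mgmt.<site>.wmnet subdomain and that the asset-tag record uses the lowercased tag. The production name matches the labtestvirt2003.codfw.wmnet node name in the puppet error later in this task, while WMF9999 is a placeholder, not this host's real asset tag.

```python
from typing import Dict, List, Union


def dns_names(hostname: str, asset_tag: str, site: str = "codfw") -> Dict[str, Union[str, List[str]]]:
    """Build mgmt and production FQDNs for a new host (assumed naming scheme)."""
    domain = f"{site}.wmnet"
    return {
        # Management (DRAC) records: one for the hostname, one for the asset tag.
        "mgmt": [
            f"{hostname}.mgmt.{domain}",
            f"{asset_tag.lower()}.mgmt.{domain}",
        ],
        # Production record for the host itself.
        "production": f"{hostname}.{domain}",
    }


# WMF9999 is a placeholder asset tag.
print(dns_names("labtestvirt2003", "WMF9999"))
```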

Please note there was confusion over hostname use, because racktables did not show labtestvirt2002 as in use even though a host with that name already existed. Task T166598 was created to correct that. The host being set up on this task should now be named labtestvirt2003.

Event Timeline

@chasemp, what partman recipe do you want to use for the server? We have:

  • raid10-gpt.cfg
  • raid10-gpt-srv-ext4.cfg
  • raid10-gpt-srv-lvm-ext4.cfg

@RobH, I already have mgmt and production DNS entries for labtestvirt2002; shouldn't this be labtestvirt2003?

There seems to be some confusion, because racktables shows two labtestvirt2001 systems.

I went ahead and connected to the mgmt dns for the existing labtestvirt2002, and it points to this system: https://racktables.wikimedia.org/index.php?page=object&object_id=1317

WMF3810 is labtestvirt2002. (There will be a new task to correct that one shortly.)

The system being set up on this task (T166237) should be renamed to labtestvirt2003, since the other system already exists and is online as labtestvirt2002.
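
The mix-up came from racktables listing the same hostname on two assets. A check for that kind of clash can be sketched as grouping asset tags by hostname. This assumes a hypothetical (asset tag, hostname) export of the inventory; WMF1111 is made up, and only WMF3810 comes from this task. The sketch only flags the clash: working out which asset really is which host still needs a check like the mgmt DNS lookup described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def duplicate_hostnames(inventory: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each hostname recorded on more than one asset to those asset tags."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for asset_tag, hostname in inventory:
        by_name[hostname].append(asset_tag)
    # Keep only hostnames that appear on two or more assets.
    return {name: tags for name, tags in by_name.items() if len(tags) > 1}


# Hypothetical export in which WMF3810 still carries the wrong name.
inventory = [
    ("WMF1111", "labtestvirt2001"),
    ("WMF3810", "labtestvirt2001"),
]
print(duplicate_hostnames(inventory))
# {'labtestvirt2001': ['WMF1111', 'WMF3810']}
```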

RobH renamed this task from rack/setup/install labtestvirt2002 to rack/setup/install labtestvirt2003 (May 30 2017, 5:55 PM).
RobH updated the task description.

Change 356302 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add mgmt and production DNS for labtestvirt2003

https://gerrit.wikimedia.org/r/356302

Change 356302 merged by Dzahn:
[operations/dns@master] DNS: Add mgmt and production DNS for labtestvirt2003

https://gerrit.wikimedia.org/r/356302

@chasemp, the other labs servers in the DHCP file point to the Trusty installer. Do you want to install Trusty or Jessie on labtestvirt2003?

@Papaul, yes, Trusty: the OpenStack infrastructure is still on Trusty at the moment. Thanks!

Thanks @RobH, and sorry for the confusion. I think @Andrew renamed the host at some point while we were debugging the scheduler, and we missed updating racktables.

raid10-gpt-srv-lvm-ext4.cfg seems fine, thanks @Papaul.

Change 356617 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Add partman recipe and DHCP entries for labtestvirt2003

https://gerrit.wikimedia.org/r/356617

Change 356617 merged by RobH:
[operations/puppet@production] Add partman recipe and DHCP entries for labtestvirt2003

https://gerrit.wikimedia.org/r/356617
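
The exact entry added in change 356617 is not reproduced here. As a rough illustration of what a DHCP entry for a new host involves, this is a minimal sketch that renders a generic ISC dhcpd host declaration tying a MAC address to the host's production name. The helper name dhcp_host_entry and the MAC address are hypothetical placeholders. Per the discussion above, the DHCP file also determines which installer (Trusty here) the host boots; that part is not modeled.

```python
import re

# Six colon-separated lowercase hex octets, e.g. "aa:bb:cc:dd:ee:ff".
_MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}")


def dhcp_host_entry(hostname: str, mac: str, fqdn: str) -> str:
    """Render an ISC dhcpd host declaration (hypothetical helper)."""
    mac = mac.lower()
    if not _MAC_RE.fullmatch(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {mac};\n"
        f"    fixed-address {fqdn};\n"
        f"}}\n"
    )


# Placeholder MAC; the real one would be read from the host itself.
entry = dhcp_host_entry("labtestvirt2003", "AA:BB:CC:DD:EE:FF",
                        "labtestvirt2003.codfw.wmnet")
print(entry, end="")
```

The MAC is normalized to lowercase and validated before rendering, so a typo fails loudly instead of producing an entry that never matches the host.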

RobH added a subscriber: Papaul.

Change 356625 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] setting labtestvirt2003 into site.pp

https://gerrit.wikimedia.org/r/356625

Change 356625 merged by RobH:
[operations/puppet@production] setting labtestvirt2003 into site.pp

https://gerrit.wikimedia.org/r/356625

I tried to add it to site.pp, but there seems to be an error condition due to the kernel in use (the system was just freshly installed).

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: nova-compute not installed on buggy kernels. On 3.13 series kernels, instance suspension causes complete system lockup. Try installing linux-image-generic-lts-xenial at /etc/puppet/modules/openstack/manifests/nova/compute.pp:123 on node labtestvirt2003.codfw.wmnet
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

So the puppet run failed with that error, and although the host now has its puppet keys and such installed, it has failures left over from that failed install.
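
The catalog failure comes from a guard in the openstack module (compute.pp:123 in the error above) that refuses to install nova-compute on 3.13-series kernels, the series Trusty ships by default. The manifest itself is not reproduced here; this is a minimal Python restatement of the rule as the error message states it, assuming the input is a `uname -r` style release string. The function names are hypothetical and the release strings are illustrative, not taken from this host.

```python
from typing import Tuple


def kernel_series(release: str) -> Tuple[int, int]:
    """Parse the (major, minor) series from a `uname -r` style string."""
    parts = release.split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected kernel release: {release!r}")
    return int(parts[0]), int(parts[1])


def nova_compute_allowed(release: str) -> bool:
    """Mirror the guard from the error: refuse nova-compute on 3.13 kernels."""
    return kernel_series(release) != (3, 13)


# Illustrative release strings only.
for release in ("3.13.0-119-generic", "4.4.0-78-generic"):
    print(release, nova_compute_allowed(release))
```

Installing linux-image-generic-lts-xenial (the 4.4 series), as the error suggests, and rebooting into it would put the host on a kernel that passes this sketched check.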

@chasemp: if you want to re-enable the site.pp settings and debug, I'm kicking this over to you. If you'd prefer that I reinstall it, just kick it back to me!

Labtestvirt2003 is installed now, and properly attached to rabbitmq and the nova controller.

  • It is not currently in the scheduler pool (but easy to add)
  • There's one issue with the prometheus setup, which is T166843
Andrew updated the task description.