rack/setup/install cloudcontrol2001-dev & cloudvirt200[123]-dev
Open, Normal, Public

Description

This task will cover the racking, setup, and installation of 4 new cloud-dev hosts purchased on T210781.

Hostname Proposal:

This should be confirmed by cloud-services-team!

T210781 states that the order for it covers: 1 x cloudcontrol2xxx-dev & 1-3 x cloudvirt2xxx-dev. There are NO cloudanything-dev in codfw at present, so these will have the following hostnames:

cloudcontrol2001-dev
cloudvirt2001-dev
cloudvirt2002-dev
cloudvirt2003-dev
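
The task title and later comments abbreviate these hostnames as cloudvirt200[123]-dev or cloudvirt200[1-3]-dev. As a minimal sketch of how that shorthand maps to the list above, the hypothetical helper below expands both bracket forms. The function names are illustrative only and do not correspond to any tool mentioned in this task.

```python
import re
from itertools import product
from typing import List


def _expand_group(group: str) -> List[str]:
    """Expand one bracket group: '1-3' is a numeric range, '123' is a set of characters."""
    match = re.fullmatch(r"(\d+)-(\d+)", group)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        return [str(n) for n in range(low, high + 1)]
    return list(group)


def expand_hosts(pattern: str) -> List[str]:
    """Expand shorthand such as 'cloudvirt200[123]-dev' into full hostnames."""
    # re.split with a capturing group alternates literal text and bracket contents.
    parts = re.split(r"\[([^\]]+)\]", pattern)
    choices = [
        [part] if index % 2 == 0 else _expand_group(part)
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name in expand_hosts("cloudvirt200[1-3]-dev"):
        print(name)
```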

Racking Proposal:
This should be confirmed by netops and/or cloud-services-team!

@RobH is making a number of assumptions here, and will need both netops and cloud-services-team to confirm things before @Papaul spends time racking the cloudvirt200[1-3]-dev hosts.

cloudcontrol2001-dev: cloudcontrol100[34] in eqiad are in the public vlan, since it appears they need to interact with both the cloudvirt hosts and the rest of the world. So cloudcontrol2001-dev can be racked in any 1G rack in ANY row, since it will be placed in the public vlan. Don't rack it in c1-codfw, since that rack holds the system that will be its counterpart.

cloudvirt200[1-3]-dev: cloudvirts have to be in a row that has both the cloud-hosts1 and cloud-instances[12] vlans for proper networking support. Row B in eqiad has those vlans on the switch and they are in use (while they aren't in use on other rows), so it seems row B is the row to put all cloudvirts on in codfw.

cloudcontrol2001-dev:

  • - receive in system on procurement task T210781
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location) (B1)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) (raid1.cfg)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt2001-dev:

  • - receive in system on procurement task T210781
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location) (B3)
  • - please ensure BOTH 1G interfaces are hooked up. cloudvirts use the first interface for the OS/host and the second interface for instance traffic.
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) (raid1.cfg)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt2002-dev:

  • - receive in system on procurement task T210781
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location) (B5)
  • - please ensure BOTH 1G interfaces are hooked up. cloudvirts use the first interface for the OS/host and the second interface for instance traffic.
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) (raid1.cfg)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

cloudvirt2003-dev:

  • - receive in system on procurement task T210781
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location) (B8)
  • - please ensure BOTH 1G interfaces are hooked up. cloudvirts use the first interface for the OS/host and the second interface for instance traffic.
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) (raid1.cfg)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation

Event Timeline

RobH created this task. Jan 23 2019, 12:08 AM
RobH triaged this task as Normal priority.
Restricted Application added a subscriber: Aklapper. Jan 23 2019, 12:08 AM
RobH added a subscriber: ayounsi. Jan 23 2019, 12:14 AM

Ok, just confirmed with @ayounsi that row B is the row for cloudvirt hosts!

@Papaul: Unless the cloud-services-team states differently, I think you can move ahead on racking with what I put in the task description above!

The racking plan from @RobH looks right to me. cloudcontrol2001-dev will replace labtestcontrol2001 which is currently in rack B5, but I think the new server can go anywhere other than C1 which is where labtestcontrol2003 (which will become cloudcontrol2002-dev when we reimage it) lives today.

RobH updated the task description. Jan 23 2019, 12:30 AM
Papaul claimed this task. Wed, Jan 23, 3:47 PM
Papaul updated the task description. Wed, Jan 23, 5:49 PM
Papaul updated the task description. Thu, Jan 24, 4:03 PM

Change 486391 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add mgmt DNS entries for cloudcontrol2001-dev and cloudvirt200[123]-dev

https://gerrit.wikimedia.org/r/486391

Change 486391 merged by Dzahn:
[operations/dns@master] DNS: Add mgmt DNS entries for cloudcontrol2001-dev and cloudvirt200[123]-dev

https://gerrit.wikimedia.org/r/486391

Papaul updated the task description. Fri, Jan 25, 4:06 AM

I find it really confusing that we are reusing numbering for these servers, even with the renaming for the new naming scheme.

It doesn't bother me, but I think you should feel free to rename/renumber things as you see fit. As I understand it, it's not a big deal for @Papaul to update labels.

I think I was able to identify and describe what was confusing me: T214499#4909226

@Andrew, should I use the labvirt-ssd.cfg partman recipe for all of these new servers?

@Andrew, can you also specify on this task which VLAN eth1 needs to be in for cloudvirt200[1-3]? Thanks.

vlan 2105.

Note that these new cloudvirt servers should be imaged with Debian Stretch, so the second interface might not be named eth1 (it may be eno50, eno2, or similar).

Change 486504 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add production DNS enties for cloudcontrol2001-dev and cloudvirt200[123]-dev

https://gerrit.wikimedia.org/r/486504

@Andrew, should I use the labvirt-ssd.cfg partman recipe for all of these new servers?

It depends on what raid controller we have. That recipe expects to see one big raid10 configured in hardware; if that's possible then go ahead and use that partman recipe; if not then I'll dig out a better one.

@Andrew, there is no RAID controller on the new servers. They all have 2x200GB SSDs.

ok -- let's just use partman/raid1.cfg then, for now at least. Thanks!

Papaul updated the task description. Fri, Jan 25, 8:41 PM

Change 486700 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] Add DHCP MAC addrese and partman for cloudcontrol2001-dev and cloudvirt200[123]-dev

https://gerrit.wikimedia.org/r/486700

Change 486700 merged by Dzahn:
[operations/puppet@production] DHCP/partman: add cloudcontrol2001-dev and cloudvirt200[123]-dev

https://gerrit.wikimedia.org/r/486700

Papaul updated the task description. Fri, Jan 25, 11:35 PM
Papaul updated the task description. Tue, Feb 5, 6:26 PM

Change 486504 merged by Arturo Borrero Gonzalez:
[operations/dns@master] DNS: Add production DNS enties for cloudcontrol2001-dev and cloudvirt200[123]-dev

https://gerrit.wikimedia.org/r/486504

Change 488496 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/puppet@production] DHCP: Fix fixed-address name

https://gerrit.wikimedia.org/r/488496

Change 488496 merged by Papaul:
[operations/puppet@production] DHCP: Fix fixed-address name

https://gerrit.wikimedia.org/r/488496

Papaul updated the task description. Wed, Feb 6, 5:26 PM
ayounsi removed a subscriber: ayounsi. Wed, Feb 6, 5:33 PM
Papaul updated the task description. Wed, Feb 6, 5:41 PM
Papaul updated the task description. Wed, Feb 6, 10:06 PM

second NIC configuration

cloudvirt2001-dev

Logical       Vlan                       TAG    MAC      STP          Logical           Tagging
interface     members                           limit    state        interface flags
ge-3/0/23.0                                     294912                                  tagged
              cloud-instances2-b-codfw   2105   294912   Forwarding                     tagged

cloudvirt2002-dev

Logical       Vlan                       TAG    MAC      STP          Logical           Tagging
interface     members                           limit    state        interface flags
ge-5/0/22.0                                     294912                                  tagged
              cloud-instances2-b-codfw   2105   294912   Forwarding                     tagged

cloudvirt2003-dev

Logical       Vlan                       TAG    MAC      STP          Logical           Tagging
interface     members                           limit    state        interface flags
ge-8/0/6.0                                      294912                                  tagged
              cloud-instances2-b-codfw   2105   294912   Forwarding                     tagged
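
To double-check that each second NIC port carries the VLAN requested earlier in this task (2105), the switch output above can be parsed mechanically. The following is only a rough sketch: it assumes the whitespace-separated layout pasted here, treats any first field containing "/" as a logical interface, and uses a hypothetical function name.

```python
from typing import Dict, List, Tuple


def parse_vlan_members(output: str) -> Dict[str, List[Tuple[str, int]]]:
    """Map each logical interface to the (vlan_name, tag) pairs listed under it."""
    result: Dict[str, List[Tuple[str, int]]] = {}
    current = None
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the two header lines.
        if not fields or fields[0] in ("Logical", "interface"):
            continue
        if "/" in fields[0]:
            # Assumption: interface names look like ge-3/0/23.0 and VLAN names contain no "/".
            current = fields[0]
            result[current] = []
        elif current is not None and len(fields) >= 2 and fields[1].isdigit():
            result[current].append((fields[0], int(fields[1])))
    return result


SAMPLE = """\
Logical          Vlan          TAG     MAC         STP         Logical           Tagging
interface        members               limit       state       interface flags
ge-3/0/23.0                            294912                                     tagged
                 cloud-instances2-b-codfw 2105 294912 Forwarding                  tagged
"""

if __name__ == "__main__":
    for interface, vlans in parse_vlan_members(SAMPLE).items():
        print(interface, vlans)
```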
Papaul updated the task description. Wed, Feb 6, 10:47 PM
Papaul reassigned this task from Papaul to aborrero. Wed, Feb 6, 10:49 PM

@aborrero @Andrew all yours. Let me know if you have any questions.

Change 488896 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] cloudcontrol2001-dev: fix PTR record

https://gerrit.wikimedia.org/r/488896

Change 488896 merged by Arturo Borrero Gonzalez:
[operations/dns@master] cloudcontrol2001-dev: fix PTR record

https://gerrit.wikimedia.org/r/488896

Change 488897 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudcontrol2001-dev: spare system for now

https://gerrit.wikimedia.org/r/488897

Change 488897 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudcontrol2001-dev: spare system for now

https://gerrit.wikimedia.org/r/488897

Change 488899 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvirt200X-dev: add roles in codfw1dev

https://gerrit.wikimedia.org/r/488899

Change 488899 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvirt200X-dev: add roles in codfw1dev

https://gerrit.wikimedia.org/r/488899

Change 488902 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hiera: cloudvirt200X-dev: add hosts overrides

https://gerrit.wikimedia.org/r/488902

Change 488902 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hiera: cloudvirt200X-dev: add hosts overrides

https://gerrit.wikimedia.org/r/488902

Change 488905 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hiera: cloudvirt200X-dev: fix wrong hiera keys names

https://gerrit.wikimedia.org/r/488905

Change 488905 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hiera: cloudvirt200X-dev: fix wrong hiera keys names

https://gerrit.wikimedia.org/r/488905

@Andrew, there is no RAID controller on the new servers. They all have 2x200GB SSDs.

ok -- let's just use partman/raid1.cfg then, for now at least. Thanks!

I will be switching the cloudvirts to partman/raid1-lvm.cfg. We need LVM for nova, at least on the cloudvirt servers. I will leave cloudcontrol2001-dev as is.

Error: /Stage[main]/Profile::Openstack::Base::Nova::Compute::Service/Mount[/var/lib/nova/instances]: Could not evaluate: Execution of '/bin/mount /var/lib/nova/instances' returned 32: mount: special device /dev/mapper/tank-data does not exist
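
The puppet error above comes down to a missing device node under /dev/mapper. As a small sketch of a pre-flight check, the hypothetical helper below builds the device-mapper path for a volume group and logical volume and reports which ones are absent. It reads tank-data as volume group "tank" with logical volume "data", which is an assumption based on the path in the error, and it relies on the device-mapper convention of doubling any hyphen that appears inside a VG or LV name.

```python
import os
from typing import Callable, Iterable, List, Tuple


def mapper_path(vg: str, lv: str) -> str:
    """Return the /dev/mapper node name for a VG/LV pair."""
    # device-mapper joins VG and LV with one hyphen and doubles hyphens inside each name.
    return "/dev/mapper/{}-{}".format(vg.replace("-", "--"), lv.replace("-", "--"))


def missing_volumes(
    volumes: Iterable[Tuple[str, str]],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """List the device-mapper paths that are not present on this host."""
    paths = [mapper_path(vg, lv) for vg, lv in volumes]
    return [path for path in paths if not exists(path)]


if __name__ == "__main__":
    # Assumed VG/LV pair, taken from the path in the puppet error.
    for path in missing_volumes([("tank", "data")]):
        print("missing:", path)
```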

Change 488914 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvirt200[123]-dev: use partman/raid1-lvm-xfs-nova.cfg

https://gerrit.wikimedia.org/r/488914

I will be switching the cloudvirts to partman/raid1-lvm.cfg. We need LVM for nova, at least on the cloudvirt servers. I will leave cloudcontrol2001-dev as is.

Actually, I think partman/raid1-lvm-xfs-nova.cfg is suitable. It is already in use by other virt servers.

Change 488914 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvirt200[123]-dev: use partman/raid1-lvm-xfs-nova.cfg

https://gerrit.wikimedia.org/r/488914
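
For readers following the partman discussion, the choices made in this thread can be summarized as a small decision function. This is only an illustrative sketch: the recipe names come from the comments above, but the function, its parameters, and the idea of selecting by hostname prefix are assumptions, and the real mapping lives in the operations/puppet install_server configuration.

```python
def partman_recipe(hostname: str, has_hw_raid10: bool) -> str:
    """Pick a partman recipe name following the reasoning in this task."""
    if has_hw_raid10:
        # labvirt-ssd.cfg expects one big RAID10 configured in hardware.
        return "labvirt-ssd.cfg"
    if hostname.startswith("cloudvirt"):
        # cloudvirts need LVM for nova, so plain raid1.cfg is not enough.
        return "raid1-lvm-xfs-nova.cfg"
    # Hosts without a RAID controller and without the nova requirement.
    return "raid1.cfg"


if __name__ == "__main__":
    hosts = ["cloudcontrol2001-dev", "cloudvirt2001-dev"]
    for host in hosts:
        # These servers have no RAID controller, per the comments above.
        print(host, partman_recipe(host, has_hw_raid10=False))
```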

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

['cloudvirt2001-dev.codfw.wmnet', 'cloudvirt2002-dev.codfw.wmnet', 'cloudvirt2003-dev.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201902071202_aborrero_1966.log.

Mentioned in SAL (#wikimedia-operations) [2019-02-07T12:03:16Z] <arturo> T214448 reimaging again cloudvirt200[1-3]-dev.codfw.wmnet

Change 488918 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hiera: cloudvirt200[1-3]-dev: fix extra LVM volume name

https://gerrit.wikimedia.org/r/488918

Change 488918 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hiera: cloudvirt200[1-3]-dev: fix extra LVM volume name

https://gerrit.wikimedia.org/r/488918

I'm seeing this in cloudvirt2003-dev:

[   13.270987] kvm: disabled by bios
[   13.729525] kvm: disabled by bios
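
A quick way to spot this condition across several hosts is to scan the kernel log for the same message. The sketch below is a hypothetical helper, not part of any tooling used in this task. It checks log text for the "kvm: disabled by bios" line and, separately, pulls the vmx/svm flags out of /proc/cpuinfo-style text. A CPU can advertise vmx or svm while virtualization is still switched off in the BIOS, so the log message is the more direct signal.

```python
import re
from typing import Set

KVM_DISABLED = re.compile(r"kvm: disabled by bios", re.IGNORECASE)


def kvm_disabled_by_bios(kernel_log: str) -> bool:
    """Return True if any kernel log line reports KVM as disabled by the BIOS."""
    return any(KVM_DISABLED.search(line) for line in kernel_log.splitlines())


def cpu_virt_flags(cpuinfo: str) -> Set[str]:
    """Collect the hardware virtualization flags (vmx or svm) found in cpuinfo text."""
    found: Set[str] = set()
    for line in cpuinfo.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            found.update(flag for flag in flags if flag in ("vmx", "svm"))
    return found


if __name__ == "__main__":
    sample_log = "[   13.270987] kvm: disabled by bios\n[   13.729525] kvm: disabled by bios\n"
    print(kvm_disabled_by_bios(sample_log))
```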

Change 488926 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hiera: cloudvirt200[1-3]-dev: fix again instance_dev hiera key

https://gerrit.wikimedia.org/r/488926

Change 488926 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hiera: cloudvirt200[1-3]-dev: fix again instance_dev hiera key

https://gerrit.wikimedia.org/r/488926

Script wmf-auto-reimage was launched by aborrero on cumin2001.codfw.wmnet for hosts:

['cloudvirt2003-dev.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201902071344_aborrero_25895.log.

Completed auto-reimage of hosts:

['cloudvirt2003-dev.codfw.wmnet']

and were ALL successful.