Labs test cluster in codfw
Closed, ResolvedPublic

Description

It would be very nice to have a labs test cluster for ops to poke at, especially during ironic setup.

Needed resources:

  • labtestcontrol2001 (keystone, glance, some nova services, labs puppetmaster)
  • labtestnet2001 (nova-network and/or neutron)
  • labtestneutron2001 (future neutron service host)
  • labtestservices2001 (designate, pdns, future ldap)
  • labtestweb2001 (horizon)
  • labtestvirt2001 (nova compute) No need for an expensive virt box here, could be anything
  • labtestmetal2001 -- a host designated for ironic (bare-metal) allocation
  • corresponding db services for all of the above.
  • a private IP range for instances
  • a public IP range for hosts
  • a (small!) public IP range for instances. Could be as few as three or four randomly scattered IPs if necessary.

That is seven servers. I'd like a 32GB server for labtestvirt2001; the other servers can be essentially any hardware as I don't expect them to support much load.
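The request can be written down as a small inventory, which makes the server count easy to check. This is only an illustrative sketch: the `REQUESTED_HOSTS` name and the dictionary layout are assumptions, not part of any existing tooling.

```python
# Hypothetical inventory of the hosts requested in the description.
# Host names and roles come from the list above; the structure is illustrative.
REQUESTED_HOSTS = {
    "labtestcontrol2001": "keystone, glance, some nova services, labs puppetmaster",
    "labtestnet2001": "nova-network and/or neutron",
    "labtestneutron2001": "future neutron service host",
    "labtestservices2001": "designate, pdns, future ldap",
    "labtestweb2001": "horizon",
    "labtestvirt2001": "nova compute",
    "labtestmetal2001": "ironic (bare-metal) allocation",
}


def count_hosts(hosts):
    """Return how many physical servers the request covers."""
    return len(hosts)


if __name__ == "__main__":
    print(count_hosts(REQUESTED_HOSTS))
```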

Event Timeline

Andrew claimed this task.
Andrew raised the priority of this task from to Needs Triage.
Andrew updated the task description.
Andrew added projects: Cloud-Services, Cloud-VPS.
Andrew subscribed.
Andrew set Security to None.
Andrew added subscribers: chasemp, coren, mark.

I'm sure that we can use a 1g system for labnet2001 if that's otherwise a blocker.

https://wikitech.wikimedia.org/wiki/Server_Spares

Rob suggests that we use Dell PowerEdge R420, Dual Intel Xeon E5-2440, 32 GB Memory, (2) 500GB Disks which would suit me just fine, and there are enough of them to go around. But let's just use an 8g system for the bare-metal test, since who cares?

Mark, if you can tentatively approve of this use for the spare Dallas servers, I'll set up more specific subtasks for each box.

If we have hardware that is out of warranty and (therefore) won't be used for new production stuff, then it could be considered "free" for this (non-production) use case. Only in that case does it not really impact our budget.

https://rt.wikimedia.org/Ticket/Display.html?id=9677 is the rt ticket to track the quoting of a 1u misc system for pricing considerations on this request.

https://rt.wikimedia.org/Ticket/Display.html?id=9677 now has updated pricing info for a new single cpu misc system.

DO NOT PUT PRICING IN THIS TASK.

chasemp triaged this task as Medium priority. Oct 8 2015, 6:28 PM

We should be able to use out-of-warranty 8g systems for:

  • labservices2001 (designate, pdns, future ldap)
  • labmetal2001 -- a host designated for ironic (bare-metal) allocation
  • labnet2001 (nova-network and/or neutron) or labneutron2001 (future neutron service host)

A bigger box (e.g. 32g) would be nice for the virt node.

I'd call labnet2001 "labnet2001"; it's the network control node independent of the technology (imo)

We have two 8g servers out of warranty, and two 16g servers out of warranty. So if we don't mind gobbling up all of those we can do phase one of this whole thing with only one 'non-free' 32g box.

Then, at some point we'll need another (trivial, potentially tiny) compute node to support ironic.

I just want to ensure we have a summary so far, since the task description has since shifted to be slightly inaccurate:

  • This is a planned ongoing development cluster; for development of labs, not for general labs use.
  • There is no timeline for the return of these systems.
  • The three 8GB systems (Dell PowerEdge R410, Dual Intel Xeon X5650 (2.66 GHz), 8GB Memory, (2) 500GB Disks; ssl2002-2004) will likely be allocated for this task. (Pending the review of pricing on the first item mentioned in this summary.)
    • These systems will serve as:
      • labservices2001 (designate, pdns, future ldap)
      • labmetal2001 -- a host designated for ironic (bare-metal) allocation
      • labnet2001 (nova-network and/or neutron) or labneutron2001 (future neutron service host)
  • A system is required for a virtual node system. @Andrew suggests a 32GB box would be preferable, but we don't have any that are not under warranty.
  • Chatting in IRC with @Andrew, we'll need the three 8GB systems and an additional four (as he needs 7 systems). One of these would be the above-mentioned virtual node system.
  • Please note the initial request included a 10Gb NIC, but none of the above have that.

As with all allocations, we'll need to ensure that these are approved by @mark.

@Andrew & @coren: Please review the above summary and let me know if anything is missing or incorrect. Thanks!

If the above is all correct, the next step is for @Andrew (and myself) to sync up with @mark for his approval/corrections on the above allocations & pricing review.

Let's skip the Horizon box for now -- we can consolidate horizon services on the controller node or buy one later on.

As discussed at off-site: Mark approves of this, thinks that reusing the 3 off-warranty boxes and the 2 almost-expired boxes won't count against the labs budget.

This will need some misc database support as well -- adding Jaime to the ticket. We could potentially have the dbs all run on the controller node but I'd prefer they be on a separate db host in order to resemble eqiad.

Have you coordinated with @Papaul about the currently unknown-function Cisco servers that used to be Labs and that currently live in Dallas?

Ok, allocations for this:

  • WMF3763 (previously named ssl2002) - 8GB System
  • WMF3810 (previously named ssl2003) - 8GB System
  • WMF5835 (previously named capella) - 16GB System - This is one of the newer of the 16GB systems
  • WMF5834 (previously named haedus) - 16GB System - This is one of the newer of the 16GB systems
  • WMF3644 (previously named zhen) - 16GB
  • wmf5850 (32GB system) - 32GB system that is under warranty. This will need two network connections wired.

I had to pull one of the 8GB systems from the offering (ssl2004) since it didn't exist (someone has used it elsewhere, or it was a mistake on spares, but I'm inclined to think it was used by someone else in ops).

Please cable up two nics for:

wmf5850 (labtestvirt2001)
WMF5835 (labtestnet2001)

The second nic will be on the instance vlan.

The other servers can just have one nic cabled for now... I'm still not sure about neutron.

@Andrew: Chatting with @mark at lunch, we need to clear the allocation of the 32GB misc machine with him. (It was a passing comment, so we'll need his approval on this task.)

Naming!

wmf5850 (32G): labtestvirt2001
WMF5835 (16G): labtestnet2001
WMF3763 (8GB): labtestneutron2001
WMF3810 (8GB): labtestmetal2001
WMF5834 (16GB): labtestcontrol2001
WMF3644 (16GB): labtestservices2001
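
For reference, the naming above combined with the earlier cabling request can be sketched as a lookup table. The `ALLOCATIONS` structure and its field names are assumptions made for illustration; the NIC counts follow the cabling comment (two for labtestvirt2001 and labtestnet2001, one for the rest).

```python
# Hypothetical summary of the final allocation; field names are illustrative.
# Asset tags, RAM sizes and host names are taken from the comments above.
ALLOCATIONS = [
    {"asset": "wmf5850", "ram_gb": 32, "host": "labtestvirt2001", "nics": 2},
    {"asset": "WMF5835", "ram_gb": 16, "host": "labtestnet2001", "nics": 2},
    {"asset": "WMF3763", "ram_gb": 8, "host": "labtestneutron2001", "nics": 1},
    {"asset": "WMF3810", "ram_gb": 8, "host": "labtestmetal2001", "nics": 1},
    {"asset": "WMF5834", "ram_gb": 16, "host": "labtestcontrol2001", "nics": 1},
    {"asset": "WMF3644", "ram_gb": 16, "host": "labtestservices2001", "nics": 1},
]


def hosts_needing_second_nic(allocations):
    """Return the hosts that need a second NIC cabled (instance vlan)."""
    return [a["host"] for a in allocations if a["nics"] > 1]


def total_ram_gb(allocations):
    """Sum the memory across all allocated systems."""
    return sum(a["ram_gb"] for a in allocations)


if __name__ == "__main__":
    print(hosts_needing_second_nic(ALLOCATIONS))
    print(total_ram_gb(ALLOCATIONS))
```

Keeping the asset tag next to the host name makes it easy to cross-check the table against the allocation list when the boxes are racked.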

So we're down to just one system (32 GB) in warranty now?

This is Approved.

Note I cannot create the onsite allocation of these systems until the network is set up for labs in codfw, since I'm not sure what the row/rack requirements may be for the labs subnets.

RobH claimed this task.

Ok, so everything in the blocking network tasks states row B is labs in codfw as well (for now.)

So I'm resolving this as T117097 tracks deployment.