eqiad: (4) worker servers for kubernetes
Closed, Resolved (Public)

Description

Site/Location: EQIAD
Number of systems: 4
Service: kubernetes
Networking Requirements: internal IP
Processor Requirements: 16 cores total, no special requirements
Memory: 64 GB
Disks: 1 TB capacity as RAID1 (software RAID is ok)
NIC(s): 1GbE network card
Partitioning Scheme: n/a

We want to procure 4 servers to serve as the worker nodes of our Kubernetes cluster.

Event Timeline

This was discussed during our operations meeting. We may be able to fulfill this with spare units in codfw (and then add new servers to the spares pool at a more leisurely pace).

I'll look into this today and update with details.

We happen to have the spares to do this. The request is for 4 servers, but the internal IP address requirement doesn't state whether these belong on the labs-support or labs-instances VLAN.

That may dictate keeping them all in a single row (c) or spreading them out. I'm going to assume spreading them out is best, and pick the spare servers accordingly.

Asset tags: wmf4747, wmf4748, wmf4749, wmf4750
Each system: dual Intel Xeon E5-2640 v3 (2.6GHz, 8 cores per CPU, 16 cores total), 64GB memory, dual 1TB disks

Warranty expiry is 2019-03-24.

RobH mentioned this in Unknown Object (Task). Aug 2 2016, 5:10 PM
RobH created subtask Unknown Object (Task).
RobH renamed this task from "Eqiad: procure 4 servers for kubernetes" to "eqiad: (4) spare pool servers for kubernetes". Aug 2 2016, 5:14 PM
RobH changed the task status from Open to Stalled.
RobH triaged this task as Medium priority.
RobH closed subtask Unknown Object (Task) as Declined.
RobH subscribed.

Escalating to @mark for his review/approval for allocation. The systems meet all requirements. Please note that once these are allocated, we will have only one 64GB spare pool system left in eqiad, along with a number of 32GB spare systems.

Please note approval/questions and assign back to me, and I'll get them provisioned and set up!

RobH lowered the priority of this task from Medium to Low. Aug 8 2016, 11:03 PM

It was noted during today's operations meeting that these systems are part of what is needed for one of the team's stretch goals, but it is by far the least blocking item.

So while this is still pending approval, its priority has dropped from normal to low.

mark raised the priority of this task from Low to Medium. Sep 7 2016, 4:53 PM

Now we're making this the main goal for next quarter, raising priority again.

However, TPMs (trusted platform modules) would be very useful for these. Do these spares support that? If not, we'd better lease some separately, instead of having to replenish the spare pool.

mark renamed this task from "eqiad: (4) spare pool servers for kubernetes" to "eqiad: (4) worker servers for kubernetes". Sep 7 2016, 4:55 PM
RobH changed the task status from Stalled to Open. Edited Sep 7 2016, 9:57 PM
RobH claimed this task.

The spare systems we have do NOT have TPMs installed.

I've created a sub-task in the private S4 procurement space/project to request quotes for both individual TPMs to add to the existing spares and new spare systems.

RobH mentioned this in Unknown Object (Task). Sep 7 2016, 10:36 PM
RobH created subtask Unknown Object (Task).

@RobH: I think it's a much better idea to lease these new systems (with TPMs) separately. Kubernetes nodes are an excellent candidate for leasing, unlike most one-off requests (misc systems). If we need to replenish the spare pool anyway shortly after this, the replacements would have to be leased. Let's lease these instead.

Sounds good. I've already requested Dell quotes on the linked sub-task. Once I have those back, I'll request similar quotes from HP.

The quotes are now all in on the procurement sub-task. However, the systems will not be ordered in time for the ops offsite next week, and we wanted some systems in place for that.

As such, @mark requested in our weekly ops meeting that I allocate temp hosts for this. I'm uncertain whether we need all 4 of the temp hosts or whether fewer would do, so I'll just allocate all 4 for now.

The systems won't have any kind of element/cluster name, as they are wholly for testing. I'll simply use their asset tags as hostnames. (This is unusual, but these are not going to stick around once the procurement sub-task has been reviewed by management and the new systems arrive on site.)

Once the temp systems are spun up, this task will be stalled at low priority (but remain assigned to me) until the new systems arrive and are allocated/deployed.

Systems for temp use are: wmf4747, wmf4748, wmf4749, wmf4750.

Servers were purchased and allocated on T145026.

RobH closed subtask Unknown Object (Task) as Resolved. Apr 11 2017, 8:43 PM