Allocate hardware for salt master in eqiad
Closed, ResolvedPublic

Description

We need a box comparable to palladium (cpu, memory) as a dedicated salt master, again in eqiad. What's available? Do we have any spares?

Event Timeline

ArielGlenn set the priority of this task to Needs Triage.
ArielGlenn updated the task description.
ArielGlenn added a project: hardware-requests.
ArielGlenn added a subscriber: ArielGlenn.
Restricted Application added a project: acl*sre-team.Oct 12 2015, 10:28 PM
Restricted Application added a subscriber: Aklapper.
ArielGlenn set Security to None.
ArielGlenn added a subscriber: RobH.
Restricted Application added a subscriber: Matanya.Oct 13 2015, 2:29 PM
RobH added a comment.EditedOct 13 2015, 2:48 PM

We may want to allocate:

Dell PowerEdge R420, Dual Intel Xeon E5-2440, 64GB Memory, Dual 300GB SSD, H310 Mini Raid Card

wmf3542

The H310 is useless for high disk I/O operations, but salt doesn't care about disk I/O, so the high memory and dual CPU would likely be a good fit. (It is the same CPU as palladium.)

RobH triaged this task as Medium priority.Oct 13 2015, 2:51 PM
RobH moved this task from Backlog to Pending Approval on the hardware-requests board.

Any movement on this front? Is that spare still around?

Let's go with Rob's suggestion

RobH claimed this task.Oct 26 2015, 5:12 PM
RobH raised the priority of this task from Medium to High.
RobH added a subscriber: Cmjohnson.
RobH added a comment.Oct 26 2015, 5:14 PM

I'll handle getting this spun up, and any potential onsite tasks. (since it responds to mgmt ssh, there likely won't be any other than the labeling task)

I'll get this named and set up for use later today.

RobH closed this task as Resolved.Oct 26 2015, 7:20 PM

WMF3542 is allocated as hostname lawrencium for this use. T116645 is for installation, resolving hardware-requests.

RobH reopened this task as Open.Oct 26 2015, 8:06 PM
RobH added a subscriber: mark.

So WMF3542 has an H310 controller, which Jessie doesn't detect.

Since we don't like using these controllers, I can either replace it with an H710 (overkill), or allocate a different machine.

Unfortunately, there isn't another machine with 64GB+ of memory, other than the storage-heavy Dell PowerEdge R420, dual Intel Xeon E5-2450 v2 2.50GHz, 64GB Memory, (4) 3TB Disks.

We have only two of these left. I think this is a good fit, except for the four 3TB disks. I'll need to chat with @mark to ensure this allocation of the system is OK (or whether we would rather install more memory into another machine).

Alternatively, we could allocate promethium, Dell PowerEdge R420, Dual Intel Xeon E5-2440, 32GB Memory with dual 500GB. 32GB isn't nearly as much as palladium, so it may need to be raised for salt use.

@ArielGlenn: Is 64+GB a hard requirement? The original request is similar to palladium, but palladium has WAY more memory than any other system I have spare. If we need to upgrade the memory from 64, we may as well use promethium and upgrade from 32. (Saving the newer spare misc systems with 4*3TB disks.)

RobH added a comment.Oct 26 2015, 8:14 PM

Chatted with Ariel in IRC.

Going to go with one of the:

Dell PowerEdge R420, Dual Intel Xeon E5-2440, 32GB Memory, Dual 300GB SSD, Dual 500GB Nearline SAS

promethium
neodymium (ssds not confirmed present)

If neodymium has no ssds, I'll go with it.

RobH added a comment.Oct 27 2015, 12:50 AM

I haven't gotten to this in time today. I may get some of it set up in advance of tomorrow, but likely I'll simply be picking this back up in the AM for completion.

RobH added a comment.Oct 27 2015, 4:37 PM

OK, so the Dell PowerEdge R420, Dual Intel Xeon E5-2440, 32GB Memory, Dual 300GB SSD, Dual 500GB Nearline SAS systems also include an H310.

Since we'll have to swap a machine over to one of the (10) H710 replacement controllers either way (we have spares for this reason), let's go back to the original allocation.

wmf3542 - lawrencium - and will have the name label and H310 controller changed. I'll create the onsite tasks.

mark added a comment.Oct 27 2015, 5:35 PM

Er, approvals? :)

Also, can't this be a VM in ganeti?

No, this should not be a VM. It should be a dedicated server.

RobH closed this task as Resolved.Oct 27 2015, 5:58 PM
RobH reopened this task as Open.Oct 27 2015, 6:38 PM

It turns out this has far too much memory (it was noted in spares as 32GB, but it actually has 96GB, likely from old decom hardware memory upgrades).

So, I'm reclaiming this and allocating another system instead.

mark added a comment.Oct 27 2015, 6:47 PM

No, this should not be a VM. It should be a dedicated server.

Could you elaborate?

RobH added a comment.Oct 27 2015, 9:14 PM

My bad on skipping @mark for approvals, I was under the impression that this was roadmapped, expected, and discussed during our recent ops meeting.

So, pending his sign off, my next suggestion would be to simply re-use an old cp system, WMF3096.

@ArielGlenn: Please elaborate for @mark as to why this cannot exist in a ganeti VM?

One possible concern is that using a salt master to send commands to hosts could include wanting to upgrade ganeti cluster hosts, which then wouldn't be easily done on the one ganeti host housing the salt master. (Not sure this is a broad enough use case to warrant bare metal.)

I don't want the salt master to be unavailable if we have problems with ganeti. I also want to guarantee access to a minimum amount of CPU resources, which does not happen if the master is on a VM on shared hardware. We already have load issues that are hard to track; I'd like to not complicate the issue further.

Note that the end goal will be to have redundant masters and syndics in the two large data centers, following the setup that LinkedIn uses. IIRC they have their syndics on the same host as the masters, in a multi-master and multi-syndic setup.

RobH reassigned this task from RobH to mark.Oct 29 2015, 4:36 PM

With @ArielGlenn's update, I've reassigned this back to @mark for his review.

mark added a comment.Nov 9 2015, 2:20 PM

Chatted with Ariel in IRC.
Going to go with one of the:
Dell PowerEdge R420, Dual Intel Xeon E5-2440, 32GB Memory, Dual 300GB SSD, Dual 500GB Nearline SAS

promethium
neodymium (ssds not confirmed present)

If neodymium has no ssds, I'll go with it.

Approved.

RobH closed this task as Resolved.Nov 9 2015, 9:51 PM

Well, neodymium has SSDs, but the OS will be placed on the larger SAS disks. All of the possible allocations are slightly out of spec; with this one, we can simply remove the unused SSDs for use in another system.

(If this is wrong, we can swap it about and reinstall).

Installation and deployment will be tracked via T118210.