We need a box comparable to palladium (CPU, memory) as a dedicated salt master, again in eqiad. What's available? Do we have any spares?
| Status | Assignee | Task |
|--------|----------|------|
| Resolved | ArielGlenn | T115292 take steps outlined at techops offsite to (try to) address salt reliability |
| Declined | ArielGlenn | T102039 Disabling agent forwarding breaks dsh based restarts for Parsoid (required for deployments) |
| Resolved | ArielGlenn | T115287 Move salt master to separate host from puppet master |
| Declined | RobH | T116645 install/setup/deploy lawrencium as eqiad salt-master |
| Resolved | ArielGlenn | T118210 install/setup/deploy neodymium as salt-master in eqiad |
| Resolved | RobH | T115288 Allocate hardware for salt master in eqiad |
| Resolved | Cmjohnson | T116776 label server lawrencium / wmf3542 & swap H310 for H710 controller |
We may want to allocate:
Dell PowerEdge R420, Dual Intel Xeon E5-2440, 64GB Memory, Dual 300GB SSD, H310 Mini Raid Card
The H310 is useless for high disk I/O operations, but salt doesn't care about disk I/O. The high memory and dual CPUs would likely be a good fit (it's the same CPU as palladium).
I'll handle getting this spun up, along with any potential onsite tasks. (Since it responds to mgmt SSH, there likely won't be any beyond the labeling task.)
I'll get this named and setup for use later today.
So WMF3542 has an H310 controller, which Jessie doesn't detect.
Since we don't like using these controllers, I can either replace it with an H710 (overkill) or allocate a different machine.
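For what it's worth, a quick way to confirm which controller a box actually has is to look for the MegaRAID chip in `lspci` output. The sample line below is illustrative, not captured from this host; the PERC H310 is LSI SAS 2008-based and the H710 is SAS 2208-based, so the chip name distinguishes them:

```shell
# On a live host you would run:  lspci -nn | grep -i raid
# Illustrative sample of how a PERC H310 typically appears in lspci output:
sample='01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)'
case "$sample" in
  *"SAS 2008"*) echo "H310-class controller" ;;
  *"SAS 2208"*) echo "H710-class controller" ;;
  *)            echo "unknown controller"    ;;
esac
```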
Unfortunately, there isn't another 64GB+ memory machine other than the storage-heavy Dell PowerEdge R420: dual Intel Xeon E5-2450 v2 2.50GHz, 64GB memory, (4) 3TB disks.
We have only two of these left. I think this is a good fit, except for the four 3TB disks. I'll need to chat with @mark to ensure this allocation of the system is ok. (Or whether we'd rather install more memory into another machine.)
Alternatively, we could allocate promethium: Dell PowerEdge R420, dual Intel Xeon E5-2440, 32GB memory, dual 500GB disks. 32GB is nowhere near palladium's memory, so it may need to be raised for salt use.
@ArielGlenn: Is 64GB+ a hard requirement? The original request asks for something similar to palladium, but palladium has far more memory than any other system I have spare. If we'd need to upgrade the memory from 64GB anyway, we may as well use promethium and upgrade from 32GB. (Saving the newer spare misc systems with 4×3TB disks.)
Ok, so the Dell PowerEdge R420 (dual Intel Xeon E5-2440, 32GB memory, dual 300GB SSD, dual 500GB nearline SAS) also includes an H310.
Since we'll have to swap in one of our (10) spare H710 replacement controllers either way (we stock them for exactly this reason), let's go back to the original allocation.
wmf3542 will be named lawrencium, and will have its name label and H310 controller changed. I'll create the onsite tasks.
It turns out this machine has far too much memory (it was listed in spares as 32GB but actually has 96GB, likely from memory upgrades pulled from old decommissioned hardware).
So, I'm reclaiming it and allocating another system instead.
My bad on skipping @mark for approvals; I was under the impression that this was roadmapped, expected, and discussed during our recent ops meeting.
So, pending his sign-off, my next suggestion would be to simply re-use an old cp system, WMF3096.
One possible concern: using the salt master to send commands to hosts could include upgrading ganeti cluster hosts, which wouldn't be easily done for the one ganeti host housing the salt master itself. (Not sure this use case is broad enough to warrant bare metal.)
I don't want the salt master to be unavailable if we have problems with ganeti; I also want to guarantee it a minimum amount of CPU, which can't happen if the master is a VM on shared hardware. We already have load issues that are hard to track down, and I'd rather not complicate things further.
Note that the end goal is to have redundant masters and syndics in the two large data centers, following the setup LinkedIn uses. IIRC they run their syndics on the same hosts as the masters, in a multi-master, multi-syndic setup.
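For reference, a minimal sketch of that topology in Salt's YAML config, using the standard `order_masters` / `syndic_master` / multi-master `master` options. The hostnames are purely illustrative, not actual WMF hosts:

```yaml
# /etc/salt/master on each top-level master (illustrative hostnames):
# tells the master it controls syndics below it.
order_masters: True

# /etc/salt/master on each per-DC syndic host, pointing at both
# top-level masters for redundancy:
syndic_master:
  - saltmaster-eqiad.example.net
  - saltmaster-codfw.example.net

# /etc/salt/minion on every minion, multi-master:
master:
  - saltmaster-eqiad.example.net
  - saltmaster-codfw.example.net
```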
Well, neodymium has SSDs, but the OS will be placed on the larger SAS disks. All of the possible allocations are slightly out of spec; with this one, we can simply remove the unused SSDs for use in another system.
(If this is wrong, we can swap things around and reinstall.)
Installation and deployment will be tracked via T118210.