
codfw/eqiad: 2x systems for prometheus
Closed, Resolved (Public)

Description

Prometheus currently runs on VMs, where it takes significant resources on Ganeti. I'd like to get quotes for bare-metal hardware to host it instead. Such systems can host multiple instances of Prometheus, though at the moment only the ops Prometheus instance would live there.

Sites: eqiad/codfw
Quantity: 2x systems per site
Specs: 32GB RAM, at least 16 cores, 1GbE NIC
Disk specs: we still have to decide whether fast SSDs or spinning disks make the most sense, so please quote for both of the following:

  • 2x 600GB SSDs per system
  • or 2x 3TB disks per system

Finally, it might make sense to have both spinning disks and SSDs, with the SSDs used as a cache (like in T88992), so the systems should be able to house 4x disks total in a mix of SSDs and spinning disks. Thanks!
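To make the two disk options easier to compare, here is a minimal sketch that totals the raw capacity per system and across all requested systems. The names (`DISK_OPTIONS_GB`, `summarize`) are illustrative only, 3TB is taken as 3000GB (decimal units), and usable capacity would be lower depending on the RAID layout, which this task does not specify.

```python
# Figures come from the request above; names here are hypothetical.
SITES = ["eqiad", "codfw"]
SYSTEMS_PER_SITE = 2

# option name -> (disks per system, size of each disk in GB, decimal units)
DISK_OPTIONS_GB = {
    "ssd": (2, 600),
    "spinning": (2, 3000),
}


def raw_capacity_gb(disks, size_gb):
    """Raw capacity of one system, before any RAID or filesystem overhead."""
    return disks * size_gb


def summarize():
    """Return raw capacity per system and across all requested systems."""
    total_systems = len(SITES) * SYSTEMS_PER_SITE
    summary = {}
    for name, (disks, size_gb) in DISK_OPTIONS_GB.items():
        per_system = raw_capacity_gb(disks, size_gb)
        summary[name] = {
            "per_system_gb": per_system,
            "fleet_gb": per_system * total_systems,
        }
    return summary


if __name__ == "__main__":
    for name, figures in summarize().items():
        print(name, figures)
```

If the two disks in each system were mirrored, the usable figures would be roughly half of the raw ones; that is an assumption about the layout, not something decided here.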

Related Objects

Status: Resolved
Assigned: RobH

Event Timeline

elukey triaged this task as Medium priority. Oct 19 2016, 11:10 AM

We have some spare systems in eqiad that may meet this (4x 4TB disks, 32GB RAM), but none in codfw.

Task T145112 has info for a spare pool refresh in codfw.

RobH mentioned this in Unknown Object (Task). Oct 25 2016, 2:44 PM

Prometheus might be much better off with SSDs, in which case I assume we wouldn't be using these misc spares?

RobH changed the task status from Open to Stalled. Oct 27 2016, 5:06 PM

That is correct. The misc spares do not tend to have SFF SSDs; they tend to have LFF SATA disks for high storage capacity.

The Dell systems cannot swap their hot-swap disk bays between LFF and SFF. HP systems can, so we could order HP systems and mix the disk form factors, but that seems like a bad idea for overall complexity.

I'd advise we fill the prometheus role with systems that use SFF disks. (These tend to be 1U systems that can house 8 SFF disks.)

So the previously linked quote for spare pool systems won't work. I'll be creating a sub-task to get a new system specification quoted from both Dell and HP.

RobH created subtask Unknown Object (Task). Oct 27 2016, 5:28 PM

Orders have been placed; implementation is tracked in the sub-tasks.

RobH closed subtask Unknown Object (Task) as Resolved. Aug 30 2017, 3:41 PM