
eqiad: (3) nodes for Druid / analytics
Closed, Resolved · Public

Description

We're pretty flexible with this request. We need 3 nodes in eqiad that are RAM (48GB+) and CPU (12+ core) heavy, with ~4 × 1TB SSDs each.

This will replace the old, out-of-warranty (OOW) analytics1015, analytics1017, and analytics1021.

This should come out of Analytics hardware budget for this FY.

Event Timeline

Restricted Application added a subscriber: Aklapper.
RobH subscribed.

The hardware currently in analytics1015, analytics1017, and analytics1021:

  • Dell PowerEdge R720xd
  • RT 3646
  • Dual CPU Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz / 6 cores
    • Please note hyperthreading is NOT enabled on these. Is this intentional?
  • 48GB RAM

The new quotes will not be for the large 12-disk-bay LFF 2U systems, but will instead move down to a 1U system. (The 1U systems can still hold 8-10 SFF disks, depending on the chassis.)

New quote requests will include the following:

  • CPU option of dual 6-core, possibly also 10-core or more?
  • We've standardized our memory purchases over time to 32/64/128GB, so this will scale up to 64GB.
  • 4 × 1.2TB Intel S3610 SSDs
    • software RAID, no hardware RAID controller
  • 1GbE NIC

Performance notes:

  • The current ratio is 4GB per CPU core.
    • 6 cores per CPU × 2 CPUs = 12 cores
  • We now order the Intel S3610 SSDs. These come in 800GB/1.2TB/1.6TB; I'll request the 1.2TB model.
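
To make the sizing math concrete, here is a minimal sketch (assumptions: the 4GB-per-core target above and the hyperthreading doubling discussed in the questions below; everything else is illustrative):

```python
# Illustrative check of the 4GB-RAM-per-core sizing rule discussed above.
# With hyperthreading on, each physical core exposes 2 logical cores, which
# is why the questions below call for ordering twice the memory.

GB_PER_CORE = 4  # current target ratio

def ram_target_gb(cpus: int, cores_per_cpu: int, hyperthreading: bool) -> int:
    """RAM (GB) needed to keep 4GB per logical core."""
    logical_cores = cpus * cores_per_cpu * (2 if hyperthreading else 1)
    return logical_cores * GB_PER_CORE

# Current boxes: dual 6-core, HT off -> 12 cores -> 48GB (matches the R720xd spec).
print(ram_target_gb(cpus=2, cores_per_cpu=6, hyperthreading=False))  # 48
# Same CPUs with HT on -> 24 logical cores -> 96GB (the "twice the memory" case).
print(ram_target_gb(cpus=2, cores_per_cpu=6, hyperthreading=True))   # 96
```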

Questions for analytics/@Ottomata:

  • Will hyperthreading be enabled on the new machines? (We enable it by default.) If we turn it on, we need to account for that in the memory purchase and order twice the memory to keep the ratio of 4GB RAM per CPU core.
  • Do you want identical dual CPU 6 core, or is there any benefit to scaling up the number of cores and available memory?
  • If we can fit more cores per system, is there any benefit to lowering this cluster size from 3 to 2?

@Ottomata: Please provide feedback on the above questions and assign back to me.

RobH renamed this task from "3 new nodes for Druid" to "eqiad: (3) nodes for Druid / analytics". Mar 3 2016, 11:20 PM
RobH triaged this task as Medium priority.
RobH moved this task from Backlog to In Discussion / Review on the hardware-requests board.

If we can fit more cores per system, is there any benefit to lowering this cluster size from 3 to 2?

Will let @Milimetric chime in on this one.

More cores and RAM for these is good in general. They will mostly act as in-memory processors and query analyzers. This is also a distributed cluster, so I think 3 is better than 2 for failure-mode purposes, but @Milimetric can probably fill in more details here.

I say this order is a 'replacement' because we have to get rid of the old Dells, which are OOW. We had been planning to just go ahead and use them to start out with Druid, but since they are currently unused and we have been told that we should replace them with new hardware, we decided to order new nodes with specs more ideal for Druid (SSDs with less space, etc.).

I think 3 servers would be better than 2. We want these machines to have as much memory as possible, and 3 servers gives us the option to grow in memory if this cluster becomes as useful as we think it will be. Also, the way Druid splits up its services makes more sense with 3 machines, and even more than that if possible. But I think 3 is a minimum starting size for the kind of work we want to do with it. (Druid will run on 1 machine, but we have too many plans for it to set it up that way.)
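
For context on why the service split favors 3+ machines, here is a purely hypothetical role layout (the role names are standard Druid services, but the placement and hostnames are illustrative assumptions, not a plan of record):

```python
# Hypothetical role layout for a 3-node Druid cluster, illustrating why
# Druid's split-out services map onto 3 machines better than 2.
# Role names are standard Druid services; placement and hostnames are
# assumptions for illustration only.
layout = {
    "druid-a": ["coordinator", "broker", "historical", "middleManager"],
    "druid-b": ["overlord", "broker", "historical", "middleManager"],
    "druid-c": ["broker", "historical", "middleManager"],
}

# Every node serves data (historical) and queries (broker), while the
# leader-style services (coordinator, overlord) sit on different nodes;
# losing any one node still leaves two brokers and two data nodes.
for node, roles in layout.items():
    print(node, roles)
```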

If we can fit more cores per system, is there any benefit to lowering this cluster size from 3 to 2?

Let's stick with 3.

Do you want identical dual CPU 6 core, or is there any benefit to scaling up the number of cores and available memory?

Sure, more cores and memory here will always be helpful!

Will hyperthreading be enabled on the new machines? (We do by default.) If we turn it on, we need to account for that in memory purchase and order twice the memory to keep the ratio of 4GB RAM per CPU core.

Hyperthreading yes.

Per IRC convo: dual 8-core + 64GB of RAM sounds just right :)

RobH mentioned this in Unknown Object (Task). Mar 14 2016, 7:10 PM

Ok, IRC update after chatting with @Ottomata

Since this is a new service, the ideal core-to-memory ratio is largely unknown. At this point, it's estimated that a dual-CPU system with 12-16 cores and 64GB of memory is likely acceptable.

The proposed spare pool order for codfw happens to match this on T128910. I've added this request to that task.

RobH added a subtask: Unknown Object (Task). Mar 14 2016, 9:00 PM
RobH reassigned this task from RobH to mark. Edited Mar 23 2016, 6:02 PM
RobH added a subtask: Unknown Object (Task).
RobH removed a subtask: Unknown Object (Task).
RobH added subscribers: Mark.Otaris, mark.

The systems that can be used for this were ordered today on T130738.

I'm now assigning this task for approval.
@mark: Please review the above request. Please attach relevant approvals for allocation, or add questions/comments for follow-up, and assign back to me. This request was noted on the spare pool order in T128910.

Thanks!

Negative, these systems don't have SSDs; the spare pool systems were ordered with SATA disks.

So we can either order new systems with SSDs, or swap the in-warranty SATA disks out for SSDs.

RobH mentioned this in Unknown Object (Task). Mar 31 2016, 4:01 PM
RobH added a subtask: Unknown Object (Task).
RobH mentioned this in Unknown Object (Task). Apr 7 2016, 9:54 PM
RobH edited subtasks, added: Unknown Object (Task); removed: Unknown Object (Task), Unknown Object (Task). Apr 7 2016, 9:59 PM
RobH added subtasks: Unknown Object (Task), Unknown Object (Task).
RobH closed subtask Unknown Object (Task) as Resolved.
RobH mentioned this in Unknown Object (Task). Apr 7 2016, 10:01 PM
RobH closed subtask Unknown Object (Task) as Resolved.
RobH removed subtasks: Unknown Object (Task), Unknown Object (Task).

This hardware request has been granted with the purchase of new druid nodes via procurement task T132068.

@JAllemandou @Milimetric, can you comment on desired partition layout for this?

I'm going to guess a small / partition, and then the rest of the space (LVM?) on RAID 10 across all 4 disks.

Perhaps to make this easy, we should just ask for a small 30G / on a RAID 10 array across all disks, and leave the rest up to us later.
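
As a rough capacity check on that proposal (a minimal sketch; the 4 × 1.2TB drives come from the quote above, the 30G / is the suggestion here, and the rest is arithmetic):

```python
# Rough usable-space math for the proposed layout: software RAID 10
# across 4 x 1.2TB SSDs, a ~30GB / partition, and the remainder left
# for LVM. RAID 10 mirrors pairs of disks, so usable space is half of raw.
DISKS = 4
DISK_TB = 1.2
ROOT_GB = 30

raw_tb = DISKS * DISK_TB               # 4.8TB raw
usable_tb = raw_tb / 2                 # 2.4TB after RAID 10 mirroring
lvm_gb = usable_tb * 1000 - ROOT_GB    # ~2370GB left for LVM

print(f"raw={raw_tb:.1f}TB usable={usable_tb:.1f}TB lvm~={lvm_gb:.0f}GB")
```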

@Ottomata: I agree on having a 30G / across the disks, and my guess is that one LVM volume on RAID 10 for the rest would be fine (but I'm no Druid expert).

I don't remember any specific advice from the Imply folks on this, so sure, 30G for / and the rest on RAID 10 sounds good to me. We'll probably archive old segments on HDFS anyway, so we don't need to stress too much about reliability.

RobH closed subtask Unknown Object (Task) as Resolved. Oct 12 2016, 5:48 PM