
eqiad: (3) nodes for Druid / analytics
Closed, Resolved · Public

Description

We're pretty flexible with this request. We need 3 nodes in eqiad that are RAM (48GB+) and CPU (12+ core) heavy, with ~4 × 1TB SSDs each.

This will replace the old, out-of-warranty (OOW) analytics1015, analytics1017, and analytics1021.

This should come out of Analytics hardware budget for this FY.

Event Timeline

Restricted Application added a subscriber: Aklapper.
RobH subscribed.

The hardware currently in analytics1015, analytics1017, and analytics1021:

  • Dell PowerEdge R720xd
  • RT 3646
  • Dual CPU Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz / 6 cores
    • Please note hyperthreading is NOT enabled on these. Is this intentional?
  • 48GB RAM

The new quotes will not be for the large 12-disk-bay LFF 2U systems, but will instead move down to a 1U system. (The 1U systems can still hold 8-10 SFF disks, depending on the chassis.)

New quote requests will include the following:

  • CPU option of dual 6-core, possibly also 10-core or more?
  • We've standardized our memory purchases over time to 32/64/128GB, so this will scale up to 64GB.
  • 4 × 1.2TB Intel S3610 SSDs
    • software RAID, no hardware RAID controller
  • 1GbE NIC

Performance notes:

  • The current ratio is 4GB per CPU core.
    • 6 cores per CPU × 2 CPUs = 12 cores
  • We now order the Intel S3610 SSDs. These come in 800GB/1.2TB/1.6TB; I'll request the 1.2TB model.
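
To make the sizing math concrete, here is a minimal sketch (assumptions: the 4GB-per-core target above and the hyperthreading doubling discussed in the questions below; everything else is illustrative):

```python
# Illustrative check of the 4GB-RAM-per-core sizing rule discussed above.
# With hyperthreading on, each physical core exposes 2 logical cores, which
# is why the questions below call for ordering twice the memory.

GB_PER_CORE = 4  # current target ratio

def ram_target_gb(cpus: int, cores_per_cpu: int, hyperthreading: bool) -> int:
    """RAM (GB) needed to keep 4GB per logical core."""
    logical_cores = cpus * cores_per_cpu * (2 if hyperthreading else 1)
    return logical_cores * GB_PER_CORE

# Current boxes: dual 6-core, HT off -> 12 cores -> 48GB (matches the R720xd spec).
print(ram_target_gb(cpus=2, cores_per_cpu=6, hyperthreading=False))  # 48
# Same CPUs with HT on -> 24 logical cores -> 96GB (the "twice the memory" case).
print(ram_target_gb(cpus=2, cores_per_cpu=6, hyperthreading=True))   # 96
```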

Questions for analytics/@Ottomata:

  • Will hyperthreading be enabled on the new machines? (We enable it by default.) If we turn it on, we need to account for that in the memory purchase and order twice the memory to keep the ratio of 4GB RAM per CPU core.
  • Do you want identical dual CPU 6 core, or is there any benefit to scaling up the number of cores and available memory?
  • If we can fit more cores per system, is there any benefit to lowering this cluster size from 3 to 2?

@Ottomata: Please provide feedback on the above questions and assign back to me.

RobH renamed this task from "3 new nodes for Druid" to "eqiad: (3) nodes for Druid / analytics". Mar 3 2016, 11:20 PM
RobH triaged this task as Medium priority.
RobH moved this task from Backlog to In Discussion / Review on the hardware-requests board.

If we can fit more cores per system, is there any benefit to lowering this cluster size from 3 to 2?

Will let @Milimetric chime in on this one.

More cores and RAM for these is good in general. They will mostly act as in-memory processors and query analyzers. This is also a distributed cluster, so I think 3 is better than 2 for failure-mode purposes, but @Milimetric can probably fill in more details here.

I say this order is a 'replacement' because we have to get rid of the old Dells, which are OOW. We had been planning to just go ahead and use them to start out with Druid, but since they are currently unused and we have been told that we should replace them with new hardware, we decided to order new nodes with specs more ideal for Druid (SSDs with less space, etc.).

I think 3 servers would be better than 2. We want these machines to have as much memory as possible, and 3 servers gives us the option to grow in memory if this cluster becomes as useful as we think it will be. Also, the way Druid splits up its services makes more sense with 3 machines, and even more than that if possible. But I think 3 is a minimum starting size for the kind of work we want to do with it. (Druid will run on 1 machine, but we have too many plans for it to set it up that way.)
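
For context on why the service split favors 3+ machines, here is a purely hypothetical role layout (the role names are standard Druid services, but the placement and hostnames are illustrative assumptions, not a plan of record):

```python
# Hypothetical role layout for a 3-node Druid cluster, illustrating why
# Druid's split-out services map onto 3 machines better than 2.
# Role names are standard Druid services; placement and hostnames are
# assumptions for illustration only.
layout = {
    "druid-a": ["coordinator", "broker", "historical", "middleManager"],
    "druid-b": ["overlord", "broker", "historical", "middleManager"],
    "druid-c": ["broker", "historical", "middleManager"],
}

# Every node serves data (historical) and queries (broker), while the
# leader-style services (coordinator, overlord) sit on different nodes;
# losing any one node still leaves two brokers and two data nodes.
for node, roles in layout.items():
    print(node, roles)
```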

If we can fit more cores per system, is there any benefit to lowering this cluster size from 3 to 2?

Let's stick with 3.

Do you want identical dual CPU 6 core, or is there any benefit to scaling up the number of cores and available memory?

Sure, more cores and memory here will always be helpful!

Will hyperthreading be enabled on the new machines? (We do by default.) If we turn it on, we need to account for that in memory purchase and order twice the memory to keep the ratio of 4GB RAM per CPU core.

Hyperthreading yes.

Per IRC convo: dual 8-core + 64GB of RAM sounds just right :)

RobH mentioned this in Unknown Object (Task). Mar 14 2016, 7:10 PM

Ok, IRC update after chatting with @Ottomata

Since this is a new service, the ideal core-to-memory ratio is largely unknown. At this point, it's estimated that a dual-CPU system with 12-16 cores and 64GB of memory is likely acceptable.

The proposed spare pool order for codfw happens to match this on T128910. I've added this request to that task.

RobH added a subtask: Unknown Object (Task). Mar 14 2016, 9:00 PM
RobH reassigned this task from RobH to mark. Edited Mar 23 2016, 6:02 PM
RobH added a subtask: Unknown Object (Task).
RobH removed a subtask: Unknown Object (Task).
RobH added subscribers: Mark.Otaris, mark.

The systems that can be used for this were ordered today on T130738.

I'm now assigning this task for approval.
@mark: Please review the above request. Please attach relevant approvals for allocation, or add questions/comments for follow-up, and assign back to me. This request was noted on the spare pool order in T128910.

Thanks!

Negative, these systems don't have SSDs; the spare pool systems were ordered with SATA disks.

So we can either order new systems with SSDs, or swap the in-warranty SATA disks out for SSDs.

RobH mentioned this in Unknown Object (Task). Mar 31 2016, 4:01 PM
RobH added a subtask: Unknown Object (Task).
RobH mentioned this in Unknown Object (Task). Apr 7 2016, 9:54 PM
RobH edited subtasks, added: Unknown Object (Task); removed: Unknown Object (Task), Unknown Object (Task). Apr 7 2016, 9:59 PM
RobH added subtasks: Unknown Object (Task), Unknown Object (Task).
RobH closed subtask Unknown Object (Task) as Resolved.
RobH mentioned this in Unknown Object (Task). Apr 7 2016, 10:01 PM
RobH closed subtask Unknown Object (Task) as Resolved.
RobH removed subtasks: Unknown Object (Task), Unknown Object (Task).

This hardware request has been granted with the purchase of new druid nodes via procurement task T132068.

@JAllemandou @Milimetric, can you comment on desired partition layout for this?

I'm going to guess a small / partition, and then the rest of the space (LVM?) on RAID 10 across all 4 disks.

Perhaps to make this easy, we should just ask for a small 30G / on a RAID 10 array across all disks, and leave the rest up to us later.
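
As a rough capacity check on that proposal (a minimal sketch; the 4 × 1.2TB drives come from the quote above, the 30G / is the suggestion here, and the rest is arithmetic):

```python
# Rough usable-space math for the proposed layout: software RAID 10
# across 4 x 1.2TB SSDs, a ~30GB / partition, and the remainder left
# for LVM. RAID 10 mirrors pairs of disks, so usable space is half of raw.
DISKS = 4
DISK_TB = 1.2
ROOT_GB = 30

raw_tb = DISKS * DISK_TB               # 4.8TB raw
usable_tb = raw_tb / 2                 # 2.4TB after RAID 10 mirroring
lvm_gb = usable_tb * 1000 - ROOT_GB    # ~2370GB left for LVM

print(f"raw={raw_tb:.1f}TB usable={usable_tb:.1f}TB lvm~={lvm_gb:.0f}GB")
```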

@Ottomata: I agree on having a 30G / across the disks, and my guess is that one LVM volume on RAID 10 for the rest would be fine (but I'm no Druid expert).

I don't remember any specific advice from the Imply folks on this, so sure, 30G for / and the rest on RAID 10 sounds good to me. We'll probably archive old segments on HDFS anyway, so we don't need to stress too much about reliability.

RobH closed subtask Unknown Object (Task) as Resolved. Oct 12 2016, 5:48 PM