Request Elasticsearch hardware for secondary CirrusSearch in codfw
Closed, Resolved, Public

Description

Basically we should get 24ish nodes like the nice ones in eqiad. The cluster will be smaller, but we can't afford 31 of the nice nodes and it's not a great choice anyway because then the secondary DC's cluster would be way overpowered. Start with what we budgeted and go from there. If you get a choice, always put extra cash into nicer (but not larger!) SSDs. And RAM. Those are life. LIFE!

Manybubbles updated the task description.
Manybubbles raised the priority of this task from to High.
Manybubbles assigned this task to EBernhardson.
Manybubbles set Security to None.

Why 24 boxes here and not 23 or $number? Is there a load analysis I can use to justify it? I don't understand this well enough to make the request.

I believe it was semi-arbitrary and based on budget. We have 15 nice
machines in the eqiad cluster and 16 good machines and I guesstimated that
half again as many would be enough but never ran any hard numbers. 24 may
be too many but it doesn't feel like too too many. If it is too few we can
lower the rescore window for phrases on queries sent to that cluster to
lower the load.
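
For reference, the rescore window mentioned above is the `window_size` setting of Elasticsearch's rescore API: only the top `window_size` hits per shard are re-scored by the more expensive phrase query, so a smaller window means less work per search. The snippet below is a minimal sketch of such a request body, not the query CirrusSearch actually sends; the function name, the `text` field, the slop, and the weights are all hypothetical.

```python
import json


def build_phrase_rescore_query(text: str, window_size: int) -> dict:
    """Build a search body that rescores the top hits with a phrase query.

    Hypothetical example: the field name "text", the slop, and the weights
    are illustrative assumptions, not CirrusSearch's real settings.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return {
        # Cheap first-pass query that runs against every matching document.
        "query": {"match": {"text": text}},
        "rescore": {
            # Only the top `window_size` hits on each shard are rescored.
            "window_size": window_size,
            "query": {
                # More expensive phrase match applied to that window only.
                "rescore_query": {
                    "match_phrase": {"text": {"query": text, "slop": 1}}
                },
                "query_weight": 1.0,
                "rescore_query_weight": 10.0,
            },
        },
    }


if __name__ == "__main__":
    # A smaller window for the secondary cluster rescores fewer documents.
    print(json.dumps(build_phrase_rescore_query("new york city", 64), indent=2))
```

Shrinking the window trades some ranking quality on phrase matches for lower CPU cost, which is why it works as a load lever here.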

A more conservative approach would be to buy 16 and set up the cluster and
run load tests against it. If we feel we need more machines we can buy them
then.

fgiunchedi added a subscriber: fgiunchedi. Edited Jul 27 2015, 1:48 PM

another consideration is disk utilization: we're at roughly 50% in eqiad ATM (each machine has raid1 2x500GB SSD raid0 2x300GB SSD) and it seems relatively stable over the last 9 months. assuming the total disk used stays the same, 24 machines seems a good initial number: they'll be at ~75% disk space utilization
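
The disk estimate above can be sanity-checked with a short calculation. This is only a sketch under assumptions the thread does not state: that the 15 "nice" eqiad nodes use the RAID1 2x500GB layout (500 GB usable), that the 16 other nodes use RAID0 2x300GB (600 GB usable), and that each new codfw node would offer 500 GB usable. The function name is hypothetical.

```python
def projected_utilization(used_gb: float, node_count: int,
                          usable_gb_per_node: float) -> float:
    """Fraction of cluster disk in use if used_gb is spread evenly over all nodes."""
    if node_count <= 0 or usable_gb_per_node <= 0:
        raise ValueError("node_count and usable_gb_per_node must be positive")
    return used_gb / (node_count * usable_gb_per_node)


# ASSUMPTION: which eqiad nodes carry which disk layout is a guess.
# RAID1 of 2x500GB gives 500 GB usable; RAID0 of 2x300GB gives 600 GB usable.
EQIAD_CAPACITY_GB = 15 * 500 + 16 * 600
# From the thread: eqiad sits at roughly 50% disk utilization.
EQIAD_USED_GB = 0.50 * EQIAD_CAPACITY_GB
# ASSUMPTION: usable capacity of each new codfw node.
CODFW_USABLE_GB_PER_NODE = 500

if __name__ == "__main__":
    for nodes in (16, 24, 31):
        share = projected_utilization(EQIAD_USED_GB, nodes, CODFW_USABLE_GB_PER_NODE)
        print(f"{nodes} nodes: {share:.0%} of disk used")
```

With these assumed capacities, 24 nodes come out a little above 70%, in the same range as the ~75% estimate, while 16 nodes would not hold the current data set at all. Different per-node capacities shift the result proportionally.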

(procurement is tracked in RT #8524)

fgiunchedi reassigned this task from EBernhardson to RobH. Jul 27 2015, 5:55 PM
fgiunchedi added subscribers: RobH, EBernhardson.

@RobH we should refresh the quote we had for elasticsearch hw in codfw in RT #8524, and quote larger (intel, supported by vendor) SSDs for comparison (I think S3500 do 800G?)

RobH added a comment. Jul 27 2015, 5:57 PM

The S3500 maxes out at 800 GB; anything larger than that moves up to the S3700 series.

RobH added a comment. Jul 27 2015, 10:59 PM

I've requested updated quotes on https://rt.wikimedia.org/Ticket/Display.html?id=8524 and will follow up on them once they come back from Dell.

Since the Dell quote will involve a generation upgrade (so new mainboard and the like), there doesn't seem to be any reason not to get an HP quote for these as well. Once I have an updated Dell quote back for a baseline, I'll request the HP quote.

to clarify, I think it makes sense to quote 800G SSD and also 300G SSD for price comparison

It's my understanding that there is no specific action required for this ticket from the Discovery Department, so I am removing the Discovery-Search (Current work) from this task.

Noting here that we are aware that there is some work required for us to do once the servers are ready, though. :-)

Deskana moved this task from Needs triage to Search on the Discovery board.
Deskana moved this task from Search to Tracking on the Discovery board.
RobH added a comment. Aug 18 2015, 7:23 PM

Update:

The quotes for this have been reviewed by myself, @chasemp, & @fgiunchedi and are in final management review/approval.

RobH added a comment. Aug 26 2015, 11:53 PM

This order has been submitted, and I'm awaiting shipment updates from the vendor.

RobH lowered the priority of this task from High to Normal. Aug 26 2015, 11:53 PM

Noting here that we are aware that there is some work required for us to do once the servers are ready, though. :-)

This work is tracked in T109734.

RobH added a comment. Aug 31 2015, 5:30 PM

ETA is today.

chasemp closed this task as Resolved. Sep 15 2015, 10:08 PM

Requested and answered; see T111080.