Basically we should get ~24ish nodes like the nice ones in eqiad. The cluster will be smaller, but we can't afford 31 of the nice nodes, and it's not a great choice anyway because then the secondary DC's cluster would be way overpowered. Start with what we budgeted and go from there. If you get a choice, always put extra cash into nicer (but not larger!) SSDs. And RAM. Those are life. LIFE!
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Deskana | T105703 Set up a CirrusSearch cluster in codfw (Dallas, Texas)
Resolved | | Papaul | T111080 rack & initial setup of elastic2001-2024
Resolved | | RobH | T105707 Request Elasticsearch hardware for secondary CirrusSearch in codfw
Resolved | | RobH | T97049 CODFW Search Servers
Event Timeline
Why 24 boxes here and not 23 or $number? Is there load analysis I can use to justify? I don't understand well enough to make the request.
I believe it was semi-arbitrary and based on budget. We have 15 nice machines in the eqiad cluster and 16 good machines, and I guesstimated that half again as many would be enough but never ran any hard numbers. 24 may be too many but it doesn't feel like too too many. If it is too few we can lower the rescore window for phrases on queries sent to that cluster to lower the load.
A more conservative approach would be to buy 16, set up the cluster, and run load tests against it. If we feel we need more machines we can buy them then.
Another consideration is disk utilization: we're at roughly 50% in eqiad ATM (each machine has raid1 2x500GB SSD raid0 2x300GB SSD), and it has seemed relatively stable over the last 9 months. Assuming the total disk used stays the same, 24 machines seem a good initial number: they'll be at ~75% disk space utilization.
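The disk estimate above can be sketched as a small calculation. This is only an illustration: the function names are made up for this example, and the 500 GB usable capacity per node is an assumption, not a figure confirmed in this task. The ~75% quoted above presumably reflects the real per-node capacities, so the sketch is not expected to reproduce it.

```python
import math


def projected_utilization(used_gb, node_count, usable_gb_per_node):
    """Fraction of cluster disk in use when used_gb is spread over the nodes."""
    if node_count <= 0 or usable_gb_per_node <= 0:
        raise ValueError("node_count and usable_gb_per_node must be positive")
    return used_gb / (node_count * usable_gb_per_node)


def min_nodes_for_target(used_gb, usable_gb_per_node, target_fraction):
    """Smallest node count that keeps utilization at or below target_fraction."""
    return math.ceil(used_gb / (usable_gb_per_node * target_fraction))


# From the thread: eqiad has 15 + 16 = 31 machines at roughly 50% disk used.
EQIAD_NODES = 31
EQIAD_USED_FRACTION = 0.50

# ASSUMPTION (hypothetical): every node, old and new, has 500 GB usable.
USABLE_GB_PER_NODE = 500

# Total data that the new cluster would have to hold.
used_gb = EQIAD_USED_FRACTION * EQIAD_NODES * USABLE_GB_PER_NODE

# Utilization if the same data lands on 24 nodes of the same usable size.
codfw_fraction = projected_utilization(used_gb, 24, USABLE_GB_PER_NODE)
print(f"24 nodes: {codfw_fraction:.1%} disk used")

# Node count needed to stay at or below 75% under the same assumption.
print("nodes for <= 75%:", min_nodes_for_target(used_gb, USABLE_GB_PER_NODE, 0.75))
```

Swapping in the actual usable capacity per node (and the capacity of the quoted SSDs) is what makes the result meaningful; the same function can then compare the 300G and 800G options.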
(procurement is tracked in RT #8524)
@RobH we should refresh the quote we had for elasticsearch hw in codfw in RT #8524, and quote larger (intel, supported by vendor) SSD for comparison (I think S3500 do 800G?)
I've requested updated quotes on https://rt.wikimedia.org/Ticket/Display.html?id=8524 and will follow up on them once they come back from Dell.
Since the Dell quote will involve a generation upgrade (so new mainboard and the like), there doesn't seem to be any reason not to get an HP quote for these as well. Once I have an updated Dell quote back for a baseline, I'll request the HP quote.
to clarify, I think it makes sense to quote 800G SSD and also 300G SSD for price comparison
It's my understanding that there is no specific action required for this ticket from the Discovery Department, so I am removing the Discovery-Search (Current work) from this task.
Noting here that we are aware that there is some work required for us to do once the servers are ready, though. :-)
Update:
The quotes for this have been reviewed by myself, @chasemp, & @fgiunchedi and are in final management review/approval.