
Allocate temporary Elasticsearch nodes from spares pool for Logstash
Closed, Declined (Public)

Description

Please re-image the following nodes for use by Logstash. The hostnames should be logstash1004-006.

Dell PowerEdge R420, dual Intel Xeon E5-2450 v2 2.50GHz, 64GB Memory, (4) 3TB Disks
wmf4544
wmf4543
wmf4541

This request is approved by Mark.

The plan is to keep Logstash and Kibana on logstash1001-003, migrating Elasticsearch to these new nodes. This is proposed as a temporary solution until hardware is purchased specifically for Logstash (+ Elasticsearch) per T84958, because the current config is experiencing OOM events daily.

Event Timeline

Gage assigned this task to RobH.
Gage raised the priority of this task to High.
Gage updated the task description.
Gage added a project: acl*sre-team.
Gage added subscribers: Gage, bd808.

So we need to do this AND buy new hardware for it and move things around? Any way we can avoid that?

In T87460#995506, @mark wrote:

So we need to do this AND buy new hardware for it and move things around? Any way we can avoid that?

If this hardware is within warranty that makes Ops happy I think it would be all we need for the near/mid term. If these spares are spare because they are soon to be decommed entirely, then we are probably better off just buying 3 new boxes and doing the move of Elasticsearch once.

If this hardware is within warranty that makes Ops happy I think it would be all we need for the near/mid term.

The growth path when we outgrow the 3 new boxes would be to add additional boxes to the cluster rather than replacing entirely (horizontal scaling).

I think the IRC and IRL discussions on this last week came down with @mark being more in favor of putting through a procurement ticket for new hardware rather than monkey patching things by stealing more boxes from the spare pool or scrounging RAM to stuff in the existing boxes.

I don't have access to anything that will tell me definitively what the hardware specs are for the current generation of production Elasticsearch boxes, but @Manybubbles and @chad seemed to think that having the Logstash Elasticsearch boxes match them would be a generally good thing. One change we will want to make for the Logstash Elasticsearch boxes is to use spinning disk rather than SSD. IOPS are important for any Elasticsearch cluster, but with data retention needs of more than 1TB per node, SSD costs are likely to be ridiculous for logging. My naive guess is that the disks we added to logstash100[123] (5.5TB usable) will be sufficient for the near future. Ideally, in a new build, that space would be spread over as many spindles/controllers as possible to add IOPS to the system. Data redundancy via RAID mirroring on each host is unnecessary, as the data will be stored redundantly across the Elasticsearch cluster.
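
As a rough illustration of the sizing reasoning above, here is a back-of-the-envelope sketch of how much unique log data a cluster can hold when redundancy comes from Elasticsearch replicas rather than per-host RAID mirroring. The function name and the replica count of 1 are assumptions made for illustration; only the 3 nodes and the 5.5TB usable per node come from this task. It also ignores free-space headroom and index overhead, so treat the result as an upper bound.

```python
def unique_capacity_tb(nodes: int, usable_tb_per_node: float, replicas: int) -> float:
    """Upper bound on unique (non-replicated) data the cluster can hold, in TB.

    Each document is stored once as a primary plus `replicas` extra copies,
    so the raw cluster capacity is divided by (1 + replicas).
    """
    if nodes <= 0 or usable_tb_per_node < 0 or replicas < 0:
        raise ValueError("nodes must be positive; capacity and replicas non-negative")
    raw_tb = nodes * usable_tb_per_node
    return raw_tb / (1 + replicas)


# 3 nodes with 5.5TB usable each (from this task); 1 replica is an assumption.
print(unique_capacity_tb(nodes=3, usable_tb_per_node=5.5, replicas=1))
```

Adding a fourth node under the same assumptions raises the bound proportionally, which is the horizontal scaling path mentioned earlier in the thread.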

Alright. @RobH: can you look at what it would take to procure 3 additional nodes, similar to the recent Elasticsearch orders, but with hard drives instead?

Rejecting the temp allocation ticket in favor of a new task to procure three new hosts (T89402).