Allocate temporary Elasticsearch nodes from spares pool for Logstash
Closed, Declined (Public)

Assigned To

Authored By

	• Gage
	Jan 23 2015, 9:30 PM

Description

Please re-image the following nodes for use by Logstash. Hostnames should be logstash1004-006.

Dell PowerEdge R420, dual Intel Xeon E5-2450 v2 2.50GHz, 64GB Memory, (4) 3TB Disks
wmf4544
wmf4543
wmf4541

This request is approved by Mark.

The plan is to keep Logstash and Kibana on logstash1001-003, migrating Elasticsearch to these new nodes. This is proposed as a temporary solution until hardware is purchased specifically for Logstash (+ Elasticsearch) per T84958, because the current config is experiencing OOM events daily.

Related Objects

Status	Assigned	Task
Resolved	bd808	T69817 Monitor for anomalies/spikes in read failures of memcached
Resolved	bd808	T100735 Have Logstash report per-channel log message rate to Graphite
Resolved	bd808	T99735 Upgrade Logstash to 1.5.3
Resolved	bd808	T97545 reinstall logstash1001-1003
Resolved	Anomie	T1272 Deploy ApiFeatureUsage extension on WMF wikis
Resolved	bd808	T87521 Convert JobRunner.php to PSR-3 logging and add levels
Resolved	bd808	T88732 Decouple logging infrastructure failures from MediaWiki logging
Resolved	bd808	T96692 Rack and Setup (3) Logstash Servers
Resolved	RobH	T84958 eqiad: (3) servers for logstash service
Resolved	RobH	T89402 purchase 3 additional logstash nodes
Declined	RobH	T87460 Allocate temporary Elasticsearch nodes from spares pool for Logstash
Resolved	faidon	T97481 jessie installs fail - mirror issue due to jessie release?
Resolved	bd808	T97645 Elasticsearch not starting on Jessie hosts

Event Timeline

• Gage created this task. Jan 23 2015, 9:30 PM

• Gage assigned this task to RobH.

• Gage raised the priority of this task from to High.

• Gage updated the task description.

• Gage added a project: acl*sre-team.

• Gage added subscribers: • Gage, bd808.

Restricted Application added a subscriber: Aklapper. Jan 23 2015, 9:30 PM

RobH added a project: hardware-requests. Jan 27 2015, 12:05 AM

RobH set Security to None.

Reedy awarded a token. Jan 27 2015, 12:06 AM

yuvipanda awarded a token. Jan 27 2015, 12:06 AM

So we need to do this AND buy new hardware for it and move things around? Any way we can avoid that?

In T87460#995506, @mark wrote:

So we need to do this AND buy new hardware for it and move things around? Any way we can avoid that?

If this hardware is within warranty that makes Ops happy I think it would be all we need for the near/mid term. If these spares are spare because they are soon to be decommed entirely, then we are probably better off just buying 3 new boxes and doing the move of Elasticsearch once.

In T87460#995507, @bd808 wrote:

If this hardware is within warranty that makes Ops happy I think it would be all we need for the near/mid term.

The growth path when we outgrow the 3 new boxes would be to add additional boxes to the cluster rather than replacing entirely (horizontal scaling).

bd808 added a project: Wikimedia-Logstash. Jan 27 2015, 12:13 AM

bd808 mentioned this in T87078: Upgrade RAM for logstash100[123] to 64G.

bd808 added a parent task: T84958: eqiad: (3) servers for logstash service. Jan 27 2015, 12:15 AM

I think the IRC and IRL discussions on this last week came down to @mark being more in favor of putting through a procurement ticket for new hardware rather than monkey patching things by stealing more boxes from the spare pool or scrounging RAM to stuff in the existing boxes.

I don't have access to anything that will tell me definitively what the hardware specs are for the current generation of production Elasticsearch boxes, but @Manybubbles and @chad seemed to think that having the Logstash Elasticsearch boxes match them would be a generally good thing. One change we will want to make for Elasticsearch is to use spinning disk rather than SSD. IOPS are important for any Elasticsearch cluster, but with data retention needs >1TB per node, SSD costs are likely to be ridiculous for logging. My naive guess is that the disks we added to logstash100[123] (5.5TB usable) will be sufficient for the near future. Ideally, in a new build, that space would be spread over as many spindles/controllers as possible to add IOPS to the system. Data redundancy via RAID mirroring on each host is unnecessary, as the data will be stored redundantly across the Elasticsearch cluster.
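
To make the sizing argument above concrete, here is a rough sketch of how cluster-level replication trades raw disk for redundancy. The 3 nodes and 5.5TB usable per node come from this thread; the replica count of 1 is an assumption (it is not stated anywhere in this task), and the function name is purely illustrative. Filesystem, index, and headroom overhead are ignored.

```python
def unique_capacity_tb(nodes, usable_tb_per_node, replicas):
    """Approximate amount of unique log data the cluster can hold.

    Every shard is stored replicas + 1 times across the cluster, so the
    raw capacity is divided by the number of copies. Overhead is ignored.
    """
    copies = replicas + 1
    if nodes < copies:
        # Each copy of a shard needs its own host for the redundancy to count.
        raise ValueError("need at least replicas + 1 nodes")
    return nodes * usable_tb_per_node / copies


# 3 nodes at 5.5TB usable each; replicas=1 is an assumed setting.
print(unique_capacity_tb(3, 5.5, 1))
```

Under these assumptions, one replica halves the unique capacity while letting the cluster lose any single host without losing data, which is the redundancy that per-host RAID mirroring would otherwise duplicate.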

Alright. @RobH: can you look at what it would take to procure 3 additional nodes, similar to the recent Elasticsearch orders, but with hard drives instead of SSDs?

Krinkle updated the task description. Feb 3 2015, 2:25 AM

Rejecting the temp allocation ticket in favor of a new task to procure three new hosts (T89402).

RobH added a parent task: T89402: purchase 3 additional logstash nodes. Feb 12 2015, 9:51 PM

bd808 moved this task from Backlog to Archive on the Wikimedia-Logstash board. Apr 22 2015, 3:25 AM

• Phabricator_maintenance added a project: MediaWiki-Debug-Logger. Jul 29 2016, 8:19 PM

fgiunchedi added a project: observability. Aug 19 2019, 2:29 PM
