
eqiad: (3) servers for logstash service
Closed, ResolvedPublic

Description

The Logstash cluster currently consists of three 'misc'-class nodes, each of which has had 2x3TB disks added in RAID0. These machines were allocated while the service was in development, and the disks were added as the service was adopted and volume increased. However, we are quickly outgrowing the current config as more event sources are added. RAM appears to be the current bottleneck.

Now that Logstash has proved its usefulness, let's purchase equipment explicitly spec'd for the task with sufficient capacity for future growth.

Factors:

  • Elasticsearch replicates data between nodes, so redundant storage is not required on individual nodes. For comparison, the elastic10xx nodes also use RAID0.
  • Logstash nodes have 16GB RAM and have had trouble with Elasticsearch OOM failures. For comparison, elastic10xx nodes have 96GB RAM.
  • Currently the daily Logstash indices are ~30GB each; stored for 30 days, that is ~900GB of storage required per node (see the back-of-envelope sketch after this list).
  • It is unclear how much our storage requirements will increase based on desired additional logging, or even whether all potential log sources have been enumerated.
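A back-of-envelope sketch of the storage arithmetic above, as a small Python snippet; the daily index size, retention window, and growth multipliers are taken from (or assumed to extend) the figures in this description, not measured values.

```
# Rough primary-data storage estimate per node, assuming each node holds one
# full copy of the data (Elasticsearch handles replication, so no RAID redundancy).
DAILY_INDEX_GB = 30      # ~30GB of Logstash indices per day (figure from above)
RETENTION_DAYS = 30      # indices kept for 30 days

base_gb = DAILY_INDEX_GB * RETENTION_DAYS          # 900 GB today
for growth in (1.0, 1.5, 2.0):                     # hypothetical growth factors
    print("growth x%.1f -> %.0f GB per node" % (growth, base_gb * growth))
```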

For scalability and clarity, we may wish to divide the service among nodes by task:

    • Redis nodes for an input message queue (see the sketch after this list)
    • Logstash nodes for ingest
    • Elasticsearch master nodes for cluster management which store no data
    • Elasticsearch client nodes for data storage
    • Kibana web frontend service nodes
  • Some roles might be combined onto a common set of hosts, such as Redis + Logstash
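To make the Redis-as-input-queue role concrete, here is a minimal sketch of how a log shipper could hand events to such a queue: it pushes JSON documents onto a Redis list, which Logstash's redis input (in its list mode) then pops and processes. The hostname and list key below are placeholders, not the cluster's actual configuration.

```
import json
import time

import redis  # redis-py client, assumed to be available on the shipping host

# Placeholder connection details -- not the production Logstash Redis endpoint.
queue = redis.StrictRedis(host="logstash-redis.example.org", port=6379)

event = {
    "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    "type": "example",
    "host": "app1001",
    "message": "request served in 42ms",
}

# Logstash's redis input with a list data type pops entries from a named list;
# "logstash" is used here as an assumed key name, not a confirmed setting.
queue.rpush("logstash", json.dumps(event))
```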

Current event sources:

  • Apache2
  • HHVM
  • MediaWiki
  • scap
  • Job queue runner
  • Hadoop
  • OCG
  • Parsoid

Future sources:

  • Syslog from all nodes (includes Puppet)
  • Zookeeper (T84908)
  • Kafka (T84907)
  • Monolog (T76759)
  • Icinga
  • Any local logs which may be tailed
  • ?

I would like help from others to identify all additional event sources we wish to store, and to determine appropriate host specs and the number of hosts.

Related Objects

Event Timeline

Gage raised the priority of this task from to Needs Triage.
Gage updated the task description. (Show Details)
Gage changed Security from none to None.
Gage added subscribers: Gage, bd808, Manybubbles, mark.

From past experience, once you start aggregating logs in a useful way you will continually find more event sources to add. I think we should plan on a system that can be grown horizontally as we find more things we want to do with it. Luckily, this should be pretty easy. Any number of Logstash agents can be used to feed data to the backing Elasticsearch cluster and Elasticsearch itself is designed to scale by adding nodes and increasing shard count on indices.

Separating Elasticsearch from Logstash + Redis + Apache would be a good start. The existing misc boxes should be able to handle ingesting a lot of log traffic if relieved of their duty of indexing the resulting documents and delivering search results.

Here's my proposal:

  • Add 3 hosts matching the machine spec used for the MediaWiki Elasticsearch cluster
  • Move the Logstash Elasticsearch backend to those hosts
  • Keep the current logstash100[123] boxes to run the rest of the stack (Apache + Redis + Logstash)

Scale the system by:

  • Adding Elasticsearch nodes if and when Elasticsearch becomes a bottleneck on the 3-node cluster (shard daily indices across more than one shard and distribute them across more hosts; see the sketch after this list)
  • Separating Redis and Logstash onto separate machines if there is CPU or RAM contention between the two processes on those hosts
  • Adding more hosts running Logstash if and when needed due to the CPU consumed by Logstash processing
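As a concrete illustration of the sharding knob mentioned above, the shard and replica counts for the daily Logstash indices can be set ahead of time with an Elasticsearch index template. The sketch below (Python with the requests library) assumes a hypothetical cluster address and template name, and the counts shown are examples rather than a recommendation.

```
import requests  # assumed to be available; any HTTP client would do

# Placeholder Elasticsearch endpoint -- not the production cluster address.
ES = "http://logstash-es.example.org:9200"

template = {
    "template": "logstash-*",      # applies to every daily logstash-YYYY.MM.DD index
    "settings": {
        "number_of_shards": 2,     # example: spread each day's index across 2 primaries
        "number_of_replicas": 2,   # example: 3 total copies of each shard
    },
}

resp = requests.put(ES + "/_template/logstash_example", json=template)
resp.raise_for_status()
print(resp.json())
```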

Unanswered questions:

  • Do we need a Logstash processing cluster in codfw, or can we ship logs directly to eqiad?
  • Do we need a Logstash Elasticsearch cluster in codfw, either as a replica of the combined data from both DCs or as a codfw-only log storage system?
  • Do we need a separate "ops only" Elasticsearch cluster for log storage, or can we redact or partition the data in the same cluster?

Here's my proposal:

  • Add 3 hosts matching the machine spec used for the MediaWiki Elasticsearch cluster

Does it make sense to use spinning disks rather than SSDs? A high indexing rate loves SSDs, but keeping lots of data around favors the bigger rotating disks. What about RAM?

At this moment, the base size of the data set is 990G for the last 30 days. With the growth that has been seen as a result of the MediaWiki conversion to Monolog+Redis, by mid-February that base storage may grow to something like 1.5T. I would like to see each index have 2 replicas (3 total copies) available across the backing Elasticsearch cluster, so with 3 nodes we would want 1.5T plus a healthy growth factor available for index storage.

I personally think we will be fine with spinning disks for IOPS. If at some point in the growth of the Logstash cluster we need to ingest messages at a rate where disk IOPS become a problem, we can start sharding the daily index. If we get to a point where that doesn't help, we can add a set of nodes with SSDs to hold the current day's index and then use index routing in Elasticsearch itself to move each index from the SSD nodes to the spinning-disk nodes at the end of its day.

RAM, I think, is a more-the-merrier thing. If we are using a 30G fixed heap, then we should have 45-60G available for the JVM plus filesystem cache for the index files, plus more for OS overhead.
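A small sketch of the capacity math implied in this comment, as a Python snippet; the projected base size, replica count, node count, and the reading of the heap/headroom numbers are assumptions carried over from the discussion, not a sizing recommendation.

```
# Disk: total index footprint = primary data * total copies, spread over the nodes.
projected_primary_tb = 1.5   # projected 30-day base after the Monolog+Redis growth
total_copies = 3             # 1 primary + 2 replicas of each index
nodes = 3

per_node_tb = projected_primary_tb * total_copies / nodes
print("~%.1f TB of index storage per node, before growth headroom" % per_node_tb)

# RAM: one reading of the comment above -- a 30G fixed heap plus a further
# 45-60G for JVM overhead and filesystem cache, plus some OS overhead.
heap_gb = 30
headroom_gb = (45, 60)
print("target RAM per node: roughly %d-%dGB plus OS overhead"
      % (heap_gb + headroom_gb[0], heap_gb + headroom_gb[1]))
```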

RobH subscribed.

There is currently ongoing discussion on https://rt.wikimedia.org/Ticket/Display.html?id=9199 in regards to the disks being SATA or SAS.

RobH renamed this task from "Production hardware for Logstash service" to "eqiad: (3) servers for logstash service". Feb 26 2015, 11:39 PM
RobH changed the task status from Open to Stalled.
RobH claimed this task.
RobH raised the priority of this task from Medium to High. Edited Mar 10 2015, 7:34 PM

I didn't follow up on the discussion, which included the potential upgrade to SAS disks to match current standards, until today.

As such, the new quote request is in on the associated RT ticket; I expect we'll have movement on this task within the next 24 hours.

Edit addition: There was movement, and still is. Quote options are being discussed on the RT ticket.

The order for these systems has been placed on https://rt.wikimedia.org/Ticket/Display.html?id=9199. The ETA as of now is 2015-04-24 (Friday) for delivery. We'll need a day or two to receive them, get them racked and ready for installation, and then install them. I wouldn't expect to see these ready for service implementation sooner than the 30th. (It could be faster of course, but I don't like to promise too short a timeline.)

Resolving this request, as the servers were ordered and are now on-site being set up.

Setup of systems can be followed on T96692.