
Investigate the need for master only (non data nodes) in our ES cluster
Closed, Declined (Public)

Description

We may have issues with the master being able to keep up with its duties while handling both the master and data roles. This has been brought up in the past and could do with more investigation.

https://wikitech.wikimedia.org/wiki/Incident_documentation/20150615-Elasticsearch

Status: In progress. Improve ES reliability with an architecture change? Related commit: https://gerrit.wikimedia.org/r/218421

If we're going to have master-only nodes, let's not use our huge data nodes for that. I hope this helps with stability shit.

Query nodes probably won't help us. The kinds of issues we're hitting probably won't show up on the query nodes: only the data nodes will fail, and the query nodes will do nothing.

So I'm not against doing the three way split - I just think we should do the right thing by hardware for it.
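For reference, a rough sketch of what the three-way split means in terms of node settings. It assumes the `node.master` / `node.data` booleans and the `discovery.zen.minimum_master_nodes` quorum setting from `elasticsearch.yml` in the Elasticsearch 1.x line. The helper names (`node_settings`, `minimum_master_nodes`, `to_yaml_lines`) are made up for illustration and are not part of our puppet config.

```python
# Assumption: Elasticsearch 1.x style role flags in elasticsearch.yml.
# "query" here means a node that is neither master-eligible nor holds data.
ROLE_SETTINGS = {
    "master": {"node.master": True, "node.data": False},
    "data": {"node.master": False, "node.data": True},
    "query": {"node.master": False, "node.data": False},
}


def node_settings(role):
    """Return a copy of the role flags for 'master', 'data' or 'query'."""
    return dict(ROLE_SETTINGS[role])


def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def to_yaml_lines(settings):
    """Render flat settings as 'key: value' lines, booleans in lowercase."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = str(value).lower()
        lines.append(f"{key}: {value}")
    return lines


if __name__ == "__main__":
    settings = node_settings("master")
    # Three dedicated masters need a quorum of two to elect a master.
    settings["discovery.zen.minimum_master_nodes"] = minimum_master_nodes(3)
    print("\n".join(to_yaml_lines(settings)))
```

With three master-eligible nodes the quorum is two, so the cluster can still elect a master with one of them down.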

Event Timeline

chasemp raised the priority of this task from to Medium.
chasemp updated the task description.
chasemp added subscribers: dcausse, Aklapper, chasemp.

I believe the master/data node split was eventually done. @Gehel might know for sure.

Actually this has not been done yet. I'm waiting for the new elasticsearch servers to try to dedicate some of them to master only. With the way we split indices into shards at the moment, we can't really decrease the number of data nodes.

Dedicating an entire node to master duties might be a bit much, especially since 2 of the 3 are only on standby. Another approach I've seen mentioned is to run a dedicated master-only instance of Elasticsearch on a few machines. Those machines can still have a primary instance doing queries/indexing/etc.
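A minimal sketch of the two-instances-per-host idea, mostly to show which settings would have to differ between the instances. It assumes the Elasticsearch 1.x setting names (`node.name`, `node.master`, `node.data`, `http.port`, `transport.tcp.port`, `path.data`). The host name, the second pair of ports and the data paths are placeholders, not what we actually run.

```python
def colocated_instances(host):
    """Return (primary, master) settings for two instances on one host."""
    primary = {
        "node.name": f"{host}-data",
        "node.master": False,  # data instance never becomes master
        "node.data": True,
        "http.port": 9200,  # Elasticsearch defaults
        "transport.tcp.port": 9300,
        "path.data": "/srv/elasticsearch/data",  # placeholder path
    }
    master = {
        "node.name": f"{host}-master",
        "node.master": True,  # master-eligible, holds no shards
        "node.data": False,
        "http.port": 9201,  # placeholder: next free ports
        "transport.tcp.port": 9301,
        "path.data": "/srv/elasticsearch/master",  # placeholder path
    }
    return primary, master


def conflicts(a, b):
    """List the settings that must be unique per instance but collide."""
    must_differ = ("node.name", "http.port", "transport.tcp.port", "path.data")
    return [key for key in must_differ if a[key] == b[key]]


if __name__ == "__main__":
    primary, master = colocated_instances("es-host-1")  # hypothetical host
    print(conflicts(primary, master))
```

The master-only instance holds no shards, so it should need far less heap than the primary instance, though how much less would have to be measured.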