
Only use newer (elastic10{16..47}) servers as master capable elasticsearch nodes
Closed, Resolved · Public

Description

ES is currently configured with three master-capable nodes: elastic1001, elastic1008 and elastic1013, that is, one server in rack A3 and two servers in rack C5. With the new elastic1032+ hardware installed, we should now be able to distribute master capability across any set of three machines; mainly, it just needs to move away from the machines that are being removed.

Details

Related Gerrit Patches:

Event Timeline

EBernhardson assigned this task to chasemp.
EBernhardson raised the priority of this task to Needs Triage.
EBernhardson updated the task description.
EBernhardson added a subscriber: EBernhardson.
Restricted Application added a project: Discovery. Sep 14 2015, 6:05 PM
Restricted Application added a subscriber: Aklapper.

I doubt Chase will do the actual re-racking of servers, but I've assigned this to him for now because I'm not sure how to move it forward.

EBernhardson updated the task description. Sep 14 2015, 6:05 PM
EBernhardson set Security to None.
EBernhardson updated the task description. Sep 14 2015, 6:17 PM
Deskana triaged this task as Medium priority. Sep 14 2015, 8:15 PM
Deskana added a subscriber: Deskana.

Removing this from Discovery-Search (Current work), as @EBernhardson and I don't believe it is actionable on our end. Let me know if this is incorrect.

EBernhardson updated the task description. Sep 14 2015, 8:42 PM
Restricted Application added a subscriber: Matanya. Sep 18 2015, 2:39 PM
chasemp removed chasemp as the assignee of this task. Sep 18 2015, 2:40 PM
chasemp added a subscriber: chasemp.

I won't get to this this week or even next week, so I'm placing it up for grabs to reflect that. Hopefully we can knock this out in a few weeks, but at the moment it's a somewhat careful and time-consuming operation.

Change 251024 had a related patch set uploaded (by EBernhardson):
Make three of the newer ES nodes master eligable

https://gerrit.wikimedia.org/r/251024

Change 251025 had a related patch set uploaded (by EBernhardson):
Remove old ES nodes from master capable list

https://gerrit.wikimedia.org/r/251025

It seems this fell by the wayside, as Chase and I were both otherwise occupied. I think we should be able to merge the first patch above and reboot each of the affected instances, then merge the second patch and reboot the three old instances.

Is there a disadvantage to having four eligible masters? I know our minimum master nodes setting is currently 2, which means that if we do maintenance on one eligible master we are in a real danger zone. We haven't been paying this much heed, and I'm wondering if we shouldn't just make

1030
1031
1024
1025

all eligible (specifics negotiable)

EBernhardson added a comment. (Edited) Nov 5 2015, 11:47 PM

With four nodes we will need to increase discovery.zen.minimum_master_nodes to 3 to ensure there is no split brain, so this is basically no better than having three master-capable nodes. We could go to five with a minimum of 3, I suppose. I can't think of any particular reason that would be bad.
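The arithmetic behind this comment is the usual strict-majority quorum rule: to avoid split brain, `discovery.zen.minimum_master_nodes` is set to floor(n/2) + 1, where n is the number of master-eligible nodes. The minimal Python sketch below illustrates why four eligible masters tolerate no more outages than three, while five tolerate two. The function names are hypothetical helpers for illustration, not part of Elasticsearch.

```python
def minimum_master_nodes(eligible: int) -> int:
    """Strict majority of master-eligible nodes, the usual value for
    discovery.zen.minimum_master_nodes to prevent split brain."""
    if eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return eligible // 2 + 1


def tolerated_failures(eligible: int) -> int:
    """How many master-eligible nodes can be down (failed or in
    maintenance) while the rest can still reach quorum and elect a master."""
    return eligible - minimum_master_nodes(eligible)


if __name__ == "__main__":
    for n in range(3, 6):
        print(f"{n} eligible: minimum_master_nodes={minimum_master_nodes(n)}, "
              f"tolerates {tolerated_failures(n)} down")
```

With three eligible masters and one taken down for maintenance, the remaining two exactly meet the quorum of 2, so any further failure leaves the cluster unable to elect a master, which is the risk raised in the previous comment.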

Deskana moved this task from Needs triage to Ops on the Discovery board.Dec 31 2015, 5:14 AM

According to Racktables, we have new Elasticsearch hardware in rows A and D, but not in row C. The servers planned to become the new eligible masters (1020, 1030 and 1031; see https://gerrit.wikimedia.org/r/#/c/251024/) are in D/D3, A/A3 and A/A3. Let's wait for the new Elasticsearch servers (T128000), rack them in rows that make sense, and only then switch the eligible masters.
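The underlying concern is that losing a single row should not take out a quorum of master-eligible nodes. A rough Python sketch of that check follows, fed with the row placement quoted in this comment. The helper name and the node-to-row mapping format are hypothetical; the real placement data lives in Racktables, and the quorum is assumed to be the strict majority used for `discovery.zen.minimum_master_nodes`.

```python
from __future__ import annotations

from collections import Counter


def survives_any_row_loss(master_rows: dict[str, str]) -> bool:
    """Return True if losing any single row still leaves enough
    master-eligible nodes to reach a strict-majority quorum.

    master_rows maps a node name to its row, e.g. {"elastic1030": "A"}.
    (Hypothetical helper for illustration only.)
    """
    if not master_rows:
        return False
    eligible = len(master_rows)
    quorum = eligible // 2 + 1
    per_row = Counter(master_rows.values())
    # Worst case: the row holding the most master-eligible nodes goes down.
    return eligible - max(per_row.values()) >= quorum


# Planned masters from change 251024, with the rows quoted above.
planned = {"elastic1020": "D", "elastic1030": "A", "elastic1031": "A"}
print(survives_any_row_loss(planned))  # False: losing row A leaves 1 of 3
```

With three masters the quorum is 2, so the check passes only when each row holds at most one of them, which is why new hardware in a third row was needed before switching.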

Restricted Application added a project: Discovery-Search. Apr 11 2016, 12:59 PM

Change 251024 abandoned by EBernhardson:
Make three of the newer ES nodes master eligable

Reason:
with these servers being replaced, this patch is no longer necessary

https://gerrit.wikimedia.org/r/251024

Change 251025 abandoned by EBernhardson:
Remove old ES nodes from master capable list

Reason:
the old nodes are all being replaced, this patch is no longer necessary

https://gerrit.wikimedia.org/r/251025

EBernhardson closed this task as Declined.Jun 9 2016, 6:33 PM

These nodes are being removed from the cluster, as they have outlived their warranty. The new nodes are a closer match to the rest of the cluster, making this unnecessary.

EBernhardson reopened this task as Open.Jun 20 2016, 3:39 PM
EBernhardson updated the task description.
EBernhardson renamed this task from Only use newer (elastic10{16..31}) servers as master capable elasticsearch nodes to Only use newer (elastic10{16..47}) servers as master capable elasticsearch nodes. Jun 20 2016, 3:41 PM
Gehel added a comment. Jun 29 2016, 9:21 AM

This is also partly tracked in T138329. Masters have moved to new servers (elastic1030, 1036 and 1040). A cluster restart is in progress, and discovery will now use those new servers, but this is actually more related to the decommissioning of elastic1001-1016 than to the move of masters. We can consider this resolved.

debt closed this task as Resolved.Jul 21 2016, 6:09 PM