ES is currently configured with three master-capable nodes: elastic1001, elastic1008 and elastic1013. That is one server in rack A3 and two servers in rack C5. With the new elastic1032+ hardware installed we should now be able to distribute master capability across any set of 3 machines; mainly it just needs to be moved away from the machines that are being removed.
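For context, master eligibility in the Elasticsearch versions of this era is controlled per node in elasticsearch.yml (in production these values are managed through Puppet; the snippet below is a sketch for illustration, not the actual WMF configuration):

```yaml
# elasticsearch.yml (per-node) -- illustrative values only
node.master: true                       # node may be elected cluster master
node.data: true                         # node also holds shard data
discovery.zen.minimum_master_nodes: 2   # quorum for 3 master-eligible nodes
```

Moving master capability to a different machine means flipping `node.master` and doing a rolling restart, which is why the comments below talk about merging a patch and rebooting instances.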
Description
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Gehel | T112556 Only use newer (elastic10{16..47}) servers as master capable elasticsearch nodes |
| Resolved | • | chasemp | T112559 Swap two elasticsearch servers in row D with an elasticsearch server in racks A3 and C5. |
| Resolved | | RobH | T128000 Refresh elastic10{01..16}.eqiad.wmnet servers |
Event Timeline
I doubt chase will do the actual re-racking of servers, but I've assigned this to him for now because I'm not sure how to move this forward.
Removing this from the Discovery-Search (Current work) as @EBernhardson and I don't believe this is actionable on our end. Let me know if this is incorrect.
I am not going to get to this within the week or even next week. Placing up for grabs to reflect that. Hopefully we can knock this out in a few weeks but it's a somewhat careful and time-consuming operation at this moment.
Change 251024 had a related patch set uploaded (by EBernhardson):
Make three of the newer ES nodes master eligible
Change 251025 had a related patch set uploaded (by EBernhardson):
Remove old ES nodes from master capable list
It seems this fell by the wayside as chase and I were both otherwise occupied. I think we should be able to merge the first patch above, reboot each of the instances, then merge the second and reboot the three old instances.
Is there a disadvantage to having 4 eligible masters? I know we currently have discovery.zen.minimum_master_nodes set to 2, which means that if we do maintenance on one eligible master we are in a real danger zone. We haven't been paying this much heed, and I'm wondering whether we shouldn't just make
1030
1031
1024
1025
all eligible (specifics negotiable)
With four nodes we will need to increase discovery.zen.minimum_master_nodes to 3 to ensure there is no split brain, so this is basically no better than having three master-capable nodes. We could go to 5 with a minimum of 3, I suppose. I can't think of any particular reason that would be bad.
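The quorum arithmetic behind this is worth spelling out: to avoid split brain, minimum_master_nodes must be a strict majority of the master-eligible nodes, and the cluster tolerates only (eligible − quorum) masters being down. A small sketch:

```python
def minimum_master_nodes(eligible: int) -> int:
    """Strict majority of master-eligible nodes: floor(n / 2) + 1."""
    return eligible // 2 + 1

for n in (3, 4, 5):
    quorum = minimum_master_nodes(n)
    tolerated = n - quorum  # masters that can be down without losing quorum
    print(f"{n} eligible -> quorum {quorum}, tolerates {tolerated} down")
```

With 3 eligible masters the quorum is 2 and one node can be down; with 4 the quorum rises to 3 and still only one can be down, which is why 4 buys nothing over 3. Going to 5 keeps the quorum at 3 and tolerates two nodes being down.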
According to racktables, we have new elasticsearch hardware in row A and D, but not in row C. The servers which are planned to be the new eligible masters (1020, 1030, 1031 - see https://gerrit.wikimedia.org/r/#/c/251024/) are in D/D3, A/A3, A/A3. Let's wait for new elasticsearch servers (T128000), rack them in rows which make sense and switch the eligible masters only then...
Change 251024 abandoned by EBernhardson:
Make three of the newer ES nodes master eligible
Reason:
with these servers being replaced, this patch is no longer necessary
Change 251025 abandoned by EBernhardson:
Remove old ES nodes from master capable list
Reason:
the old nodes are all being replaced, this patch is no longer necessary
These nodes are being removed from the cluster, as they have outlived their warranty. The new nodes are a closer match to the rest of the cluster, making this unnecessary.
This is also partly tracked as part of T138329. Masters have moved to new servers (elastic1030, 1036 and 1040). A cluster restart is in progress; discovery will now use those new servers, but this is actually more related to the decommission of elastic1001-1016 than to the move of masters. We can consider this resolved.