
Swap two elasticsearch servers in row D with an elasticsearch server in racks A3 and C5.
Closed, Resolved (Public)

Description

The ES cluster has two sets of hardware, old hardware in racks A3 and C5 and new hardware in racks D3 and D4. We would like to move two servers from the D row such that we have at least one server with new hardware in each row for usage as the cluster master node.

Event Timeline

EBernhardson raised the priority of this task to Needs Triage.
EBernhardson updated the task description.
EBernhardson added a project: acl*sre-team.
EBernhardson subscribed.

Any particular servers you would like to move, or should I just take the 2 that make the most sense?

I am thinking
elastic1031 => A3
elastic1030 => A3
elastic1006 => D4
elastic1005 => D4

In terms of exact servers, whichever makes the most sense. I would like to see servers moved into both A and C racks for availability reasons; three master-capable nodes in three rows seems best.

Hey Chris! There are a few that are more sensitive (they can't be missing at the same time), and I'm hoping we can do it in 2 phases (so only 2 are missing at a time). I'll try to sync up with the discovery gents and comment further.

Thanks!

For the moment none of these three can be missing at the same time:

hieradata/hosts/elastic1001.yaml:elasticsearch::master_eligible: true
hieradata/hosts/elastic1008.yaml:elasticsearch::master_eligible: true
hieradata/hosts/elastic1013.yaml:elasticsearch::master_eligible: true

which seems compatible with the thoughts here; I just wanted to be sure :)
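As a rough illustration of that constraint, the sketch below checks each move phase against the master-eligible hosts listed above. The phase grouping follows the order in which the moves were carried out in this task; the threshold `max_masters_down=1` is an assumption about how strictly "missing at the same time" should be read, and the function names are hypothetical.

```python
# Hosts flagged master_eligible in hieradata (from the list above).
MASTER_ELIGIBLE = {"elastic1001", "elastic1008", "elastic1013"}

# Move phases as carried out in this task: two servers at a time.
PHASES = [
    {"elastic1030", "elastic1005"},
    {"elastic1031", "elastic1006"},
]


def masters_in_phase(phase):
    """Return the master-eligible hosts that a phase would take offline."""
    return sorted(MASTER_ELIGIBLE & set(phase))


def phase_is_safe(phase, max_masters_down=1):
    """Assumption: a phase is acceptable if it takes down at most
    max_masters_down master-eligible hosts at once."""
    return len(masters_in_phase(phase)) <= max_masters_down


if __name__ == "__main__":
    for number, phase in enumerate(PHASES, start=1):
        print(number, sorted(phase), masters_in_phase(phase), phase_is_safe(phase))
```

Neither phase touches a master-eligible host, which matches the "seems compatible" reading above.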

Also I think this will need to be updated at this time:

hieradata/regex.yaml

es_rack_a3:
  __regex: !ruby/regexp /^elastic100[0-6]\.eqiad\.wmnet$/
  elasticsearch::rack: A3
  elasticsearch::row: A

es_rack_c5:
  __regex: !ruby/regexp /^elastic10(0[7-9]|1[0-2])\.eqiad\.wmnet$/
  elasticsearch::rack: C5
  elasticsearch::row: C

es_rack_d3:
  __regex: !ruby/regexp /^elastic10(1[3-9]|2[0-2])\.eqiad\.wmnet$/
  elasticsearch::rack: D3
  elasticsearch::row: D

es_rack_d4:
  __regex: !ruby/regexp /^elastic10(2[3-9]|3[01])\.eqiad\.wmnet$/
  elasticsearch::rack: D4
  elasticsearch::row: D
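To see why the regex file needs attention, here is a minimal Python sketch that applies the four patterns above to the hosts in the proposed move and reports where the regex-derived rack no longer matches the planned physical rack. The patterns are copied from the excerpt; the helper names are hypothetical, and Python's `re` is used as a stand-in for the Ruby regexps (the two behave the same for these single-line hostnames).

```python
import re

# Patterns copied from the hieradata/regex.yaml excerpt above.
RACK_PATTERNS = {
    "A3": re.compile(r"^elastic100[0-6]\.eqiad\.wmnet$"),
    "C5": re.compile(r"^elastic10(0[7-9]|1[0-2])\.eqiad\.wmnet$"),
    "D3": re.compile(r"^elastic10(1[3-9]|2[0-2])\.eqiad\.wmnet$"),
    "D4": re.compile(r"^elastic10(2[3-9]|3[01])\.eqiad\.wmnet$"),
}

# Destination racks proposed earlier in this task.
PLANNED_RACK = {
    "elastic1031": "A3",
    "elastic1030": "A3",
    "elastic1006": "D4",
    "elastic1005": "D4",
}


def rack_from_regex(fqdn):
    """Return the rack the regex file would assign to fqdn, or None."""
    for rack, pattern in RACK_PATTERNS.items():
        if pattern.match(fqdn):
            return rack
    return None


def stale_hosts(planned):
    """Map each host whose regex-derived rack differs from its planned
    rack to a (regex_rack, planned_rack) pair."""
    stale = {}
    for host, new_rack in planned.items():
        old_rack = rack_from_regex(f"{host}.eqiad.wmnet")
        if old_rack != new_rack:
            stale[host] = (old_rack, new_rack)
    return stale


if __name__ == "__main__":
    for host, (old, new) in sorted(stale_hosts(PLANNED_RACK).items()):
        print(f"{host}: regex says {old}, planned rack is {new}")
```

All four hosts come out as mismatched, so after the move either the ranges have to change or the moved hosts need per-host rack settings.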
Dzahn triaged this task as Medium priority. Sep 14 2015, 11:09 PM
Dzahn subscribed.

We made a plan to do 1030 and 1005 tomorrow and then let things stabilize before going further. We want to get started at 10:30 am Eastern.

@dcausse @EBernhardson @Cmjohnson

I will ban these nodes and remove from LVS today.
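The thread does not say how the ban is applied. One common approach, sketched here as an assumption, is Elasticsearch's shard allocation filtering: setting `cluster.routing.allocation.exclude._name` makes the cluster move shards off the named nodes. The helper name is hypothetical, and the sketch assumes node names equal the short hostnames. It only builds the request body and sends nothing.

```python
import json


def ban_settings(node_names):
    """Build a cluster-settings body that excludes the given nodes from
    shard allocation (assumes node names match the short hostnames)."""
    return {
        "transient": {
            "cluster.routing.allocation.exclude._name": ",".join(node_names)
        }
    }


if __name__ == "__main__":
    # First phase from this task: elastic1030 and elastic1005.
    print(json.dumps(ban_settings(["elastic1030", "elastic1005"]), indent=2))
```

A body like this would be sent with `PUT /_cluster/settings`; setting the value back to an empty string lifts the exclusion once the servers are back.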

Moved elastic1030 to rack A3
Moved elastic1005 to rack D4

Racktables has been updated.

Next window is planned for Thursday noon EST.

Change 242082 had a related patch set uploaded (by Giuseppe Lavagetto):
elasticsearch: re-enter rack info for elastic1006

https://gerrit.wikimedia.org/r/242082

Change 242082 merged by Rush:
elasticsearch: re-enter rack info for elastic1006

https://gerrit.wikimedia.org/r/242082

Relocated elastic1006/1031 to appropriate racks, updated switch cfg, racktables. Corrected DNS and updated /etc/network/interfaces information on each server. Both are reachable via ssh. The on-site portion of this project has been completed.