Improve balance of nodes across rows for elasticsearch cluster eqiad
Closed, ResolvedPublic

Description

We realized that elasticsearch is not configured to balance shards across the different rows. To enable this and keep the shards balanced across nodes, we need to improve how nodes themselves are balanced across rows. In particular, we have 17 nodes on row D, which caused some level of pain when we lost networking on row D.
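For context, balancing shard copies across rows in Elasticsearch is driven by allocation awareness settings. A minimal sketch of what enabling it could look like (the attribute name `row`, the per-node values, and the endpoint are assumptions for illustration, not the actual WMF configuration):

```shell
# Sketch, not the production config. Each node would declare its row in
# elasticsearch.yml (attribute name "row" is an assumption):
#   node.row: D4
# Then the cluster is told to spread shard copies across that attribute:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "row"
  }
}'
```

With awareness enabled, Elasticsearch tries to keep primaries and replicas in different rows — which only helps if the nodes themselves are reasonably spread, hence this task.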

Current situation:

A 3: elastic10(30|31|32|33|34|35) - 6 nodes
B 3: elastic10(36|37|38|39) - 4 nodes
C 5: elastic10(40|41|42|43) - 4 nodes
D 3: elastic10(17|18|19|20|21|22) - 6 nodes
D 4: elastic10(23|24|25|26|27|28|29|44|45|46|47) - 11 nodes
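The per-row totals (and the 17-node concentration in row D) can be double-checked with a quick tally; a sketch using the per-rack counts above as literal data:

```shell
# Sum nodes per row; counts transcribed from the rack list above.
tally() {
  awk '{ n[$1] += $2 } END { for (r in n) print r, n[r] }' <<'EOF' | sort
A 6
B 4
C 4
D 6
D 11
EOF
}
tally
# → A 6, B 4, C 4, D 17 — more than half the cluster sits in row D
```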

Current master eligible: elastic10(30|36|40)
(masters have to be spread across different rows as well)

Procedure to move nodes around, to be done in 2 batches (4 nodes + 5 nodes):

  1. ban the nodes to be moved from the cluster: es-tool ban-node <IP_of_node_to_ban>
  2. move the nodes
  3. update regex.yaml with new row information
  4. update IP configuration (DNS, DHCP, ...) and documentation (racktables, ...) (we probably have a checklist somewhere, but I can't find it)
  5. preemptively ban the new IPs of the nodes to prevent them from joining the cluster before being reprovisioned
  6. reprovision the nodes
  7. unban all nodes from the cluster: es-tool unban-node <IP_of_node_to_unban>
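es-tool is a Wikimedia wrapper script; presumably ban-node/unban-node toggle Elasticsearch's cluster-level allocation filtering, roughly like this sketch (IP, endpoint, and the exact mechanism are assumptions — the wrapper's real behavior may differ):

```shell
# Sketch of what ban-node likely does: exclude the node's IP from shard
# allocation, so shards drain off it before the physical move.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.exclude._ip": "10.64.48.10" }
}'

# Sketch of unban: clear the exclusion so shards can return.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.exclude._ip": "" }
}'
```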

Notes:

  • we definitely don't want to take more than 6 nodes out of the cluster at the same time
  • a rolling restart of the eqiad cluster is in progress, we need to wait for it to be done before moving servers around

Event Timeline

Gehel created this task.Aug 23 2016, 2:23 PM
Restricted Application added a subscriber: Southparkfan.Aug 23 2016, 2:23 PM

@Cmjohnson I should have checked the row allocation better when we received the new elasticsearch server. Sorry for that.

Am I forgetting anything here? Is it possible to move those servers around?

faidon added a subscriber: faidon.Aug 23 2016, 2:57 PM

Am I forgetting anything here? Is it possible to move those servers around?

Sort of! Moving servers across rows means that their IP (& subnet) has to change, as the different rows are different broadcast domains/layer 3 subnets.

(thanks for working on this!)

Gehel added a comment.Aug 23 2016, 3:10 PM

Thanks @faidon! I suspect there is also some switch configuration, update to racktables, etc... but Chris knows more about this than I do. I'll update the description.

Gehel updated the task description.Aug 23 2016, 3:12 PM

@Gehel, yes, that is correct. Typically we reprovision servers when we move them around; you might want to automate this process a little bit, otherwise it will get even more cumbersome than it already is.

Do you have a proposal on how many servers should be moved and where? How do you want the eventual cluster to look with the minimum work involved? Are all of these equivalent in capacity and weight?

Gehel added a comment.Aug 23 2016, 4:29 PM

Reprovisioning might make sense (and is sufficiently automated on elasticsearch to be painless). I'll update the task description.

Logically, all nodes are equivalent (except the master-eligible ones, which have to be distributed over the different rows). elastic1017-1031 are older than elastic1032-1047, but they seem similar enough to me that we probably don't want to take this into account.

Ideally, I'd like to have the same number of servers in each row (8+8+8+7), so taking 9 servers out of row D and spreading them evenly across row A-B-C. We should do this in 2 steps (moving 4 servers, then moving 5).

Looking at racktables, it seems that we have enough space, but I'm not entirely sure I read this correctly. Or we might have other constraints.

Gehel updated the task description.Aug 23 2016, 4:33 PM
Gehel claimed this task.Aug 23 2016, 5:51 PM
Gehel moved this task from Needs triage to Ops on the Discovery board.
Gehel moved this task from needs triage to This Quarter on the Discovery-Search board.

I have space to add 3 elasticsearch servers to each of these racks
A6, B6 and C5

Please let me know if that works for you and how and when you want to start
moving them.
Chris

Gehel added a comment.Aug 25 2016, 9:41 AM

@Cmjohnson: so new arrangement could be:

A 3: 6 nodes
A 6: + 2 nodes
B 3: 4 nodes
B 6: + 3 nodes
C 5: 4 nodes + 3 nodes
D 3: 6 nodes - 2 nodes (= 4 nodes)
D 4: 11 nodes - 6 nodes (= 5 nodes)

By row:
A: 8 nodes
B: 7 nodes
C: 7 nodes (all in the same rack)
D: 9 nodes

That sounds good to me. We can ensure that all shards in rack C5 have replicas in other rows, so the single rack is not that much of an issue.
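One way to guarantee that every shard keeps a copy outside any single row is Elasticsearch's forced awareness; a hedged sketch (attribute name and row values are assumptions, and this is one option, not necessarily what was deployed):

```shell
# Sketch: forced awareness caps how many copies of a shard may land in any
# one row, so losing rack C5 (or all of row C) still leaves replicas elsewhere.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "row",
    "cluster.routing.allocation.awareness.force.row.values": "A,B,C,D"
  }
}'
```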

Can we start that around 9am SF time on Tuesday August 30? We can start with a first batch of 4 servers, and if all goes well, we'll see for the other batch of 4. If you tell me which servers you want to move, I'll make sure they are ready by that time.

@Gehel: that works for me, let me know which nodes you want to remove from
row D and where you want them. I assume we just add 2 to row A and 2 to
row B first?

0900 SF Time 30 Aug works for me

Thanks!
Chris

Can we find some other rack in C other than C5? Surely we must have some other rack with room for 3 servers?

I have 3 slots remaining in C4

Gehel added a comment.Aug 25 2016, 3:51 PM

So final arrangement (even better, since the row C servers are no longer all in the same rack):

A 3: 6 nodes
A 6: + 2 nodes
B 3: 4 nodes
B 6: + 3 nodes
C 4: + 3 nodes
C 5: 4 nodes
D 3: 6 nodes - 2 nodes (= 4 nodes)
D 4: 11 nodes - 6 nodes (= 5 nodes)

By row:
A: 8 nodes
B: 7 nodes
C: 7 nodes
D: 9 nodes
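The final per-row totals can be re-checked the same way; this sketch (rack counts transcribed from the plan above) also confirms the cluster keeps all 31 nodes through the move:

```shell
# Sum nodes per row and overall for the final arrangement.
final_tally() {
  awk '{ n[$1] += $2; t += $2 } END { for (r in n) print r, n[r]; print "total", t }' <<'EOF' | sort
A 6
A 2
B 4
B 3
C 3
C 4
D 4
D 5
EOF
}
final_tally
# → A 8, B 7, C 7, D 9, total 31 (same 31 nodes as before the move)
```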

Nodes in row D are equivalent to me. So arbitrarily, I'm choosing to move:

first batch (0900 SF Time 30 Aug):

  • elastic10(44|45) from D4 to A6
  • elastic10(46|47) from D4 to B6

second batch (time to be defined after the first batch is done, but let's not wait too long):

  • elastic1028 from D4 to B6
  • elastic1029 from D4 to C4
  • elastic10(21|22) from D3 to C4

If by any chance there is one more slot in row B or C, we could move elastic1027 to there and have an almost perfectly balanced 8+8+8+7. But having 9 servers in row D does not really change much.

Mentioned in SAL [2016-08-30T11:47:52Z] <gehel> banning elastic10(44|45|46|47) from elasticsearch eqiad cluster - T143685

Mentioned in SAL [2016-08-30T16:58:39Z] <gehel> shutting down elasticsearch on elastic1047 to prepare moving server - T143685

Mentioned in SAL [2016-08-30T17:05:13Z] <gehel> shutting down elasticsearch on elastic1044 to prepare moving server - T143685

Mentioned in SAL [2016-08-30T17:08:32Z] <gehel> shutting down elasticsearch on elastic1045 to prepare moving server - T143685

Mentioned in SAL [2016-08-30T17:48:31Z] <gehel> shutting down elasticsearch on elastic1046 to prepare moving server - T143685

Mentioned in SAL [2016-08-30T20:16:23Z] <gehel> restarting ferm on elasticsearch eqiad cluster after reinstall of elastic104[4567] - T143685

elastic104[4-7] were moved to racks A6 and B6.
elastic104[4-6] installed; puppet and salt added with no issues

elastic1047 is giving me an issue with partitions during install.

│                                                                         │
│ The installer can guide you through partitioning a disk (using          │
│ different standard schemes) or, if you prefer, you can do it            │
│ manually. With guided partitioning you will still have a chance later   │
│ to review and customise the results.                                    │
│                                                                         │
│ If you choose guided partitioning for an entire disk, you will next     │
│ be asked which disk should be used.                                     │
│                                                                         │
│ Partitioning method:                                                    │
│                                                                         │
│        Guided - resize RAID1 device #125 and use freed space            │
│        Guided - resize RAID1 device #126 and use freed space            │
│        Guided - reuse partition, RAID1 device #125            ▒         │
│        Guided - use entire disk                               ▒         │
│        Guided - use entire partition, RAID1 device #125                 │
│                                                                         │
│     <Go Back>                                                           │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
Gehel added a comment.Aug 31 2016, 7:50 AM

elastic1046 is still documented as in rack D4 in racktables. @Cmjohnson I assume that you have actually moved the server but just not updated racktables. Could you confirm?

Change 307700 had a related patch set uploaded (by Gehel):
elastic104[4567] moved to new racks

https://gerrit.wikimedia.org/r/307700

@Gehel confirmed and racktables updated

Change 307700 merged by Gehel:
elastic104[4567] moved to new racks

https://gerrit.wikimedia.org/r/307700

Mentioned in SAL [2016-08-31T11:10:03Z] <gehel> restarting elasticsearch104[456] to take new rack configuration into account - T143685

Change 307733 had a related patch set uploaded (by Gehel):
elastic102[1289] moved to new racks

https://gerrit.wikimedia.org/r/307733

Mentioned in SAL [2016-08-31T18:54:53Z] <gehel> shutting down elasticsearch on elastic1028 to prepare moving server - T143685

Mentioned in SAL [2016-09-01T12:12:32Z] <gehel> rolling restart of ferm on elasticsearch eqiad cluster to account for moved servers - T143685

Change 307733 merged by Gehel:
elastic102[1289] moved to new racks

https://gerrit.wikimedia.org/r/307733

debt closed this task as Resolved.Sep 9 2016, 8:05 PM