
etcd switchover/enhancements
Open, MediumPublic

Description

To bring etcd more in line with our current goals, we want to do the following:

  • Set up a MirrorMaker-like replica
  • Switchover to codfw
  • Re-configure the eqiad cluster to use the TLS proxy
  • Allow reading from the nearest datacenter (optional)

Here are my ideas for this:

MirrorMaker-like replica

Temporary "emergency" switchover

We need to switch over to the codfw cluster because of time-sensitive maintenance in eqiad, so for now we need to do the following:

  • Verify the replica from eqiad to codfw is currently set up correctly for '/conftool'
  • Add a second SRV record for etcd.rw or something similar that can be used by conftool, so that all writes can be managed that way
  • Reconfigure conftool to use it
  • Reduce the TTL of all SRV records
  • Switch the configurations of a few pybal hosts to use codfw, possibly just the backups; verify that the data after the restart matches the other member of each pair
  • Switch the other clients (pybals included) by changing the SRV records for everything but conftool; verify this actively removes the connections from those hosts to eqiad
  • Switch the record for conftool too
  • Stop the etcd replication eqiad => codfw
  • Re-raise the TTL of all SRV records
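The two "verify the data matches" steps above amount to diffing the same key tree on both clusters. A minimal sketch of that comparison (the function and sample trees are illustrative, not an existing conftool/etcd tool):

```python
# Sketch: verify that two etcd trees hold identical data under a prefix.
# The dicts stand in for real etcd client reads of the /conftool tree.

def diff_trees(source: dict, replica: dict) -> dict:
    """Return keys whose values differ (or are missing) between trees."""
    mismatches = {}
    for key in set(source) | set(replica):
        if source.get(key) != replica.get(key):
            mismatches[key] = (source.get(key), replica.get(key))
    return mismatches

eqiad = {"/conftool/v1/pools/a": "pooled=yes", "/conftool/v1/pools/b": "pooled=no"}
codfw = {"/conftool/v1/pools/a": "pooled=yes", "/conftool/v1/pools/b": "pooled=no"}
assert diff_trees(eqiad, codfw) == {}  # replica is consistent
```

An empty result means the replica is safe to cut over to; any mismatched key would need investigation before the switch.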

A longer-term plan

At the moment, we're interested only in replicating conftool data. I have given this some thought and came to the conclusion that the best course of action is the following:

  1. Copy the data currently in /conftool in eqiad under /eqiad.wmnet/conftool by starting a replica
  2. Add an etcd_masterdc variable to puppet, and have the conftool_prefix hiera variable depend on it. This will make conftool/confd read from and write to this new directory.
  3. Once puppet has run everywhere, all reads/writes will go to the new tree
  4. Replicate this tree to codfw 1:1
  5. On codfw, create /codfw.wmnet/conftool and replicate it 1:1 to eqiad

For now, we will configure everything to read from and write to eqiad only. Given that conftool is not able to write to multiple clusters, this may not seem very useful, but it helps with our next goals, as we shall see. Also, if we decide to change the way conftool works and allow multi-DC writes, we can benefit from this.
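Step 1 above is essentially a key rewrite from the flat /conftool prefix into a per-DC tree. A sketch of that mapping (the function name and layout are assumptions, not etcdmirror's actual interface):

```python
# Sketch: map keys from the flat /conftool prefix to a per-DC tree,
# as in step 1 of the plan. Names are illustrative.

def rewrite_key(key: str, dc: str = "eqiad.wmnet") -> str:
    """Prefix a /conftool key with the datacenter tree."""
    assert key.startswith("/conftool")
    return "/" + dc + key

print(rewrite_key("/conftool/v1/pools/appserver"))
# /eqiad.wmnet/conftool/v1/pools/appserver
```

The codfw mirror in step 5 is the same transformation with dc="codfw.wmnet", applied to its own local tree.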

Allow reading from the nearest datacenter

Once we've defined etcd_masterdc in puppet, we will be able to make servers in the various DCs read from the nearest available datacenter under the correct index, simply by changing the SRV records in the DNS. It would be wise to introduce different DNS records for reads and writes, so that conftool will always connect to the master. I think this could even come from discovery, but the level of etcdinception would make me uncomfortable. So, manual records for now!
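The read/write split described above could look like this in client endpoint selection (the SRV record names here are hypothetical; the real ones would live in the operations/dns repo):

```python
# Sketch: pick the etcd SRV record for reads vs writes.
# Record names are made up for illustration.

def etcd_srv_record(op: str, local_dc: str, master_dc: str) -> str:
    if op == "write":
        # conftool must always talk to the master DC
        return f"_etcd-rw._tcp.{master_dc}"
    # reads can go to the nearest DC
    return f"_etcd-ro._tcp.{local_dc}"

assert etcd_srv_record("write", "codfw.wmnet", "eqiad.wmnet") == "_etcd-rw._tcp.eqiad.wmnet"
assert etcd_srv_record("read", "codfw.wmnet", "eqiad.wmnet") == "_etcd-ro._tcp.codfw.wmnet"
```

Keeping the write record pinned to the master DC is what lets reads fan out safely while conftool stays consistent.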

Switchover to codfw

Whenever we want to switch over, the steps will be:

  1. Set up a second, temporary local replication in codfw from /eqiad.wmnet/ to /codfw.wmnet/
  2. Change the etcd_masterdc variable and run puppet everywhere it matters
  3. Stop the temporary replication
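The three switchover steps can be sketched as a small orchestration function, with the real operations injected as callables (all names here are stand-ins, not actual tooling):

```python
# Sketch of the switchover sequence, with the actual operations
# injected as callables. These are hypothetical stand-ins.

def switch_master(start_replication, set_masterdc, run_puppet, stop_replication):
    start_replication(src="/eqiad.wmnet/conftool", dst="/codfw.wmnet/conftool")
    set_masterdc("codfw")        # flip etcd_masterdc in hiera
    run_puppet()                 # clients now read/write /codfw.wmnet/conftool
    stop_replication()           # the temporary local replica is no longer needed

log = []
switch_master(
    start_replication=lambda **kw: log.append(("replicate", kw["src"], kw["dst"])),
    set_masterdc=lambda dc: log.append(("masterdc", dc)),
    run_puppet=lambda: log.append("puppet"),
    stop_replication=lambda: log.append("stop"),
)
```

The ordering matters: the temporary replication must be running before clients start reading the codfw tree, and is only stopped once puppet has moved everything over.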

Re-configure the eqiad cluster to use TLS proxy

We will need to do the following:

  1. Ensure nothing reads from eqiad by changing DNS/other configs
  2. Prepare the new ECDSA certs, and commit them to puppet
  3. Disable auth (can be done via etcd-manage)
  4. Disable puppet across the config cluster in eqiad
  5. Switch the conf1* servers to use role::configcluster
  6. One machine at a time, stop etcd, run puppet, verify the server has reconnected to its current cluster.
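Step 6 is a standard rolling restart; a sketch with the per-host operations injected (hypothetical stand-ins for the real commands, not existing tooling):

```python
# Sketch of step 6: one machine at a time, stop etcd, run puppet,
# and verify the node has rejoined before moving on.

def rolling_reconfigure(hosts, stop_etcd, run_puppet, is_healthy):
    """Reconfigure hosts serially, aborting on the first failure so
    the cluster never loses more than one member at a time."""
    for host in hosts:
        stop_etcd(host)
        run_puppet(host)  # brings etcd back up behind the TLS proxy
        if not is_healthy(host):
            raise RuntimeError(f"{host} did not rejoin the cluster; aborting")

calls = []
rolling_reconfigure(
    ["conf1001", "conf1002"],
    stop_etcd=lambda h: calls.append(("stop", h)),
    run_puppet=lambda h: calls.append(("puppet", h)),
    is_healthy=lambda h: True,
)
```

Aborting on the first unhealthy host is the important property: it keeps quorum intact if a reconfiguration goes wrong mid-rollout.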

Event Timeline

Joe created this task. Mar 6 2017, 10:38 AM
Restricted Application added a subscriber: Aklapper. Mar 6 2017, 10:38 AM
Joe added a comment. Mar 6 2017, 10:52 AM

Just to give some context: it might be possible to try to have a true multi-dc cluster for etcd, but that will need:

  • N machines in eqiad
  • N machines in codfw
  • 1 or 2 tiebreakers, probably in ULSFO, to account for inter-DC network partitions

It will also need some fine-tuning and extensive testing, because I suspect raft over high-latency links can be pretty demanding in terms of write latency too.

I am willing to give it a chance, but it will take time and effort to test; and a move to etcd v3 would probably be a good idea in that case. I have more short-term goals in mind at the moment, like a relatively easy-to-achieve active-active read, active-passive write configuration.
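For reference, the majority math behind the node counts above (plain raft quorum; the 3+3+1 layout is just an example, not a committed design):

```python
# Sketch: raft quorum sizes for the multi-DC layout described above
# (N nodes per main DC plus tiebreakers). Plain majority math.

def quorum(total_nodes: int) -> int:
    """Minimum votes needed for a raft majority."""
    return total_nodes // 2 + 1

# e.g. 3 eqiad + 3 codfw + 1 ulsfo tiebreaker = 7 nodes, quorum 4:
assert quorum(3 + 3 + 1) == 4
# losing a whole main DC (3 nodes) still leaves 4 votes, so writes continue
```

This is why the tiebreaker in a third site matters: with only two equal DCs, losing either one (or the link between them) leaves at most half the votes on each side, and neither side can reach quorum.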

Joe triaged this task as Medium priority. Mar 6 2017, 11:50 AM
Joe added a project: User-Joe.
Joe moved this task from Backlog to Doing on the User-Joe board. Mar 8 2017, 3:13 PM

Change 341989 had a related patch set uploaded (by Giuseppe Lavagetto):
[operations/puppet] conftool: switch prefix to /eqiad.wmnet/conftool

https://gerrit.wikimedia.org/r/341989

Joe moved this task from Doing to Backlog on the User-Joe board. Apr 3 2017, 6:42 AM
Joe updated the task description. Apr 20 2017, 10:47 AM
Joe moved this task from Backlog to Doing on the User-Joe board. Apr 20 2017, 10:50 AM
Volans added a subscriber: Volans. Apr 20 2017, 11:54 AM

Change 349380 had a related patch set uploaded (by Giuseppe Lavagetto):
[operations/dns@master] Add separated SRV records for etcd to consume for conftool

https://gerrit.wikimedia.org/r/349380

Change 349385 had a related patch set uploaded (by Giuseppe Lavagetto):
[operations/puppet@production] role::configcluster: reconfigure etcd replication

https://gerrit.wikimedia.org/r/349385

Change 349386 had a related patch set uploaded (by Giuseppe Lavagetto):
[operations/puppet@production] etcd: make our rw clients use the new SRV record

https://gerrit.wikimedia.org/r/349386

Change 349385 merged by Giuseppe Lavagetto:
[operations/puppet@production] role::configcluster: reconfigure etcd replication

https://gerrit.wikimedia.org/r/349385

Joe updated the task description. Apr 21 2017, 6:18 PM
Joe added a comment. Apr 21 2017, 6:20 PM

I've set up the replica and prepared changes for most next steps. When I'm back on Wednesday morning, we can decide if we want to failover to the new cluster directly or just do it in case something bad happens with the network maintenance and the eqiad cluster, and perform the switchover at a later date.

This is still needed to move the eqiad cluster away from its current setup, where auth is enabled at the etcd layer; that is expensive and we want to avoid it.

Change 349380 merged by Alexandros Kosiaris:
[operations/dns@master] Add separated SRV records for etcd to consume for conftool

https://gerrit.wikimedia.org/r/349380

Change 349386 merged by Alexandros Kosiaris:
[operations/puppet@production] etcd: make our rw clients use the new SRV record

https://gerrit.wikimedia.org/r/349386

akosiaris updated the task description. Apr 25 2017, 1:30 PM

Change 350204 had a related patch set uploaded (by Alexandros Kosiaris):
[operations/puppet@production] Use conf2001 for secondary eqiad LVS's pybal

https://gerrit.wikimedia.org/r/350204

Change 350204 merged by Alexandros Kosiaris:
[operations/puppet@production] Use conf2001 for secondary eqiad LVS's pybal

https://gerrit.wikimedia.org/r/350204

lvs1004, lvs1005 and lvs1006 now successfully use conf2001, per the patch above. Proceeding with the rest of the plan.

akosiaris updated the task description. Apr 25 2017, 1:57 PM

Change 350212 had a related patch set uploaded (by Alexandros Kosiaris):
[operations/dns@master] Lower TTL for etcd client records

https://gerrit.wikimedia.org/r/350212

Change 350216 had a related patch set uploaded (by Alexandros Kosiaris):
[operations/dns@master] Switch conftool etcd records to codfw

https://gerrit.wikimedia.org/r/350216

Change 350223 had a related patch set uploaded (by Alexandros Kosiaris):
[operations/puppet@production] Switch all pybals to using codfw etcd cluster

https://gerrit.wikimedia.org/r/350223

Change 350225 had a related patch set uploaded (by Alexandros Kosiaris):
[operations/dns@master] Increase TTL for etcd client records

https://gerrit.wikimedia.org/r/350225

Change 350212 merged by Alexandros Kosiaris:
[operations/dns@master] Lower TTL for etcd client records

https://gerrit.wikimedia.org/r/350212

Change 350223 merged by Alexandros Kosiaris:
[operations/puppet@production] Switch all pybals to using codfw etcd cluster

https://gerrit.wikimedia.org/r/350223

Change 350214 had a related patch set uploaded (by Alexandros Kosiaris):
[operations/dns@master] Swap etcd client records to point to codfw

https://gerrit.wikimedia.org/r/350214

Mentioned in SAL (#wikimedia-operations) [2017-04-25T15:33:15Z] <akosiaris> restart pybal on lvs[2004-2006].codfw.wmnet,lvs3004.esams.wmnet,lvs4004.ulsfo.wmnet,lvs[1004-1006].wikimedia.org T159687

Mentioned in SAL (#wikimedia-operations) [2017-04-25T15:47:34Z] <akosiaris> restart pybal on lvs2003.codfw.wmnet,lvs3003.esams.wmnet,lvs4003.ulsfo.wmnet,lvs1003.wikimedia.org T159687

Mentioned in SAL (#wikimedia-operations) [2017-04-25T15:59:30Z] <akosiaris> restart pybal on lvs[2001-2002].codfw.wmnet,lvs[3001-3002].esams.wmnet,lvs[4001-4002].ulsfo.wmnet,lvs[1001-1002].wikimedia.org T159687

Change 350214 merged by Alexandros Kosiaris:
[operations/dns@master] Swap etcd client records to point to codfw

https://gerrit.wikimedia.org/r/350214

akosiaris updated the task description. Apr 25 2017, 4:46 PM

I've restarted confd across the fleet after merging the DNS change above, in order for it to be picked up by the daemons (5 minutes had passed and I saw no difference in the number of ESTABLISHED connections in lsof output).

A quick test with mw2255 pooling and depooling verified that everything continues to work fine.

I've left changing the conftool DNS records and stopping the replication for tomorrow morning.

Change 350365 had a related patch set uploaded (by Giuseppe Lavagetto):
[operations/dns@master] Reset TTL on etcd RO client record, lower it on RW ones

https://gerrit.wikimedia.org/r/350365

Change 350366 had a related patch set uploaded (by Giuseppe Lavagetto):
[operations/dns@master] Switch etcd records to codfw

https://gerrit.wikimedia.org/r/350366

Change 350367 had a related patch set uploaded (by Giuseppe Lavagetto):
[operations/dns@master] Restore TTL for RW etcd records

https://gerrit.wikimedia.org/r/350367

Change 350365 merged by Giuseppe Lavagetto:
[operations/dns@master] Reset TTL on etcd RO client record, lower it on RW ones

https://gerrit.wikimedia.org/r/350365

Change 350368 had a related patch set uploaded (by Giuseppe Lavagetto):
[operations/puppet@production] role::configcluster: stop replicating to codfw for etcd

https://gerrit.wikimedia.org/r/350368

Change 350366 merged by Giuseppe Lavagetto:
[operations/dns@master] Switch etcd records to codfw

https://gerrit.wikimedia.org/r/350366

Joe updated the task description. Apr 26 2017, 6:35 AM

Change 350368 merged by Giuseppe Lavagetto:
[operations/puppet@production] role::configcluster: stop replicating to codfw for etcd

https://gerrit.wikimedia.org/r/350368

Change 350367 merged by Giuseppe Lavagetto:
[operations/dns@master] Restore TTL for RW etcd records

https://gerrit.wikimedia.org/r/350367

Joe updated the task description. Apr 26 2017, 6:58 AM
Joe added a comment. Apr 26 2017, 7:00 AM

All clients have been successfully switched to codfw, and replication has been stopped; I tested depooling and repooling a client (to test again that nginx-based auth works) and everything seems to be working flawlessly so far.

I'll start working ASAP on moving conf1001-1003 to role::configcluster and drop the builtin auth module of etcd.

Mentioned in SAL (#wikimedia-operations) [2017-05-02T06:46:29Z] <_joe_> disabling etcd auth on conf1*, converting to use nginx for TLS/auth T159687

Joe added a comment. May 2 2017, 7:58 AM

I converted the etcd cluster in eqiad to use nginx for auth/TLS, moved to ecdsa certs with the correct SANs, and started replication codfw => eqiad.

I might start making clients read from eqiad shortly. That would basically resolve the initial purpose of this ticket, but it's still a bit short of an ideal, or even good, situation.

Specifically I want to work on etcdmirror so that it can do the following:

  • Have a mode in which, if no replication data is available, it can start fresh automatically (for cluster bootstrap)
  • Allow reading a defaults file where reload commands etc. can be added manually
  • Make etcdmirror read from the SOURCE cluster only if that cluster is active for replication; if not, just do nothing.
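The first bullet could be as simple as falling back to the source's current index when no replication state exists yet; a sketch (etcdmirror's real internals differ, and these names are illustrative):

```python
# Sketch of the bootstrap mode: resume replication from the stored
# index, or start fresh from the source's current index when nothing
# was stored (a brand-new destination cluster).

def replication_start_index(stored_index, src_current_index):
    """Pick the index replication should start from."""
    if stored_index is None:
        return src_current_index  # bootstrap: full initial copy, then follow
    return stored_index

assert replication_start_index(None, 1500) == 1500
assert replication_start_index(1200, 1500) == 1200
```

The point is removing the manual step of seeding a fresh cluster before replication can start.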

Change 351257 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::etcd::tlsproxy: turn off proxy buffering

https://gerrit.wikimedia.org/r/351257

Change 351257 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::etcd::tlsproxy: turn off proxy buffering

https://gerrit.wikimedia.org/r/351257

Change 353231 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::etcd::tlsproxy: allow read-only mode

https://gerrit.wikimedia.org/r/353231

Joe moved this task from Doing to Backlog on the User-Joe board. May 15 2017, 9:49 AM

Change 353231 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::etcd::tlsproxy: allow read-only mode

https://gerrit.wikimedia.org/r/353231

Change 350225 abandoned by Alexandros Kosiaris:
Increase TTL for etcd client records

Reason:
No longer relevant

https://gerrit.wikimedia.org/r/350225

Change 350216 abandoned by Alexandros Kosiaris:
Switch conftool etcd records to codfw

Reason:
No longer relevant

https://gerrit.wikimedia.org/r/350216

Change 341989 abandoned by Giuseppe Lavagetto:
conftool: switch prefix to /eqiad.wmnet/conftool

Reason:
we went in another direction.

https://gerrit.wikimedia.org/r/341989