mcrouter production architecture
Closed, Resolved, Public

Description

We need to decide how to set up mcrouter in production.

The requirements are:

  1. All memcached traffic must go through mcrouter, and be routed consistently
  2. All cross-dc traffic (basically, the replication of SET and DELETE commands) must be encrypted
  3. Mcrouter should ensure row-level redundancy

We're currently testing how mcrouter-to-mcrouter encrypted connections work, and it seems they only work with IP-based SANs, which are not currently supported by puppet, adding another layer of complexity.
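
For reference, a CSR with an IP-based SAN can be generated outside puppet with plain openssl; the key and host names below are hypothetical, and -addext needs OpenSSL 1.1.1 or later:

openssl req -new -newkey rsa:2048 -nodes \
  -keyout mcrouter.key -out mcrouter.csr \
  -subj "/CN=mc1001.eqiad.wmnet" \
  -addext "subjectAltName = IP:10.64.0.10"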

Strawman proposal

Install two mcrouter instances per appserver row, giving them proper CNAMEs, and make them communicate with:

  • the local memcacheds directly
  • the memcacheds in the other datacenter via SSL, by means of the mcrouter machines in that datacenter.

I would prefer this to going the same way we went with nutcracker (installing the proxy on all appservers), because of the need for inter-dc connections and of the complexity of managing certificates with IP-based SANs (which don't allow wildcards).
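
To make the topology concrete, here is a minimal sketch of the pools section of such a mcrouter config; the host names, addresses and ports are hypothetical, and I'm assuming the trailing :ssl suffix in a server entry is what marks it as reached over TLS:

{
  "pools": {
    "eqiad-local": {
      "servers": [ "10.64.0.10:11211", "10.64.0.11:11211" ]
    },
    "codfw-proxies": {
      "servers": [ "mcrouter-proxy1.codfw.wmnet:11214:ssl", "mcrouter-proxy2.codfw.wmnet:11214:ssl" ]
    }
  }
}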

Event Timeline

After some consideration, I see three options moving forward:

Option A:

  • Mcrouter is installed on one memcached host per row; the size of the memcached db on those hosts will be somewhat reduced as a consequence (mcrouter uses some memory)
  • Possibly we want to cap mcrouter's memory usage, or even dedicate some servers to it; but that would seem like a waste of resources
  • MediaWiki gets configured to talk to nutcracker locally on each appserver, using the murmur hash function (the fastest one) and random distribution, with the dc-local mcrouter instances as backends (see the sketch below)
  • Mcrouter has all of the mc* hosts in the local dc as a local pool, and the remote mcrouters, reached via SSL, as a remote pool
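
A minimal nutcracker (twemproxy) sketch of that appserver-side pool, assuming a hypothetical listener port and backend addresses:

mcrouter_pool:
  listen: 127.0.0.1:11213
  hash: murmur
  distribution: random
  auto_eject_hosts: false
  timeout: 250
  servers:
    - 10.64.0.10:11213:1
    - 10.64.32.10:11213:1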

Option B:

  • Set up something like stunnel on the memcached hosts, listening on port 11212 (see the sketch below)
  • Add mcrouter to every appserver, in place of twemproxy for memcached connections (we will still need twemproxy for redis)
  • Configure mcrouter to talk to the local dc via port 11211, and to the remote dc via port 11212 with TLS
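
A minimal server-side stunnel sketch for the memcached hosts; the service name and certificate paths are hypothetical:

[memcached-tls]
; terminate TLS on 11212 and forward to the local memcached
accept = 11212
connect = 127.0.0.1:11211
cert = /etc/stunnel/memcached.pem
key = /etc/stunnel/memcached.key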

Option C:

  • We install mcrouter without SSL on all memcached hosts
  • Nutcracker on the appservers will load-balance across all those instances (as in option B)
  • Mcrouter is configured to talk to both pools, local and remote, unencrypted
  • We use IPsec to secure cross-dc communications; this would mean every server needs to hold 18 IPsec connections (see the sketch below)
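
A minimal strongSwan transport-mode sketch of one such IPsec association; the addresses, names and cipher choice are assumptions, and PSK is used only for illustration:

conn mc1001-mc2001
    # transport mode: encrypt host-to-host traffic without a tunnel interface
    type=transport
    left=10.64.0.10
    right=10.192.0.10
    # PSK for illustration only (secret would live in ipsec.secrets);
    # production would use certificate-based auth
    authby=psk
    esp=aes128gcm16!
    auto=start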

Let's break down the pros and cons I see:

Option A:
Pros:

  • no indirection layer, SSL is supported natively
  • transition is relatively easy
  • Only one piece of software on the appservers
  • Cross-dc unencrypted traffic is impossible

Cons:

  • much more complex than our current setup
  • Creates more serious points of failure

Option B:
Pros:

  • Very simple; it would be mostly straightforward to configure
  • The solution most resilient to single-machine failure
  • Cross-dc unencrypted traffic is impossible

Cons:

  • We rely on the availability of stunnel or something similar for cross-dc sets
  • We have to run nutcracker and mcrouter at the same time on the appservers

Option C:
Pros:

  • transition is relatively easy
  • Only one piece of software on the appservers

Cons:

  • Cross-dc unencrypted traffic is possible; or, alternatively, we rely on the availability of the IPsec tunnel for cross-dc sets

At the moment, my choice would be option C if we decide cross-dc unencrypted traffic is admissible for short periods of time, because of its relative simplicity, and option B otherwise, as it's the most straightforward and reasonable solution IMHO. I must say the only option that would surely work is option C; in the other cases we have the following risks:

  • in option A, we have to avoid circular replication of SET and DELETE commands. That might be doable via preserve_route_prefix in mcrouter, but it needs testing
  • in option B, we have to ensure that changing the server port doesn't change the hash placement in mcrouter; if it did, keys written from one DC would be unreadable in the other (see the sketch below)
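
A minimal ketama-style sketch of that risk; this is not mcrouter's actual hashing code, just an illustration that the full server string, port included, determines where keys land:

import hashlib

def ring(servers, points_per_server=100):
    # place each server at many pseudo-random points on the ring,
    # derived from its full "host:port" string
    points = []
    for s in servers:
        for i in range(points_per_server):
            h = int(hashlib.md5(f"{s}-{i}".encode()).hexdigest(), 16)
            points.append((h, s))
    return sorted(points)

def lookup(points, key):
    # a key maps to the first server point at or after its own hash
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    for point, server in points:
        if h <= point:
            return server
    return points[0][1]  # wrap around the ring

plain = ring(["mc1001:11211", "mc1002:11211"])
tls = ring(["mc1001:11212", "mc1002:11212"])  # same hosts, stunnel port

moved = sum(
    lookup(plain, f"key{i}").split(":")[0] != lookup(tls, f"key{i}").split(":")[0]
    for i in range(1000)
)
print(f"{moved}/1000 keys land on a different host when only the port changes")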

I have an additional question: what is the expected behaviour in the following failure scenarios, for each option?

  • One of the hosts running mcrouter dies (and are there differences between the local DC and the remote DC?)
  • The mcrouter process dies on one host (and are there differences between the local DC and the remote DC?)

With options A and C, the failure scenario is basically that a single mcrouter failure should only affect a few requests across all appservers, while in option B the failure of one mcrouter would make all requests from one appserver fail, including all GETs, SETs and DELETEs, pretty much like it is now with twemproxy.

After some tests:

  • Option B wouldn't work. Using a different host:port combination changes the host_id in consistent hashing, so the keys end up on another host.
  • Option A works, as long as certs include an IP-based SAN. It needs a code tweak at the PHP level: we need to prepend /$dc/mw-wan/ or some other dc-specific prefix to the keys WANCache sets and fetches to/from memcached. This made me consider rolling out the new system per-wiki rather than per-percentage of traffic.
  • Option C works as well, and has the same deployment needs as option A.

As things stand, I am considering a tweaked version of option A: install mcrouter on each appserver, and use N hosts per DC as proxies. This needs a solution for generating certificates with IP-based SANs automatically, or something like that; basically a better CA than puppet. We could live with manual generation via cergen for the time being, though.

Change 432977 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[cergen@master] Add the ability to create CSRs with IP SANs as well

https://gerrit.wikimedia.org/r/432977

Change 432977 merged by jenkins-bot:
[cergen@master] Add the ability to create CSRs with IP SANs as well

https://gerrit.wikimedia.org/r/432977

If SET/DELETE go to all mc* servers in the wancache-(eqiad/codfw) pools (as mediawiki_wancache is configured to do in puppet), then option B would still work, since the consistent hashing wouldn't matter. Having broadcast operations go to all mc* servers rather than just one per DC (based on hash) is not required for WANCache, and keeping it this way wouldn't scale well if the rate of those (purge) operations increased hugely for some reason. I do like the conceptual simplicity, though.
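
For illustration, a minimal sketch of what prefix-based replication looks like in mcrouter config terms; the pool names, addresses and exact route types are assumptions, not the production config (JSON allows no comments, so all caveats live here):

{
  "pools": {
    "eqiad": { "servers": [ "10.64.0.10:11211" ] },
    "codfw-proxies": { "servers": [ "10.192.0.10:11214:ssl" ] }
  },
  "routes": [
    {
      "aliases": [ "/eqiad/mw-wan/" ],
      "route": {
        "type": "OperationSelectorRoute",
        "default_policy": "PoolRoute|eqiad",
        "operation_policies": {
          "set": { "type": "AllSyncRoute", "children": [ "PoolRoute|eqiad", "PoolRoute|codfw-proxies" ] },
          "delete": { "type": "AllSyncRoute", "children": [ "PoolRoute|eqiad", "PoolRoute|codfw-proxies" ] }
        }
      }
    }
  ]
}

A key written with the /*/mw-wan/ broadcast prefix matches the alias of every route entry, so the operation is sent to both pools.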

The hybrid proxy approach in https://gerrit.wikimedia.org/r/#/c/431737/4/hieradata/common/mcrouter.yaml seems reasonable to me. I assume the eventual choice of proxies would at some point no longer be MW appservers though.

The reason for the hybrid proxy approach is that mcrouter is known to use a non-negligible amount of memory when under write pressure, so I wanted to avoid sharing the same machines as memcached itself.

We can for sure think of having global proxies in both datacenters in the future, but we can reassess at a later time.

Yeah, it seems fine for now.

Change 436240 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] utils: add script to generate mcrouter-related certs

https://gerrit.wikimedia.org/r/436240

Change 436240 merged by Giuseppe Lavagetto:
[operations/puppet@production] puppetmaster::frontend: add cergen-managed CA for mcrouter

https://gerrit.wikimedia.org/r/436240

Change 436531 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] profile::mediawiki::mcrouter_wancache: update the ssl paths

https://gerrit.wikimedia.org/r/436531

Change 436532 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] mcrouter: fix hiera labels, install on mwdebug servers

https://gerrit.wikimedia.org/r/436532

Change 436531 merged by Giuseppe Lavagetto:
[operations/puppet@production] profile::mediawiki::mcrouter_wancache: update the ssl paths

https://gerrit.wikimedia.org/r/436531

Change 436532 merged by Giuseppe Lavagetto:
[operations/puppet@production] mcrouter: fix hiera labels, install on mwdebug servers

https://gerrit.wikimedia.org/r/436532

Mcrouter is now installed across the fleet (minus the deployment servers), and I confirmed that replication works as expected:

Setting a key with the broadcast prefix /*/mw-wan does replicate:

EQIAD APPSERVER
set /*/mw-wan/test 1 0 12
Hello World2
STORED
get test
VALUE test 1 12
Hello World2
END

and subsequently reading the key in codfw succeeds; we then send a broadcast delete:

CODFW APPSERVER
get test
VALUE test 1 12
Hello World2
END
delete /*/mw-wan/test
DELETED
get test
END

which results in the key disappearing in the other datacenter:

EQIAD APPSERVER
get test
END