
Memcached, mcrouter in MediaWiki on Kubernetes
Open, Medium, Public

Description

What?
Memcached is a crucial component of our MediaWiki installation. Right now MediaWiki interacts (via mcrouter) with two memcached tiers:

  • Onhost, an instance running within the same application server, where specific keys are stored with a max TTL of 10s
  • Main memcached cluster, a set of 18 servers per DC across which mcrouter shards all keys (a rough sketch of this two-tier routing follows the list)
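
To make the two tiers above concrete, here is a minimal sketch of an mcrouter config that layers the onhost instance in front of the main pool, wrapped in a Kubernetes ConfigMap. Server addresses, ports and the ConfigMap name are placeholders, and the WarmUpRoute layout is only an assumption about how the onhost tier could be wired, not our actual production routing.

```yaml
# Hypothetical ConfigMap carrying an mcrouter config with the onhost tier
# (10s TTL) layered in front of the sharded main pool. All addresses, ports
# and names are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcrouter-config
data:
  config.json: |
    {
      "pools": {
        "onhost": { "servers": ["127.0.0.1:11210"] },
        "main":   { "servers": ["10.0.0.1:11211", "10.0.0.2:11211"] }
      },
      "route": {
        "type": "WarmUpRoute",
        "cold": "PoolRoute|onhost",
        "warm": "PoolRoute|main",
        "exptime": 10
      }
    }
```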

Nutcracker is a component we would like to relieve of its duties. Nutcracker shards keys to a redis cluster that resides within the memcached servers. Whatever solution we choose for mcrouter makes sense to use for nutcracker too, if nutcracker is still around. T277183 Nutcracker is not with us anymore.

Onhost memcached
It makes sense to have one onhost memcached instance running on each node where MediaWiki pods will run:

  • either outside k8s, as a service running on a node
  • as a daemonset (see the sketch after this list)
  • not use it at all
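
A minimal sketch of the daemonset option follows; the image, port and sizing values are purely placeholders, not a sizing proposal.

```yaml
# Hypothetical DaemonSet running one onhost memcached per node; image, port
# and memory figures are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: onhost-memcached
spec:
  selector:
    matchLabels:
      app: onhost-memcached
  template:
    metadata:
      labels:
        app: onhost-memcached
    spec:
      containers:
        - name: memcached
          image: example/memcached:latest
          args: ["-m", "1024", "-p", "11210"]   # memory limit (MB) and listen port
          ports:
            - containerPort: 11210
              hostPort: 11210   # lets MediaWiki pods on the node reach it via the node IP
```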

Where should mcrouter reside?

1) Run mcrouter as part of the MediaWiki pod (via a TCP port or a UNIX socket); a pod sketch follows the three options below

  • Pros: rollout will be similar to MediaWiki
  • Cons: harder to test changes in production (e.g. push a change to a single node and monitor), # of connections towards the memcached main cluster will multiply

2) Run mcrouter as daemonset

  • Pros: easy rollout
  • Cons: some unavailability might happen during rollout; if the daemonset fails, all pods on the node are left without access to the memcached main cluster

3) Run mcrouter outside of kubernetes, on the node

  • Pros: reduces complexity, we already have everything set up in Puppet, easy rollout, easy to test changes in production, easy to control changes (e.g. by using feature flags in Puppet)
  • Cons: if the service fails, all pods on the node are left without access to the memcached main cluster (almost unusable)
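
For illustration, here is a trimmed (and entirely hypothetical) pod spec for option 1, with mcrouter as a sidecar next to php-fpm. Container names, images, ports and the referenced ConfigMap are made up and do not reflect the actual mediawiki chart.

```yaml
# Hypothetical sketch of option 1: mcrouter as a sidecar inside the MediaWiki pod.
# Images, ports and the ConfigMap name are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: mediawiki
spec:
  containers:
    - name: php-fpm
      image: example/mediawiki-php-fpm:latest
      env:
        # MediaWiki talks to the sidecar over localhost (or a shared UNIX socket).
        - name: MCROUTER_SERVER
          value: "127.0.0.1:11213"
    - name: mcrouter
      image: example/mcrouter:latest
      args: ["--config", "file:/etc/mcrouter/config.json", "-p", "11213"]
      volumeMounts:
        - name: mcrouter-config
          mountPath: /etc/mcrouter
  volumes:
    - name: mcrouter-config
      configMap:
        name: mcrouter-config
```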

Details

Other Assignee
Joe

Event Timeline

jijiki triaged this task as Medium priority.Mar 17 2021, 9:59 PM
jijiki added projects: SRE, serviceops.

As far as mcrouter goes, the only non-brittle solution is to run it inside the pod, so solution 1. The reason is simple: restarting mcrouter, or it crashing, on the node or in a daemonset would make it unavailable to all the pods on the node, without MediaWiki *or* kubernetes noticing.

If we keep it in the pod, we have a few advantages:

  1. we reduce the number of requests a single mcrouter instance has to deal with. We've seen that during peak traffic, mcrouter latencies can increase as we stretch its ability to scale as a single instance
  2. Zero-downtime config changes (they become k8s deployments)
  3. Ensuring failures are as constrained as possible.

> we reduce the number of requests a single mcrouter instance has to deal with. We've seen that during peak traffic, mcrouter latencies can increase as we stretch its ability to scale as a single instance

I would like to revisit this. Given that we are running 0.41 now, I could have a quick look to see if this still stands, because that would indeed be a problem.

> Zero-downtime config changes (they become k8s deployments)

Mcrouter already has zero-downtime config changes, as it watches its config file and reloads it. I might be wrong here, but specifically for config changes, it might take longer in k8s.

> Ensuring failures are as constrained as possible.

In general, mcrouter is one of the most stable pieces in our stack; I do not recall mcrouter failing us, apart from a bad config or our beloved TKOs (which are a feature). Unless we run into some odd bug, I have faith it will not fail us. Maybe we can include a check in the php-fpm container readiness probe that a connection to mcrouter is established, so that in case of an mcrouter failure we limit the consequences. I don't know if that would be an anti-pattern, but maybe we should consider it in general.
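
As a rough sketch of that idea (assuming the local mcrouter listens on port 11213; both the port and the choice of a plain TCP check are assumptions), the php-fpm container could carry something like:

```yaml
# Hypothetical readiness probe on the php-fpm container: the pod is taken out
# of rotation if a TCP connection to the local mcrouter cannot be opened.
containers:
  - name: php-fpm
    image: example/mediawiki-php-fpm:latest
    readinessProbe:
      tcpSocket:
        port: 11213        # assumed mcrouter listen port
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3
```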

Additionally, by running multiple mcrouter instances within a single node, we multiply the number of connections towards the local memcached cluster + the remote one (assuming T271967 is successful). Mcrouter supports connection pooling, so running on the host will reduce the # of connections. This is not necessarily problematic, but worth pointing out.

If we choose to run it on the node, we have a more controllable way to test config changes or version changes (e.g. roll it out to 1-2 nodes and see how it goes, via Puppet feature flags), which is my main concern if we go with options 1 or 2.

I don't really like option 3 just because it moves parts of the software stack to the node itself, and I would personally like the nodes to be as dumb as possible, ideally just running kubernetes components and docker. This might be a bit opinionated, but IMHO it makes dealing with the nodes easier, as one can be sure that all the actual workload on the node is "visible" via the kubernetes API and there is nothing "hidden" that one might need to take care of when dealing with nodes.

This is understood, but we need to come up with a reasonable way to gradually roll out changes to mcrouter when needed, be that a newer version or a configuration change.

That is already done in the MediaWiki chart.

Looking at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/670220, I am not sure I understand; it could be due to my poor kubernetes knowledge. Can you give me an example of how we will test an mcrouter version/config change on a limited number of pods to ensure it works as expected?

> That is already done in the MediaWiki chart.

But that does now deploy mcrouter as a sidecar in each MW pod. AIUI this might come with additional cons like more connections to memcached, less use of pooled connections, etc., or did I get that wrong? Should we evaluate that?

More connections to the memcached hosts is true, but we can fine-tune that when we need to. Less use of pooled connections is OTOH not true, and not an issue.

> Looking at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/670220, I am not sure I understand; it could be due to my poor kubernetes knowledge. Can you give me an example of how we will test an mcrouter version/config change on a limited number of pods to ensure it works as expected?

By changing the values.yaml file for a canary release, and/or changing the docker image version.
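
For example, a canary-only values override could look roughly like the following; the key names are invented for illustration and do not match the real deployment-charts schema.

```yaml
# Hypothetical values.yaml override applied only to the canary release;
# key names and values are placeholders, not the actual chart schema.
mcrouter:
  image:
    tag: "0.41.0-2"                # version under test, rolled out to the canary only
  config:
    cross_region_timeout_ms: 250   # example of a config knob being trialled
```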

Given it has created some doubts, let me clarify: I've created a first version of the charts that implements solution 1 (and not a complete version of it, either).

I did so not to ignore the discussion ongoing here; I just need a functional version of the chart that we can start working from. And the work I did there (a couple of days' worth) will not be thrown away unless we pick option 3.

Trying to break down my current thoughts:

Onhost memcached

In terms of functionality, I don't see a difference between being a DaemonSet and running on the host itself. I would opt for running it inside kubernetes for the reasons outlined by @JMeybohm previously. It's still an open question how we will inject the node IP into the mcrouter configuration: we'd need to pass the host IP as an env variable to the mcrouter container and somehow inject it into the configuration. The downside is we wouldn't be able to make use of mcrouter's ability to reload its configuration at runtime.

I don't really see alternatives right now, as adding the onhost memcached to the pod would require even more memory on top of what we already need (see T278220) and make smaller pods (which we like) impractical.

Mcrouter

I would like to focus on the main question, which is whether mcrouter should be inside the pod or not. I see the following reasons for answering "yes":

  • it will be part of the pod, meaning that it will be part of the functional unit of work
  • it will be serving a smaller number of requests, making it a performance afterthought

And the reasons for the "no" are:

  • We don't need between 5 and 7 mcrouters running on each node.
  • We could do without the overhead of 0.5 cores and 300 MB of RAM, multiplied by the number of pods per node

I also agree with @jijiki that mcrouter hardly ever crashes. All that considered, I think running it as a daemonset with a hostPort is probably the best solution for us right now. We might also consider changing the architecture a bit and running mcrouter on the memcached hosts, as Facebook does, as far as I understand.
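
A sketch of what that daemonset's pod template could contain follows, with the resource figures mirroring the numbers above and everything else (image, port) a placeholder.

```yaml
# Hypothetical pod template fragment for mcrouter as a DaemonSet with a hostPort;
# MediaWiki pods on the same node would connect via the node IP.
containers:
  - name: mcrouter
    image: example/mcrouter:latest
    args: ["--config", "file:/etc/mcrouter/config.json", "-p", "11213"]
    ports:
      - containerPort: 11213
        hostPort: 11213
    resources:
      requests:
        cpu: "500m"       # roughly the 0.5 cores mentioned above
        memory: "300Mi"   # roughly the 300 MB mentioned above
```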

Nutcracker

Nutcracker will be eliminated, one way or the other, but its resource consumption is so small that I don't think keeping it in the pod creates a lot of issues. I would consider it a non-issue.

onhost memcached

> It's still an open question how we will inject the node IP into the mcrouter configuration: we'd need to pass the host IP as an env variable to the mcrouter container and somehow inject it into the configuration. The downside is we wouldn't be able to make use of mcrouter's ability to reload its configuration at runtime.

That's a bummer. As you know, I have quite limited experience with mcrouter and its config. Is there maybe a way to interpolate env variables, or maybe leverage the downward API to write a (unfortunately not JSON) file to import into the mcrouter config?
In the past I've worked around a similar limitation by having an "inotify sidecar" that loads the updated config, interpolates some variables in it, and then writes a valid-for-the-service version of the config into a shared emptyDir. That could maybe work here as well.
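
To make the downward-API idea concrete, here is a sketch of exposing the node IP to the container as an environment variable; how that value then ends up in the JSON config (entrypoint templating, or the inotify sidecar described above writing into a shared emptyDir) is left open, and all names here are hypothetical.

```yaml
# Hypothetical use of the downward API: the node IP is exposed as an env var,
# to be interpolated into the mcrouter config at container start (or by a
# small sidecar writing the rendered config into a shared emptyDir).
containers:
  - name: mcrouter
    image: example/mcrouter:latest
    env:
      - name: NODE_IP
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
    volumeMounts:
      - name: rendered-config
        mountPath: /etc/mcrouter
volumes:
  - name: rendered-config
    emptyDir: {}
```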

mcrouter

With mcrouter in the pod it is very clear how different mcrouter versions or configs would be tested, as that would just be another mediawiki release. For mcrouter as a daemonset, it is not very clear to me how that would be done.

(sorry for not quoting)

As far as connectivity goes, we can run both mcrouter and onhost memcached on a unix socket, if that is of any help (a rough sketch follows the two options below). Generally speaking, we have to decide if we want to solve this soon (next 1-3 quarters), or choose a temporary solution that will take us to at least the first release of MW on k8s. Since part of our plan is to run both clusters (mw-on-k8s + our current infra) for a period of time:

  • Option 1: maybe it would make sense to have both mcrouter+memcached running as normal services, since this will help us keep control of this moving part in both installations in the same way.
  • Option 2: if we will need time to solve the challenges presented by running them as a daemonset, we can consider keeping mcrouter+memcached in the pod, and take the resource-wasting hit.
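
Regarding the unix-socket idea mentioned above, a sketch of how a pod could reach a node-local socket via a hostPath mount; the socket directory is a placeholder.

```yaml
# Hypothetical hostPath mount giving the php-fpm container access to a
# node-local mcrouter/memcached UNIX socket; the path is a placeholder.
containers:
  - name: php-fpm
    image: example/mediawiki-php-fpm:latest
    volumeMounts:
      - name: mcrouter-socket
        mountPath: /var/run/mcrouter
volumes:
  - name: mcrouter-socket
    hostPath:
      path: /var/run/mcrouter
      type: Directory
```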

When I am talking about "changes", apart from upgrading to a new version, what we will most likely need will be to add/remove hosts from the pool, provided that 1) a couple of ongoing memcached/mcrouter related projects have been completed and 2) we have refreshed both memcached clusters.

We can consider either solution as temporary, with the goal to eventually run them both as daemonsets.

After collecting some correct data and discussing the matter with @Krinkle, we don't think we have a strict need for onhost memcached at the moment, other than relieving pressure on the network interfaces of the memcached nodes; those are now, however, finally all on 10G NICs, so we can just disable the onhost memcached tier for now.

jijiki renamed this task from Memcached, mcrouter, nutcracker's future in MediaWiki on Kubernetes to Memcached, mcrouter in MediaWiki on Kubernetes.Jan 12 2023, 6:02 PM
jijiki claimed this task.
jijiki updated Other Assignee, added: Joe.
jijiki updated the task description. (Show Details)