
Deployment strategy for the session storage application.
Closed, Resolved · Public

Description

While the cassandra cluster will be installed on real iron, we want to deploy the corresponding application, kask, on kubernetes using the deployment pipeline, because we expect to set up other copies of this service running on kubernetes. We don't want a heterogeneous execution environment, but at the same time we want to compartmentalize relatively security-sensitive things like user sessions from the rest of the applications (which are less security-sensitive and might process arbitrary user input).

Luckily, in kubernetes it's easy to add nodes to the cluster with annotations that can then be used at deployment time to select which nodes an application will run on, so we can run security-sensitive applications in a different tier from the other ones, on separate servers.
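
For illustration, a minimal sketch of what this could look like on the kubernetes side, assuming the usual mechanism of node labels plus a nodeSelector (the label name, node name, and image reference below are made up):

```yaml
# Hypothetical example: label the dedicated nodes, then pin the session
# storage Deployment to them via a nodeSelector.
#
#   kubectl label node kubernetes-priv1001.example wikimedia.org/tier=privacy
#
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sessionstore
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sessionstore
  template:
    metadata:
      labels:
        app: sessionstore
    spec:
      nodeSelector:
        wikimedia.org/tier: privacy   # only schedule onto the labelled nodes
      containers:
        - name: kask
          image: docker-registry.wikimedia.org/wikimedia/mediawiki-services-kask:latest
```

Taints and tolerations could additionally be used to keep other workloads off those nodes.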

In this specific case, I see two possible solutions:

  • we add the cassandra nodes to kubernetes as "session-service-only" nodes, so that the application basically runs on the same servers as the datastore
  • we add a more generic "privacy" tier of kubernetes nodes, probably just VMs for now, dedicated to running all applications with access to more security-sensitive data, and run the service there.

I see advantages in both approaches, but I think the latter would be more flexible and make more sense in the long run.

Event Timeline

Joe triaged this task as Medium priority. Mar 5 2019, 12:26 PM
Joe created this task.
Joe removed Eevans as the assignee of this task. Mar 5 2019, 12:28 PM

Definitely the 2nd as far as I am concerned.

  • we add a more generic "privacy" tier of kubernetes nodes, probably just VMs for now, dedicated to running all applications with access to more security-sensitive data, and run the service there.

We avoid all kinds of weird interactions between kask, cassandra, and kubernetes cluster components running on the same hosts, and we are able to scale out or relocate the session storage service independently, without having to colocate with hardware we might not have at the time of need.

For my part, I'm good with either approach; I really like the idea of not having an exceptional deployment for session storage. However, one of the objectives here was isolation, and while I'm satisfied we're still achieving that, the latter of the two proposed strategies would seem to trade away the total isolation we'd have if everything were on dedicated iron.

TL;DR We should probably make sure everyone else is OK with this compromise as well.

/cc @mobrovac @Clarakosi @Pchelolo @Fjalapeno

From an IRC conversation:

[2019-03-05 09:29:06] <urandom> _joe_: do you think we'll be able to do this by the end of the quarter? Insofar as having it all in place, and capable of being tested?
[2019-03-05 09:32:05] → mvolz joined (~mvolz@cpc69056-oxfd26-2-0-cust328.4-3.cable.virginm.net)
[2019-03-05 09:43:28] <_joe_> urandom: you should ask akosiaris, and I'm 100% on the php7 goal
[2019-03-05 09:43:39] <_joe_> and we just lost fabian for a few weeks
[2019-03-05 09:43:45] <urandom> roger that
[2019-03-05 09:43:46] <_joe_> so we're strapped
[2019-03-05 09:47:02] <akosiaris> urandom: we can try but we are really spread thin right now with all the goals. But we could prioritize for early next quarter

@akosiaris So that we know how to plan, is this canonical? Is this something that we can plan for early next quarter?

Yes, I think we can. Make sure to send out an email request to managers so that they are aware, but otherwise it makes perfect sense to me.

Change 496768 had a related patch set uploaded (by Eevans; owner: Eevans):
[mediawiki/services/kask@master] (Temporarily) hack config path for deployment-prep testing

https://gerrit.wikimedia.org/r/496768

Change 496768 merged by jenkins-bot:
[mediawiki/services/kask@master] (Temporarily) hack config path for deployment-prep testing

https://gerrit.wikimedia.org/r/496768

Kask has now been set up for session storage in deployment-prep using docker_services (deployment-sessionstore01.deployment-prep.eqiad.wmflabs); I have a few questions about how this will all work in production (and presumably deployment-prep, at some point in the future).

  • The name used in docker_services is a normalization of the Git repo name (i.e. mediawiki-services-kask here), will this also be the case when deployed to k8s in production?
  • The directory containing the config file is a function of the name (see above), will this a) also be the case in k8s, and b) will it be something that can be overridden?
  • What should the convention be for overriding container startup? In other words: Blubber seems to encourage the creation of an ENTRYPOINT, and service::docker provides the means to override CMD, does this reflect the way things will be in production?

Kask has now been set up for session storage in deployment-prep using docker_services (deployment-sessionstore01.deployment-prep.eqiad.wmflabs); I have a few questions about how this will all work in production (and presumably deployment-prep, at some point in the future).

  • The name used in docker_services is a normalization of the Git repo name (i.e. mediawiki-services-kask here), will this also be the case when deployed to k8s in production?

More or less. It will be the name of the image that is used in the helm charts. But not in anything else.
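
For what it's worth, a hypothetical sketch of where that name would end up in a chart's values (the exact layout of the values file is an assumption here):

```yaml
# Hypothetical helm values fragment: the image repository follows the
# normalized Git repo name, but the chart/release itself can be named freely.
docker:
  registry: docker-registry.wikimedia.org
main_app:
  image: wikimedia/mediawiki-services-kask
  version: latest   # image tag chosen at deploy time
```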

  • The directory containing the config file is a function of the name (see above), will this a) also be the case in k8s, and b) will it be something that can be overridden?

a) It doesn't have to be. In https://github.com/wikimedia/mediawiki-services-kask/blob/master/.pipeline/blubber.yaml#L29 you can see that you can set whatever command you want. I don't like /etc/mediawiki-services-kask btw, wanna push a change to blubber.yaml to make it /etc/kask/config.yaml?

b) Although we prefer not to do it, we can override the entrypoint and explicitly set what we want, so that if the image entrypoint is not working for some reason we can set our own.

  • What should the convention be for overriding container startup? In other words: Blubber seems to encourage the creation of an ENTRYPOINT, and service::docker provides the means to override CMD, does this reflect the way things will be in production?

As I said above, we can override the entrypoint in production.
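
To make a) and b) above concrete, here is a hypothetical fragment of what .pipeline/blubber.yaml might contain; the config version, flag name, and path are assumptions, and the real file is the one linked above:

```yaml
# Hypothetical .pipeline/blubber.yaml fragment. The config path is baked into
# the image's entrypoint, which is why changing it means either editing this
# file or overriding the entrypoint at deploy time.
version: v3   # depends on the Blubber release in use
variants:
  production:
    entrypoint: ["./kask", "--config", "/etc/mediawiki-services-kask/config.yaml"]
```

With an exec-form entrypoint like this, anything supplied as CMD (for example via override_cmd) is appended after the baked-in arguments rather than replacing them, which is presumably the override_cmd limitation discussed further down.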

Kask has now been set up for session storage in deployment-prep using docker_services (deployment-sessionstore01.deployment-prep.eqiad.wmflabs); I have a few questions about how this will all work in production (and presumably deployment-prep, at some point in the future).

  • The name used in docker_services is a normalization of the Git repo name (i.e. mediawiki-services-kask here), will this also be the case when deployed to k8s in production?

More or less. It will be the name of the image that is used in the helm charts. But not in anything else.

OK, and the image name is a normalization of the Git repo name (I see no way to change this in Blubber).

  • The directory containing the config file is a function of the name (see above), will this a) also be the case in k8s, and b) will it be something that can be overridden?

a) It doesn't have to be. In https://github.com/wikimedia/mediawiki-services-kask/blob/master/.pipeline/blubber.yaml#L29 you can see that you can set whatever command you want. I don't like /etc/mediawiki-services-kask btw, wanna push a change to blubber.yaml to make it /etc/kask/config.yaml?

I don't like it either; this is what prompted these questions! The image entrypoint used /etc/kask/config.yaml, but I didn't see a way to alter where the configuration would be written via docker_services, and an override_cmd won't work so long as the image entrypoint is passing all of the arguments.

b) Although we prefer not to do it, we can override the entrypoint and explicitly set what we want, so that if the image entrypoint is not working for some reason we can set our own.

  • What should the convention be for overriding container startup? In other words: Blubber seems to encourage the creation of an ENTRYPOINT, and service::docker provides the means to override CMD, does this reflect the way things will be in production?

As I said above, we can override the entrypoint in production.

I don't like it either; this is what prompted these questions! The image entrypoint used /etc/kask/config.yaml, but I didn't see a way to alter where the configuration would be written via docker_services, and an override_cmd won't work so long as the image entrypoint is passing all of the arguments.

Ah yes. $title is hardcoded into both the config file path and the image name in the puppet manifests. I guess we can decouple them.