Experiment with hosted kubernetes solutions for Beta
Open, Normal, Public

Description

As we move more services through the Deployment Pipeline to production, Beta Cluster is beginning to suffer.

There are several proposed solutions. Let's see whether using an existing hosted k8s solution is viable.

Problems

  1. The Deployment Pipeline is currently unable to perform system tests that incorporate both a change to a service and an existing MediaWiki installation. It is limited to e2e testing of the service itself.
  2. A k8s cluster that integrates with Beta Cluster (that is, one with secure network ingress/egress between deployed pods/services and existing deployment-prep instances) would allow the Deployment Pipeline to perform this kind of testing. However, at this time neither SRE nor RelEng can commit to maintaining an in-house k8s cluster for this purpose.

Proposal

Experiment with [third party k8s provider] to evaluate its potential as a third-party hosted k8s cluster that can:

  1. Provide a k8s cluster that the Deployment Pipeline can target as part of its graduated deployment/testing strategy.
  2. Securely integrate with Beta Cluster at a network level.
  3. Run e2e helm tests that exercise service changes and existing MediaWiki deployments in Beta Cluster together.

Evaluation

A very basic test for teasing out whether any third-party k8s provider is viable could be:

  1. Can our existing Mathoid helm chart be used to deploy there?
  2. If not, how much refactoring would the chart(s) need? More precisely, can we make them work with both [third party k8s] and WMF k8s without too much divergence?
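
One rough way to put a number on "too much divergence" in item 2 is to compare the values overrides each cluster would need. The following is only a sketch under stated assumptions: `diff_values` is a hypothetical helper, and the sample keys and values are made up for illustration rather than taken from the actual Mathoid chart.

```python
from typing import Any, Dict, List


def diff_values(a: Dict[str, Any], b: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted key paths whose values differ between two nested dicts."""
    paths: List[str] = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}.{key}" if prefix else str(key)
        if key not in a or key not in b:
            # Key exists on only one side: count the whole subtree as one difference.
            paths.append(path)
        elif isinstance(a[key], dict) and isinstance(b[key], dict):
            # Both sides are mappings: recurse to find the specific differing leaves.
            paths.extend(diff_values(a[key], b[key], path))
        elif a[key] != b[key]:
            paths.append(path)
    return paths


# Hypothetical override sets; real keys would come from the chart's values files.
wmf_values = {"service": {"type": "NodePort", "port": 8080}, "replicas": 2}
hosted_values = {"service": {"type": "LoadBalancer", "port": 8080}, "replicas": 2}

print(diff_values(wmf_values, hosted_values))
```

A short list of differing paths would suggest the same chart can serve both clusters with per-cluster overrides; a long one would point toward refactoring.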

Event Timeline

Restricted Application added a subscriber: Aklapper. May 8 2019, 4:50 PM
thcipriani assigned this task to dduvall.
thcipriani triaged this task as Normal priority.

assigning to @dduvall based on hangout discussion

Krenair added a subscriber: Krenair. May 8 2019, 4:56 PM
dduvall updated the task description. May 8 2019, 5:41 PM
dduvall updated the task description. May 8 2019, 5:48 PM
dduvall updated the task description. May 8 2019, 6:01 PM
dduvall updated the task description.
jeena added a subscriber: jeena. May 8 2019, 6:05 PM
dduvall removed dduvall as the assignee of this task. Fri, Jun 7, 11:55 PM

I was able to run Mathoid just fine on GKE using the latest chart. What remains of this experiment, however, is getting Beta Cluster's MediaWiki talking to a service deployed to GKE (or Amazon EKS, if that makes more sense for us in terms of policy or budget) and vice versa. Basically, the networking part.

A couple of options:

  1. VPN between the deployment-prep labs project and GKE using IPsec and Google VPC. This might take more investment to set up initially, but once (if) it's working, Beta instances should be able to communicate freely with anything deployed on GKE. There may be a DNS component as well.
  2. Ingress on both ends, with communication over the public internet. This might be easier to set up initially but requires additional ingress configuration for each service. MediaWiki/service communication shouldn't include anything sensitive, but that would need to be confirmed.
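
Whichever option is tried, a basic smoke test is whether each side can open a TCP connection to the other. A minimal sketch, assuming nothing beyond the Python standard library (`can_connect` is a hypothetical helper name, not part of any existing tooling):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run from a Beta instance against a service endpoint on the hosted cluster, and from a pod against a Beta host, this only shows that the network path is open. It says nothing about whether the traffic is encrypted or whether DNS names resolve the same way on both sides.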

Unassigning while on leave. Anyone should feel free to pick this up.