As we move more services to production through the Deployment Pipeline, Beta Cluster is beginning to suffer.
There are several proposed solutions; let's see whether using an existing hosted k8s solution is viable.
Problems
- The Deployment Pipeline currently cannot run system tests that combine a change to a service with an existing MediaWiki installation; it is limited to e2e tests of the service in isolation.
- A k8s cluster that integrates with Beta Cluster (that is, one with secure network ingress/egress between its deployed pods/services and existing deployment-prep instances) would allow the Deployment Pipeline to perform this kind of testing. However, neither SRE nor RelEng can currently commit to maintaining an in-house k8s cluster for this purpose.
Proposal
Experiment with [third party k8s provider] to evaluate whether its hosted k8s offering can:
- Provide a k8s cluster that the Deployment Pipeline can target as part of its graduated deployment/testing strategy.
- Securely integrate with Beta Cluster at a network level.
- Run e2e helm tests that exercise service changes together with the existing MediaWiki deployment in Beta Cluster.
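A helm test is a pod defined in the chart that runs a command; helm treats the test as passed when the container exits with status 0. As a minimal sketch of the network-level check behind the last two goals, assuming the test image ships Python, the following could run inside such a test pod to confirm egress from the hosted cluster to Beta Cluster. The endpoint list is a hypothetical placeholder, not a real deployment-prep host, and the reverse direction (ingress from deployment-prep into the cluster) would need its own check.

```python
import socket
import sys
from typing import Iterable, Tuple

# Hypothetical Beta Cluster endpoints (placeholders, not real deployment-prep
# hosts); a real test would list what the service under test actually calls.
ENDPOINTS = [
    ("mediawiki.beta.example", 443),
]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections and timeouts.
        return False


def main(endpoints: Iterable[Tuple[str, int]]) -> int:
    """Probe every endpoint and return a process exit code (0 = all reachable)."""
    failures = [(h, p) for h, p in endpoints if not can_connect(h, p)]
    for host, port in failures:
        print(f"FAIL: cannot reach {host}:{port}", file=sys.stderr)
    return 1 if failures else 0
```

The test container's command would finish with `sys.exit(main(ENDPOINTS))`, so that any unreachable endpoint fails the helm test. A plain TCP connect only proves reachability; an e2e test would go on to make a real request that involves both the changed service and MediaWiki.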
Evaluation
A basic first test of whether any third-party k8s provider is viable could be:
- Can our existing Mathoid helm chart be used to deploy there?
- If not, how much refactoring would the chart(s) need? More precisely, can we make them work with both [third party k8s] and WMF k8s without too much divergence?
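One hedged way to make "without too much divergence" measurable is to keep a single chart with shared defaults plus a small values override per cluster, then list the flattened keys whose values differ between the two overrides. The sketch below assumes that layout; the keys and values are hypothetical and are not taken from the actual Mathoid chart. In practice the overrides would be loaded from per-cluster values files with a YAML parser; plain dicts keep the sketch dependency-free.

```python
from typing import Any, Dict, Tuple


def flatten(values: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested helm values into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def divergent_keys(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted_key: (value_in_a, value_in_b)} for every key that differs.

    A key missing from one side is reported as None on that side.
    """
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(fa.keys() | fb.keys())
        if fa.get(key) != fb.get(key)
    }


# Hypothetical per-cluster overrides for one chart (illustrative keys only).
wmf_values = {
    "docker": {"registry": "docker-registry.example.org"},
    "resources": {"replicas": 1},
    "networkpolicy": {"egress": {"enabled": True}},
}
hosted_values = {
    "docker": {"registry": "docker-registry.example.org"},
    "resources": {"replicas": 1},
    "networkpolicy": {"egress": {"enabled": False}},
    "ingress": {"class": "hosted-lb"},
}
```

Here `divergent_keys(wmf_values, hosted_values)` reports only `ingress.class` and `networkpolicy.egress.enabled`. A short list like this suggests the chart can stay shared with thin per-cluster overlays; a long or growing list would be a sign that the charts are diverging.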