Project Name: zuul
Developer account usernames of requestors:
- BryanDavis / bd808
- Hashar / hashar
- Dduvall / dduvall
- Thcipriani / thcipriani
Purpose: Hosting a Kubernetes cluster that will be used by the next-generation Gerrit + Zuul CI system currently under development by the Release Engineering team. This will eventually replace the current integration project, which hosts the Jenkins exec nodes for the current Gerrit + Zuul + Jenkins CI system.
Brief description: This project will become the main job-running pool for the next-generation CI system used by Gerrit projects (MediaWiki core, skins, and extensions; operations/puppet; etc.). Zuul command and control servers hosted in the eqiad and codfw Ganeti clusters will dispatch jobs to the zuul-runners project via the Kubernetes API.
We will initially set up a Magnum-managed Kubernetes cluster using OpenTofu automation similar to https://gitlab.wikimedia.org/cloudvps-repos/deployment-prep/tofu-provisioning. We need to perform some testing to find out whether a Magnum k8s cluster can handle the etcd stress that the Zuul job management process will apply. Each job will entail creating a dedicated k8s namespace, running a Pod, exporting static assets (job logs, test reports, etc.), and finally deleting the namespace.
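For scale, that lifecycle is a few API round trips per job; a minimal sketch of it with the kubernetes Python client looks like the following (the namespace name, image, and command are placeholders, not the real Zuul launcher code):

```python
# Per-job lifecycle we need to load-test: one namespace + one pod per job.
# Placeholder names throughout; the real pod specs come from Zuul.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when run in-cluster
v1 = client.CoreV1Api()

ns = "job-1234"  # one throwaway namespace per Zuul job

# 1. create a dedicated namespace
v1.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=ns)))

# 2. run the job's pod inside it
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="runner"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="job",
                image="docker.io/library/busybox",
                command=["sh", "-c", "echo build output"],
            )
        ],
    ),
)
v1.create_namespaced_pod(namespace=ns, body=pod)

# 3. ...wait for completion, then export assets, e.g.:
# logs = v1.read_namespaced_pod_log(name="runner", namespace=ns)

# 4. delete the namespace, tearing down everything in it
v1.delete_namespace(name=ns)
```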
We will need increased compute and storage to fully implement the project. Compute needs are expected to be roughly equivalent to the current integration project. Long term storage needs likely will also be similar to the utilization in the integration project but may make use of object storage for historic job assets rather than volumes.
We can use quota requests to adjust things as build-out and testing progress, but it would be nice to start with room to build a Magnum cluster with 1 master and 4 nodes plus a project-local puppetserver and a small bastion server. I think this would mean:
- 1 g4.cores2.ram4.disk20 Kubernetes master
- 4 g4.cores8.ram32.disk20 Kubernetes nodes
- 1 g4.cores8.ram32.disk20 Puppetserver
- 1 g4.cores4.ram8.disk20 Bastion
| resource | requested |
|---|---|
| instances | 7 |
| cores | 46 |
| ram | 172G |
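Those totals just follow from the flavor names; a quick sanity check (Python, assuming g4.coresX.ramY.diskZ encodes X cores and Y GB of RAM):

```python
# Quota sanity check: totals derived from the g4 flavor names above.
requested = [
    # (cores, ram_gb, count)
    (2, 4, 1),    # g4.cores2.ram4.disk20  - k8s master
    (8, 32, 4),   # g4.cores8.ram32.disk20 - k8s nodes
    (8, 32, 1),   # g4.cores8.ram32.disk20 - puppetserver
    (4, 8, 1),    # g4.cores4.ram8.disk20  - bastion
]
instances = sum(n for _, _, n in requested)
cores = sum(c * n for c, _, n in requested)
ram = sum(r * n for _, r, n in requested)
print(instances, cores, ram)  # -> 7 46 172
```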
A full build-out will probably want something like 24 more Kubernetes nodes (28 total) and a second master. The instances we use as Jenkins exec nodes in the integration project use a custom flavor with 4x IOPS and an extra 90G of ephemeral storage. We need to do some testing to figure out what the replacement needs are in a Kubernetes environment. Chances are pretty good we will be coming back to talk about at least the IOPS bump and probably something to replace that ephemeral storage.
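For rough planning only: if the full build-out keeps the same flavors (2 g4.cores2.ram4 masters, 28 g4.cores8.ram32 nodes, plus the puppetserver and bastion above), that works out to about 32 instances, 240 cores, and 944G of RAM, before any IOPS or ephemeral-storage adjustments.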
How soon you are hoping this can be fulfilled: "as soon as possible"