Request creation of zuul VPS project
Closed, Resolved, Public

Description

Project Name: zuul

Developer account usernames of requestors:

  • BryanDavis / bd808
  • Hashar / hashar
  • Dduvall / dduvall
  • Thcipriani / thcipriani

Purpose: Hosting a Kubernetes cluster that will be used by the next-generation Gerrit + Zuul CI system currently under development by the Release Engineering team. This will eventually replace the current integration project which hosts Jenkins exec nodes for the current Gerrit + Zuul + Jenkins CI system.

Brief description: This project will become the main job-running pool for the next-generation CI system used by Gerrit projects (MediaWiki core, skins, and extensions; operations/puppet; etc.). Zuul command and control servers hosted in the eqiad and codfw Ganeti clusters will dispatch jobs to the zuul project via Kubernetes APIs.

We will initially set up a Magnum-managed Kubernetes cluster using OpenTofu automation similar to https://gitlab.wikimedia.org/cloudvps-repos/deployment-prep/tofu-provisioning. We need to test whether a Magnum k8s cluster can handle the etcd stress that the Zuul job management process will apply. Each job will entail creating a dedicated k8s namespace, running a Pod, exporting static assets (job logs, test reports, etc.), and finally deleting the namespace.
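The per-job lifecycle described above can be sketched roughly as follows. This is only an illustration of the ordering (create namespace, run Pod, export assets, delete namespace), not Zuul's actual implementation: `FakeCluster`, `run_job`, the `job-<id>` namespace naming, and the `export` callback are all hypothetical names, and the in-memory class stands in for a real Kubernetes API client.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for a Kubernetes API client."""

    namespaces: set = field(default_factory=set)
    calls: list = field(default_factory=list)

    def create_namespace(self, name):
        self.namespaces.add(name)
        self.calls.append(("create_namespace", name))

    def run_pod(self, namespace, image):
        self.calls.append(("run_pod", namespace))
        # A real client would wait for the Pod to finish and collect its output.
        return {"log": f"ran {image} in {namespace}"}

    def delete_namespace(self, name):
        self.namespaces.discard(name)
        self.calls.append(("delete_namespace", name))


def run_job(cluster, job_id, image, export):
    """Run one job in its own namespace and always clean the namespace up."""
    namespace = f"job-{job_id}"  # assumed naming scheme, for illustration only
    cluster.create_namespace(namespace)
    try:
        result = cluster.run_pod(namespace, image)
        export(job_id, result)  # e.g. upload job logs and test reports
    finally:
        # Deleting in `finally` keeps failed jobs from leaking namespaces.
        cluster.delete_namespace(namespace)


if __name__ == "__main__":
    cluster = FakeCluster()
    exported = {}
    run_job(cluster, "42", "example/image", exported.__setitem__)
    print([name for name, _ in cluster.calls])
```

Every job adds at least one namespace create and one namespace delete on top of the Pod's own objects, and all of that state is stored in etcd, which is why job throughput translates directly into etcd load.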

We will need increased compute and storage to fully implement the project. Compute needs are expected to be roughly equivalent to the current integration project. Long-term storage needs will likely also be similar to the utilization in the integration project, but may make use of object storage for historic job assets rather than volumes.

We can use quota requests to adjust things as the build-out and testing progresses, but it would be nice to start with room to build a Magnum cluster with 1 master and 4 nodes plus a project-local puppetserver and a small bastion server. I think this would mean:

  • 1 g4.cores2.ram4.disk20 Kubernetes master
  • 4 g4.cores8.ram32.disk20 Kubernetes nodes
  • 1 g4.cores8.ram32.disk20 Puppetserver
  • 1 g4.cores4.ram8.disk20 Bastion

instances: 7
cores: 46
ram: 172G
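As a cross-check of the totals above, here is a small sketch that sums the requested flavors. It assumes the flavor name encodes vCPUs, RAM in GB, and disk in GB (`g4.cores<N>.ram<N>.disk<N>`); `parse_flavor`, `quota_totals`, and `INITIAL_REQUEST` are hypothetical helper names, not part of any Cloud VPS tooling.

```python
import re

# Assumed naming convention: g4.cores<vCPUs>.ram<GB>.disk<GB>
FLAVOR_RE = re.compile(r"^g4\.cores(\d+)\.ram(\d+)\.disk(\d+)$")


def parse_flavor(name):
    """Extract cores, RAM (GB), and disk (GB) from a flavor name."""
    match = FLAVOR_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized flavor name: {name!r}")
    cores, ram_gb, disk_gb = (int(value) for value in match.groups())
    return {"cores": cores, "ram_gb": ram_gb, "disk_gb": disk_gb}


def quota_totals(requests):
    """Sum instances, cores, and RAM over (count, flavor) pairs."""
    totals = {"instances": 0, "cores": 0, "ram_gb": 0}
    for count, flavor in requests:
        spec = parse_flavor(flavor)
        totals["instances"] += count
        totals["cores"] += count * spec["cores"]
        totals["ram_gb"] += count * spec["ram_gb"]
    return totals


# The initial request from this task description.
INITIAL_REQUEST = [
    (1, "g4.cores2.ram4.disk20"),   # Kubernetes master
    (4, "g4.cores8.ram32.disk20"),  # Kubernetes nodes
    (1, "g4.cores8.ram32.disk20"),  # Puppetserver
    (1, "g4.cores4.ram8.disk20"),   # Bastion
]

if __name__ == "__main__":
    print(quota_totals(INITIAL_REQUEST))
    # {'instances': 7, 'cores': 46, 'ram_gb': 172}
```

The same helper can be reused for later quota requests by changing the counts in the list.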

A full build out will probably want something like 24 more Kubernetes nodes (28 total) and a second master. The instances we use as Jenkins exec nodes in the integration project use a custom flavor with 4x IOPS and a 90G ephemeral storage addition. We need to do some testing to figure out what the replacement needs are in a Kubernetes environment. Chances are pretty good we will be coming back to talk about at least the IOPS bump and probably something to replace that ephemeral storage.

How soon you are hoping this can be fulfilled: "as soon as possible"

NOTE: There is an existing zuul3 Cloud VPS project. That project is intended to be temporary and will likely be shut down within the next 2-3 months at most. Naming is hard, as we all know.

Event Timeline

fnegri changed the task status from Open to In Progress. Jun 11 2025, 9:21 AM
fnegri claimed this task.
fnegri moved this task from Inbox to Approved on the Cloud-VPS (Project-requests) board.

The term "runners" comes from GitLab terminology: https://docs.gitlab.com/runner/. Zuul refers to test resources as nodes, and I'd like to avoid confusion between the two systems.

I would like to change the requested project name from zuul-runners to zuul-nodes :)

I will wait for @bd808 to +1 or -1 the name change. :)

bd808 changed the task status from In Progress to Stalled. Jun 11 2025, 3:03 PM

We will resume the prior discussion in T396247: Set up new project for Zuulv3+ pre-merge and non-image-build workloads to decide what color to paint the bikeshed.

thcipriani changed the task status from Stalled to Open. Jun 11 2025, 5:25 PM
thcipriani subscribed.

From that task:

Talked about this today in the Release-Engineering-Team team meeting, we landed on zuul as the compromise. Acknowledging that there is a zuul3 project, but that project will go away beyond the proof-of-concept phase.

I note, @bd808 also attended that meeting.

I'll update the task description, too.

thcipriani renamed this task from Request creation of zuul-runners VPS project to Request creation of zuul VPS project. Jun 11 2025, 5:26 PM
thcipriani updated the task description.

Project created! Please double check that the permissions and quotas are as expected.

As a follow up, I have made a proof of concept for Zuul and object storage (in the zuul3 tenant rather than this new zuul project). When reading the documentation at https://wikitech.wikimedia.org/wiki/Help:Object_storage_user_guide I found:

Object storage will not work at all for projects with a - in their name due to limitations in the software. Feel free to open a project request for a new project with a simpler name, either to replace your project or to use alongside your current project for object storage.

Thus I am quite happy we went with simply zuul :]

Hah! That's kind of funny. Yea, good that it's just zuul then.

Legacy Cloud VPS projects with a - in the name have a problem using the S3 storage gateway, but new projects have a UUID as the project id which eliminates that problem.

The docs you found are out of date.

Hah, I just went to that page to change 'in the name' to 'in the internal project ID', but Taavi beat me to it. Newly created projects now use UUIDs for their internal ID, so the '-' thing is only a problem for vintage project names, not for new ones. It's kind of a mess!