
Large quota increase for zuul Cloud VPS project
Closed, Resolved (Public)

Description

Project Name: zuul
Type of quota increase requested: instances, cores, ram, volumes, and volume space
Amount to increase:

  • instances: 44 (52 total)
  • cores: 320 (356 total)
  • ram: 1180GB (1352GB total)
  • volumes: 42 (50 total)
  • volume space: 94GB (494GB total)

Reason: Follow-up to T396540 (Request creation of zuul VPS project) and T397098 (Increase volume storage quota for zuul project).

The basic GitOps automation for this new shared testing runtime environment is working. We are ready to scale up the cluster so we can perform load/performance testing. To this end, I have sketched out our initial full cluster plan, based loosely on the existing integration project (https://openstack-browser.toolforge.org/project/integration) that this zuul project will eventually replace.

My current estimate for our complete buildout (subject to load/performance testing validation):

  • 1 puppetserver [g4.cores2.ram4.disk20]
  • 1 bastion [g4.cores2.ram4.disk20]
  • 1 haproxy [g4.cores2.ram4.disk20]
  • 1 kubernetes cluster:
    • 3 master nodes (odd number for etcd replication) [g4.cores2.ram4.disk20]
    • 40 worker nodes [g4.cores8.ram32.disk20]

This buildout needs a quota of:

  • instances: 1+1+1+3+40 = 46
  • cores: 2+2+2+(3*2)+(40*8) = 332
  • ram: 4+4+4+(3*4)+(40*32) = 1304GB
  • volumes: 1 (puppet) + 3 (k8s masters) + 40 (k8s workers) = 44
  • volume space: 4 (puppet) + (3*10) (k8s masters) + (40*10) (k8s workers) = 434GB
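The sums above can be reproduced with a short script. This is only a sketch for checking the arithmetic: the `NodeGroup` type and `required_quota` helper are hypothetical names, the flavor names are assumed to encode vCPU count and RAM in GB (`coresN.ramN`), and the per-node volume sizes (4GB for puppet, 10GB for each Kubernetes node, none for bastion and haproxy) are read off the volume lines in the list above.

```python
import re
from dataclasses import dataclass

# Assumption: flavor names follow g4.cores<N>.ram<GB>.disk<GB>.
FLAVOR_RE = re.compile(r"^g4\.cores(\d+)\.ram(\d+)\.disk(\d+)$")


@dataclass(frozen=True)
class NodeGroup:
    """Hypothetical description of one row of the buildout plan."""
    name: str
    count: int
    flavor: str
    volume_gb: int  # attached volume size per node; 0 means no volume

    def cores_and_ram(self):
        match = FLAVOR_RE.match(self.flavor)
        if match is None:
            raise ValueError(f"unrecognized flavor name: {self.flavor}")
        return int(match.group(1)), int(match.group(2))


BUILDOUT = [
    NodeGroup("puppetserver", 1, "g4.cores2.ram4.disk20", 4),
    NodeGroup("bastion", 1, "g4.cores2.ram4.disk20", 0),
    NodeGroup("haproxy", 1, "g4.cores2.ram4.disk20", 0),
    NodeGroup("k8s-master", 3, "g4.cores2.ram4.disk20", 10),
    NodeGroup("k8s-worker", 40, "g4.cores8.ram32.disk20", 10),
]


def required_quota(groups):
    """Sum the quota each node group consumes."""
    totals = {"instances": 0, "cores": 0, "ram_gb": 0, "volumes": 0, "volume_gb": 0}
    for group in groups:
        cores, ram_gb = group.cores_and_ram()
        totals["instances"] += group.count
        totals["cores"] += group.count * cores
        totals["ram_gb"] += group.count * ram_gb
        if group.volume_gb > 0:
            totals["volumes"] += group.count
            totals["volume_gb"] += group.count * group.volume_gb
    return totals


print(required_quota(BUILDOUT))
```

Changing the worker count or flavor in `BUILDOUT` recomputes every line of the quota at once, which may be handy if load testing shifts the plan.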

I also want a bit of headroom for additional testing within the project once that full buildout is complete:

  • instances: 6
  • cores: 6*4 = 24
  • ram: 6*8 = 48GB
  • volumes: 6
  • volume space: 6*10 = 60GB

Combined this needs:

  • instances: 46 + 6 = 52
  • cores: 332 + 24 = 356
  • ram: 1304 + 48 = 1352GB
  • volumes: 44 + 6 = 50
  • volume space: 434 + 60 = 494GB

The project has a current quota of:

  • instances: 8
  • cores: 36
  • ram: 172GB
  • volumes: 8
  • volume space: 400GB

Subtracting the current quota from the desired combined quota gives the values listed earlier in this ticket.
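As a cross-check of that subtraction, here is a minimal sketch that adds the headroom to the buildout and subtracts the current quota. The dictionary keys and the `quota_increase` function are illustrative names, not part of any Cloud VPS tooling; the numbers are copied from the lists in this ticket.

```python
# Figures copied from the ticket; key names are illustrative only.
BUILDOUT = {"instances": 46, "cores": 332, "ram_gb": 1304, "volumes": 44, "volume_gb": 434}
HEADROOM = {"instances": 6, "cores": 6 * 4, "ram_gb": 6 * 8, "volumes": 6, "volume_gb": 6 * 10}
CURRENT = {"instances": 8, "cores": 36, "ram_gb": 172, "volumes": 8, "volume_gb": 400}


def quota_increase(buildout, headroom, current):
    """Return {resource: (increase, new_total)} for each quota resource."""
    result = {}
    for key in buildout:
        total = buildout[key] + headroom[key]
        result[key] = (total - current[key], total)
    return result


for key, (increase, total) in quota_increase(BUILDOUT, HEADROOM, CURRENT).items():
    print(f"{key}: {increase} ({total} total)")
```

The printed increases and totals match the "Amount to increase" list at the top of the description.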

Event Timeline

Is the plan to gradually unprovision resources from the integration project and spin up equivalent resources in zuul, or would both projects be fully provisioned for some overlap period?

Ceph looks to have plenty of space so this is a +1 from me.

> Is the plan to gradually unprovision resources from the integration project and spin up equivalent resources in zuul, or would both projects be fully provisioned for some overlap period?

I have not seen a full plan for switchover, but it would seem safest to plan for parallel operation. If nothing else, parallel operation would give us the best load test, provided we can do a non-voting parallel execution of all tests. If resource constraints make that stressful for Cloud VPS, we can certainly adjust plans to fit the available compute and storage.

Mentioned in SAL (#wikimedia-cloud-feed) [2025-07-29T16:12:34Z] <wmbot~dcaro@acme> START - Cookbook wmcs.openstack.quota_increase (T400305)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-07-29T16:12:43Z] <wmbot~dcaro@acme> END (FAIL) - Cookbook wmcs.openstack.quota_increase (exit_code=99) (T400305)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-07-29T16:12:57Z] <wmbot~dcaro@acme> START - Cookbook wmcs.openstack.quota_increase (T400305)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-07-29T16:13:04Z] <wmbot~dcaro@acme> END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T400305)

dcaro triaged this task as High priority.

I think I got all the quotas :)

[Attached image: image.png]

Enjoy!