Nodepool is currently limited to 12 instances. I would like to get it raised to 20 instances, for a few reasons:
* we have migrated almost all npm jobs
* we have added more Ruby jobs
* we are in the process of migrating the Zend 5.5 and HHVM jobs
* soonish, we will migrate the browser test jobs, which are long-running and occupy an instance for quite a while

In particular, raising the limit will let us migrate the Zend 5.5 / HHVM jobs that are currently running on Ubuntu Trusty. An example load is F4708299 ([[ https://integration.wikimedia.org/ci/label/UbuntuTrusty/load-statistics | live link ]]), which seems to indicate that 5 instances will cover it.
Adding a couple more will help with the contention we have observed during peak hours (SF morning / Europe evening) and brings us to a round number of 20 instances.
We have already deleted 9 m1.large instances from the pool of permanent slaves (T148183) and will be able to delete a couple more once the HHVM/PHP jobs are moved.
We spawn `m1.medium` instances, which have:
| RAM | 4GB
| VCPU | 2
| Disk | 40GB
The Nodepool limit (`max-server`) would be bumped from 12 to 20. On the OpenStack side, the instance quota also has to account for the automatic refresh of snapshot images, i.e. two more instances.
| | Current | New
|--|--|--
| Nodepool `max-server` | 12 | 20
| OpenStack quota | |
|--|--|--
| Instances | 15 | 22
| RAM | 100G | 100G
| VCPU | 40 | 44

https://grafana.wikimedia.org/dashboard/db/labs-capacity-planning shows there is plenty of spare capacity.
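The OpenStack figures in the table can be cross-checked against the `m1.medium` flavor. The sketch below assumes the quota is simply `max-server` plus two instances of headroom for the snapshot image refresh, each sized as an `m1.medium` (4GB RAM, 2 VCPU, 40GB disk). The `Flavor` class and the `required_quota` helper are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """Per-instance resources of an OpenStack flavor."""
    ram_gb: int
    vcpus: int
    disk_gb: int


# m1.medium as listed above: 4GB RAM, 2 VCPU, 40GB disk.
M1_MEDIUM = Flavor(ram_gb=4, vcpus=2, disk_gb=40)


def required_quota(max_servers: int, image_refresh_headroom: int, flavor: Flavor) -> dict:
    """Hypothetical helper: minimum quota so Nodepool can run `max_servers`
    instances while also booting instances to refresh snapshot images."""
    instances = max_servers + image_refresh_headroom
    return {
        "instances": instances,
        "vcpus": instances * flavor.vcpus,
        "ram_gb": instances * flavor.ram_gb,
        # Nominal allocation; actual usage may be much lower.
        "disk_gb": instances * flavor.disk_gb,
    }


quota = required_quota(max_servers=20, image_refresh_headroom=2, flavor=M1_MEDIUM)
print(quota)  # {'instances': 22, 'vcpus': 44, 'ram_gb': 88, 'disk_gb': 880}
```

Under these assumptions, 22 instances need 44 VCPU, which matches the requested VCPU quota. They also need 88G of RAM, which fits within the existing 100G.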
There might be concerns about disk space consumption. As I understand it, though, instance disks are copy-on-write and only consume space as the instance actually writes to its disk.
Looking at the instances:
**Trusty**
| Filesystem | Size | Used | Avail | Use% | Mounted on
|--|--|--|--|--|--
| /dev/vda1 | 38G | 2.4G | 34G | 7% | /
**Jessie**
| Filesystem | Size | Used | Avail | Use% | Mounted on
|--|--|--|--|--|--
| /dev/vda1 | 38G | 3.6G | 33G | 11% | /
So 80 instances would consume ~160 GB of disk, plus whatever the jobs write to disk.
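The disk estimate above can be sketched as a simple multiplication. It assumes copy-on-write disks that only grow with what each instance writes. It also assumes every instance uses about as much as the sampled Trusty/Jessie instances. The function name and the optional `job_output_gb` term are hypothetical.

```python
def estimate_disk_gb(instances: int, used_gb_per_instance: float,
                     job_output_gb: float = 0.0) -> float:
    """Hypothetical helper: rough backing-store consumption, assuming each
    instance only consumes what it has written (copy-on-write disks)."""
    return instances * (used_gb_per_instance + job_output_gb)


# ~2G per instance reproduces the ~160G estimate for 80 instances.
print(estimate_disk_gb(80, 2.0))  # 160.0
# Using the larger Jessie figure (3.6G) for 22 instances instead.
print(estimate_disk_gb(22, 3.6))
```

The `job_output_gb` parameter is where per-job writes would go. It defaults to zero because the text does not quantify them.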
The original quotas are described in 31f12ddd1386bcf236508c65d2e269ec7238456d.