
Convert pool from a few large slaves (4X) to more smaller slaves (1X)
Closed, Declined (Public)

Description

This is a proposal with the following goals:

  • Enable a reliable git cache on our integration slaves. (Blocks T93703 and T96627)
  • Increase test execution performance.
  • Increase stability by reducing ways that builds can interact with each other.

There are many race conditions and potential transient issues between active workspaces and a local git cache. By decreasing our executor slots from four concurrent workers to one, we can easily have our git cache updated by a scheduled Jenkins job that will naturally only occur between builds.
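As a rough illustration of what such a scheduled job could run on a single-executor slave, here is a minimal Python sketch. It assumes the cache is a directory of bare mirrors named `*.git`; that layout, the example path, and the helper names `fetch_commands` and `refresh_cache` are hypothetical and not taken from our actual setup. Because the slave has one executor, the job occupying it guarantees that no build is reading the cache while it is being fetched.

```python
import subprocess
from pathlib import Path


def fetch_commands(cache_root: Path) -> list[list[str]]:
    """Build one `git fetch` command per bare mirror found under cache_root.

    Assumption: every mirror is a directory whose name ends in `.git`.
    """
    commands = []
    for repo in sorted(cache_root.rglob("*.git")):
        if repo.is_dir():
            commands.append(
                ["git", f"--git-dir={repo}", "fetch", "--all", "--prune", "--quiet"]
            )
    return commands


def refresh_cache(cache_root: Path, run=subprocess.run) -> int:
    """Run every fetch command and return how many of them failed.

    `run` is injectable so the logic can be exercised without touching git.
    """
    failures = 0
    for cmd in fetch_commands(cache_root):
        result = run(cmd, check=False)
        if result.returncode != 0:
            failures += 1
    return failures


# Example call from the scheduled job (the path is a placeholder):
# refresh_cache(Path("/srv/git-cache"))
```

A non-zero return value could be used to mark the job as unstable, so that a stale mirror is noticed without blocking builds.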

In order to decrease our executor slots and keep our current build throughput, we'll need to increase the number of slaves quite a bit, something that will happen anyway when we start using Nodepool. We can also reduce our instance size, as we wouldn't need as much CPU/RAM/disk capacity.
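The arithmetic behind "quite a bit" is simple: the total number of executor slots has to stay the same. A small sketch, using the 10 large slaves with 4 executors each from this task's original title as the example input (the function name `slaves_needed` is made up for illustration):

```python
import math


def slaves_needed(current_slaves: int, executors_per_slave: int,
                  target_executors_per_slave: int = 1) -> int:
    """Number of slaves required to keep the same total executor slots."""
    total_slots = current_slaves * executors_per_slave
    # Round up so we never end up with fewer slots than we have today.
    return math.ceil(total_slots / target_executors_per_slave)


print(slaves_needed(10, 4))  # 40
```

This only preserves the slot count. Whether a 1X instance finishes a build as fast as one executor on a 4X instance is a separate question that would need measuring.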

Figuring out the right instance size and number of slaves is good preparation for our Nodepool setup.

In addition, this would make our environment more isolated, and it is what OpenStack was doing for a while (before they switched to Nodepool).

Event Timeline

Krinkle raised the priority of this task from to Medium.
Krinkle updated the task description.
Krinkle added subscribers: dduvall, Aklapper, greg and 2 others.
Krinkle renamed this task from Convert pool from 10 large slaves (4X) to 40 smaller slaves (1X) to Convert pool from a few large slaves (4X) to more smaller slaves (1X). Apr 20 2015, 11:23 PM
Krinkle updated the task description.
Krinkle set Security to None.

The Jenkins scheduler tends to run a job on the instance that last ran it. Our current instances have several executors, and we often end up with one instance running all the jobs while the others are idling.

With the switch to MySQL, and having up to 5 processes running MediaWiki tests, we have a very noticeable slowdown. Single-executor instances will mitigate that by offering more I/O and eliminating CPU contention between jobs. That is also what we aim for with the CI isolation.

Timo wrote:

Enable a reliable git cache on our integration slaves.

That is definitely needed. I thought we had a task for it but haven't been able to find one. Note zuul-cloner is hardcoded to do a full copy of the locally available git mirror.

As for the instance size: Wmflabs offers the m1.small flavor, which has 1 CPU and 2 GB of RAM, though MySQL and PHP would share the same core. Maybe 2 CPUs instead? That will probably have an impact on wmflabs provisioning and resource availability.

hashar claimed this task.

Will be done as part of migrating the jobs to Nodepool instances.