Convert pool from a few large slaves (4X) to more smaller slaves (1X)
Closed, Declined (Public)

Assigned To

Authored By

	Krinkle
	Apr 20 2015, 11:21 PM

Description

This is a proposal with the following goals:

Enable a reliable git cache on our integration slaves. (Blocks T93703 and T96627.)
Increase test execution performance.
Increase stability by reducing ways that builds can interact with each other.

There are many race conditions and potential transient issues between active workspaces and a local git cache. By decreasing our executor slots from four concurrent workers to one, we can easily have our git cache updated by a scheduled Jenkins job that will naturally only occur between builds.
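As a rough illustration of what such a scheduled cache refresh might look like, here is a minimal sketch. It assumes the cache is a directory tree of bare mirrors whose names end in `.git`; the function names and the cache layout are hypothetical and not taken from our actual setup. Because a single-executor slave never runs the refresh job alongside a build, no locking is attempted here.

```python
import subprocess
from pathlib import Path


def mirror_update_command(mirror_dir):
    """Build the git command that refreshes one mirror in place."""
    # `git remote update --prune` fetches from the configured remotes
    # and drops remote-tracking refs that no longer exist upstream.
    return ["git", "-C", str(mirror_dir), "remote", "update", "--prune"]


def find_mirrors(cache_root):
    """Return every directory named `*.git` under cache_root, sorted."""
    # Assumption: the cache only holds bare mirrors such as
    # <cache_root>/mediawiki/core.git (hypothetical layout).
    return sorted(p for p in Path(cache_root).rglob("*.git") if p.is_dir())


def refresh_cache(cache_root, run=subprocess.run):
    """Refresh each mirror; meant to run while no build uses the slave."""
    for mirror in find_mirrors(cache_root):
        # check=True makes a failed fetch fail the scheduled job loudly.
        run(mirror_update_command(mirror), check=True)
```

The `run` parameter exists only so the function can be exercised without touching the network; a Jenkins job would call `refresh_cache` with the real cache path.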

In order to decrease our executor slots and keep our current build throughput, we'll need to increase the number of slaves quite a bit. That is something that will happen anyway when we start using Nodepool. We can also reduce our instance size, as we wouldn't need as much CPU, RAM, or disk capacity per slave.
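The sizing arithmetic can be sketched as follows. This is a simplification that assumes throughput scales with the total number of executor slots, which ignores the contention effects discussed elsewhere in this task; the function name is hypothetical. The 10-to-40 figures come from this task's original title.

```python
import math


def slaves_needed(current_slaves, executors_per_slave, target_executors=1):
    """Number of smaller slaves that keeps the total executor slots constant."""
    total_slots = current_slaves * executors_per_slave
    # Round up so the new pool never has fewer slots than the old one.
    return math.ceil(total_slots / target_executors)


# 10 large slaves with 4 executors each -> 40 single-executor slaves.
print(slaves_needed(10, 4))
```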

Figuring out the right instance size and number of slaves is good preparation for our Nodepool setup.

In addition, this would make our environment more isolated, and it is what OpenStack had been doing for a while (before they switched to Nodepool).

Related Objects

Status	Subtype	Assigned	Task
Resolved		Krinkle	T91211 gallium and lanthanum disks full (tracking)
Declined		None	T91707 L10n-bot should not force-merge / override Jenkins (breaks the build)
Declined		hashar	T93703 reduce copies of mediawiki/core in workspaces
Resolved	PRODUCTION ERROR	Krinkle	T86730 Zuul-cloner failing to acquire .git lock sometimes
Declined		hashar	T96627 Jenkins jobs must wipe workspace
Declined		hashar	T96629 Convert pool from a few large slaves (4X) to more smaller slaves (1X)
Resolved		Andrew	T96706 Create an instance image like m1.small with 2 CPUs and 30GB space

Event Timeline

Krinkle created this task. Apr 20 2015, 11:21 PM

Krinkle raised the priority of this task from to Medium.

Krinkle updated the task description.

Krinkle added a project: Continuous-Integration-Infrastructure.

Krinkle added subscribers: dduvall, Aklapper, greg and 2 others.

Krinkle renamed this task from "Convert pool from 10 large slaves (4X) to 40 smaller slaves (1X)" to "Convert pool from a few large slaves (4X) to more smaller slaves (1X)". Apr 20 2015, 11:23 PM

Krinkle updated the task description.

Krinkle set Security to None.

Is this worth doing while we're doing Continuous-Integration-Scaling?

The Jenkins scheduler tends to run a job on the instance that last ran it. Our current instances have several executors, and we often end up with one instance running all the jobs while the others are idling.

With the switch to MySQL, and with up to 5 processes running MediaWiki tests, we have a very noticeable slowdown. Having single-executor instances will mitigate that by offering more I/O per job and eliminating CPU contention between jobs. That is also what we aim for with the CI isolation work.

Timo wrote:

Enable a reliable git cache on our integration slaves.

That is definitely needed. I thought we had a task for it but haven't been able to find one. Note that zuul-cloner is hardcoded to do a full copy of the locally available git mirror.

As for the instance size: Wmflabs offers the m1.small flavor, which has 1 CPU and 2GB RAM, though MySQL and PHP would then share the same core. Maybe 2 CPUs instead? That will probably have an impact on Wmflabs provisioning and resource availability.

Krinkle mentioned this in T96687: Set up git replication on integration slaves. Apr 21 2015, 2:40 PM

Krinkle moved this task from Untriaged to Backlog on the Continuous-Integration-Infrastructure board. Apr 21 2015, 2:56 PM

Andrew closed subtask T96706: Create an instance image like m1.small with 2 CPUs and 30GB space as Resolved. Apr 21 2015, 10:09 PM

Krinkle reopened subtask T96706: Create an instance image like m1.small with 2 CPUs and 30GB space as Open. Apr 23 2015, 4:55 AM

hashar closed subtask T96706: Create an instance image like m1.small with 2 CPUs and 30GB space as Resolved. Apr 30 2015, 8:24 PM