The CI isolation project aims to run tests on isolated machines. It reuses the wmflabs OpenStack system to spawn a pool of VMs, which are then consumed as Jenkins jobs are triggered.
The proposed architecture (overview on wiki) calls for two services in the labs subnet, each on its own hardware:
nodepool:
A pool manager will sit in the labs subnet and interact with the OpenStack API to create images and spawn instances. We might later move the Zuul scheduler from gallium.wikimedia.org to that machine as well.
The server will also be responsible for bootstrapping images, so it will see slight CPU/IO spikes while generating them and a network spike when pushing the resulting image to labs OpenStack.
Nodepool will need connections to production machines ([[ https://www.mediawiki.org/wiki/Continuous_integration/Architecture/Isolation#Security_matrix | security matrix ]]), namely ZeroMQ/HTTPS to gallium.wikimedia.org, MySQL to one of the db10xx servers, and statsd UDP packets.
zuul mergers:
A second machine will be in charge of preparing the code to be tested. It takes the patches proposed in Gerrit and merges them on the tip of the branch. The jobs then retrieve the result via git-daemon.
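To illustrate the merge step described above, here is a minimal sketch of the git commands involved. It is not how zuul-merger is implemented internally; the function names, the remote name and the change/patchset numbers are hypothetical. The only external fact relied on is Gerrit's `refs/changes/NN/CHANGE/PATCHSET` ref layout.

```python
import subprocess
from typing import List


def gerrit_change_ref(change: int, patchset: int) -> str:
    """Return the Gerrit ref for a patchset, e.g. refs/changes/34/1234/5."""
    # Gerrit shards change refs by the last two digits of the change number.
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


def merge_commands(remote: str, branch: str, change: int, patchset: int) -> List[List[str]]:
    """Build the git commands that merge one proposed patch onto the branch tip."""
    return [
        # Refresh all remotes; this is the step reported to take several seconds.
        ["git", "remote", "update"],
        # Start from the current tip of the target branch, without moving any local branch.
        ["git", "checkout", "--detach", f"{remote}/{branch}"],
        # Fetch the proposed patchset from Gerrit into FETCH_HEAD.
        ["git", "fetch", remote, gerrit_change_ref(change, patchset)],
        # Merge the patchset on top of the branch tip.
        ["git", "merge", "--no-edit", "FETCH_HEAD"],
    ]


def run_merge(repo_dir: str, remote: str, branch: str, change: int, patchset: int) -> None:
    """Run the merge in repo_dir, raising CalledProcessError if any step fails."""
    for cmd in merge_commands(remote, branch, change, patchset):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example, `merge_commands("origin", "master", 1234, 5)` yields a fetch of `refs/changes/34/1234/5` followed by a merge of `FETCH_HEAD`. The resulting commit is what a job would then fetch from the merger.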
Zuul merger incurs a noticeable network delay when updating the repositories (git remote update takes several seconds), so we will run two zuul-merger instances in parallel. Since git is heavily file based, each instance will work on its own SSD. RAID is not needed: in case of hardware failure the data will be repopulated from Gerrit, and we can run with a single instance.
The zuul-mergers will each establish a Gearman connection to gallium.wikimedia.org ([[ https://www.mediawiki.org/wiki/Continuous_integration/Architecture/Isolation#Security_matrix | security matrix ]]) and send statsd UDP packets.
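The connections requested for both services can be summarized as data, which may help when drafting firewall rules. This is only a sketch: the port numbers are the usual defaults for each protocol (443, 3306, 4730, 8125) and are assumptions, not values taken from the security matrix. The ZeroMQ port and the statsd host are not given here, so they are left unspecified, and `db10xx` is kept as written.

```python
from typing import List, NamedTuple, Optional


class Flow(NamedTuple):
    source: str
    destination: str
    protocol: str
    transport: str
    port: Optional[int]  # None when the request does not let us infer a port


# Ports are common defaults (assumptions), not confirmed by the security matrix.
FLOWS = [
    Flow("nodepool", "gallium.wikimedia.org", "ZeroMQ", "tcp", None),
    Flow("nodepool", "gallium.wikimedia.org", "HTTPS", "tcp", 443),
    Flow("nodepool", "db10xx", "MySQL", "tcp", 3306),
    Flow("nodepool", "statsd host (unspecified)", "statsd", "udp", 8125),
    Flow("zuul-merger", "gallium.wikimedia.org", "Gearman", "tcp", 4730),
    Flow("zuul-merger", "statsd host (unspecified)", "statsd", "udp", 8125),
]


def flows_from(source: str) -> List[Flow]:
    """Return every outbound flow requested for the given service."""
    return [flow for flow in FLOWS if flow.source == source]


def describe(flow: Flow) -> str:
    """Render one flow as a single human-readable line."""
    port = "unspecified" if flow.port is None else str(flow.port)
    return f"{flow.source} -> {flow.destination}: {flow.protocol} ({flow.transport}/{port})"


if __name__ == "__main__":
    for flow in FLOWS:
        print(describe(flow))
```

Running the module prints one line per requested flow, which can be compared against the security matrix before any rule is opened.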
Labs Project Tested: Nodepool has not been tested yet; it needs access to an OpenStack API. zuul-merger has been running in production on gallium.wikimedia.org for a while.
Site/Location: eqiad, in the labs subnet
Number of systems: 2
Service: Continuous Integration
Internal/External IP Address: internal IP in labs subnet
VLAN: _____
Nodepool
Upstream has an 8GB instance monitored via Cacti: overview, CPU usage, memory usage
Processor Requirements: 4 cores
Memory: 4GB
Disks: a few GB; not much is needed.
NIC(s): 1
Partitioning Scheme: LVM. A partition for /var
Other Requirements:
Zuul merger
It merely performs git merges, which are potentially disk I/O intensive and suffer from disk and network latency.
Processor Requirements: 2 cores; git operations are not very CPU intensive
Memory: 2GB
Disks: two 32+GB SSDs, one per zuul-merger instance. Actual consumption is just 7GB.
NIC(s): 1
Partitioning Scheme: LVM. Each SSD as one big partition, mounted under /srv/ssd1 and /srv/ssd2 respectively. No RAID needed for the SSDs.
Other Requirements: