Maniphest T272092

Request volume for Docker images and container filesystems on releases machines
Closed, ResolvedPublic
Actions

Description

In order to facilitate the building and running of MediaWiki images and containers on releases1002 and releases2002, we'll want an additional volume dedicated to Docker's data store (/var/lib/docker by default though this can be changed with profile::docker::settings in puppet/hiera). Otherwise, we risk filling or needing to resize / down the road.

Related Objects
Search...

Status	Assigned	Task
Open	None	T198901 Migrate production services to kubernetes using the pipeline
Open	None	T238770 Deploy MediaWiki to Wikimedia production in containers
Open	None	T238771 Get production MW-land images built and published
Resolved	• dduvall	T271472 Determine whether PipelineLib can run on releases-jenkins.wikimedia.org
Resolved	• dduvall	T271477 Define a PipelineLib based MW image build job on releases-jenkins.wikimedia.org
Resolved	• dduvall	T272092 Request volume for Docker images and container filesystems on releases machines

Event Timeline

• dduvall created this task.Jan 14 2021, 9:28 PM

Restricted Application removed a project: Patch-For-Review. · View Herald TranscriptJan 14 2021, 9:28 PM

• dduvall triaged this task as Medium priority.Jan 14 2021, 9:29 PM

How many Gigabytes do you need?

In T272092#6749476, @Dzahn wrote:

How many Gigabytes do you need?

Hmm... let's see.

This is going to be primarily for building MW images, and we're looking at (worst case scenario where there is no efficient layer caching) around 1.2G per MW version with two versions for each deployable image (plus a little for config). That's 2.4G per image which could potentially be generated per commit on the wmf/* branches. Let's make it 2.5G per image w/ config.

Here are the number of branch merges for past wmf/1.36.* branches:

$ gerrit query 'branch:^wmf/1\.36.* is:merged' --format=json | jq -s --raw-output 'group_by(.branch)[] | "\(.[0].branch)\t\(length)"'
wmf/1.36.0-wmf.1	13
wmf/1.36.0-wmf.10	22
wmf/1.36.0-wmf.11	13
wmf/1.36.0-wmf.12	3
wmf/1.36.0-wmf.13	18
wmf/1.36.0-wmf.14	18
wmf/1.36.0-wmf.16	27
wmf/1.36.0-wmf.18	12
wmf/1.36.0-wmf.2	11
wmf/1.36.0-wmf.20	12
wmf/1.36.0-wmf.21	14
wmf/1.36.0-wmf.22	15
wmf/1.36.0-wmf.25	8
wmf/1.36.0-wmf.26	6
wmf/1.36.0-wmf.3	13
wmf/1.36.0-wmf.4	8
wmf/1.36.0-wmf.5	8
wmf/1.36.0-wmf.6	6
wmf/1.36.0-wmf.8	11
wmf/1.36.0-wmf.9	17

The 90th percentile—which might be a good figure for capacity estimation—is 18, and we'll want to keep image/layer caches around for at least two weeks to match the train cadence and possibility for quickly building images for rollback (so previous week/branch).

So 2.5G * (18 * 2) = 90G + [nebulous amount of space needed for running container filesystems]... maybe 150G is sufficient? This is a bit hand wavy. Sorry. :)

In T272092#6749690, @dduvall wrote:

In T272092#6749476, @Dzahn wrote:

How many Gigabytes do you need?

So 2.5G * (18 * 2) = 90G + [nebulous amount of space needed for running container filesystems]... maybe 150G is sufficient? This is a bit hand wavy. Sorry. :)

@Dzahn, is that an accurate enough figure?

+1 on the estimate.

Yes, it is. Thank you, I just didn't get to it. Let me take the ticket so I don't forget. I'll try to add it tomorrow and assign it back.

Mentioned in SAL (#wikimedia-operations) [2021-01-20T18:22:10Z] <mutante> ganeti - creating 105G virtual harddisk and adding to releases1002 for T272092

Mentioned in SAL (#wikimedia-operations) [2021-01-20T18:24:34Z] <mutante> ganeti - creating 150G virtual hard disk and adding it to releases2002 for T272092

New disks have been created as above. Now we need to restart the VMs and mount them (manually, unless it's worth puppetizing because this is becoming the new default setup and there will be new releases* machines in the future).

In T272092#6763457, @Dzahn wrote:

New disks have been created as above. Now we need to restart the VMs and mount them (manually, unless it's worth puppetizing because this is becoming the new default setup and there will be new releases* machines in the future).

Puppetizing sounds good. Does it need an LVM group or is that redundant in the case of ganeti?

unfortunately T272555 happened and currently releases2002 is down

akosiaris mentioned this in T272555: releases2002 ganeti VM not getting IP after reboot.Jan 22 2021, 11:59 AM

Thanks for working on this, @Dzahn and @akosiaris! Should I add some puppet to ensure /dev/vdb1 is mounted at /var/lib/docker?

RhinosF1 subscribed.Jan 22 2021, 6:04 PM

Thank to Alex for fixing the subtask!

I rebooted releases1002 as well and .. it had the exact same issue but now I knew the fix and applied it (ens5->ens6 in /etc/network/interfaces).

Then created new partitions with fdisk, created ext4 file system, mounted on /srv/docker.

Finally edited /etc/fstab to make sure it survives reboots and confirmed by rebooting releases2002 one more time. Both networking issue gone and disk got auto-mounted.

Calling it resolved. No, you don't need to worry about puppet, /etc/fstab isn't managed by it normally and this was a one-time action and next time we replace these VMs we will just make bigger disks when creating them.

If this works for you we can call it resolved.

/dev/vdb1       147G   61M  140G   1% /srv/docker

same on both machines

Awesome. Thanks again!

Dzahn mentioned this in T208529: Install docker on releases-jenkins.Jan 22 2021, 6:56 PM

Request volume for Docker images and container filesystems on releases machinesClosed, ResolvedPublicActions

Description

Related ObjectsSearch...

Event Timeline

Request volume for Docker images and container filesystems on releases machines
Closed, ResolvedPublic
Actions

Related Objects
Search...