
Request volume for Docker images and container filesystems on releases machines
Closed, ResolvedPublic

Description

In order to facilitate the building and running of MediaWiki images and containers on releases1002 and releases2002, we'll want an additional volume dedicated to Docker's data store (/var/lib/docker by default, though this can be changed via profile::docker::settings in Puppet/Hiera). Otherwise, we risk filling the root filesystem (/) or needing to resize it down the road.
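
For reference, a quick way to check where the Docker daemon is actually keeping its data on a given host (a hypothetical check, assuming Docker is already installed there):

$ docker info --format '{{ .DockerRootDir }}'
/var/lib/docker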

Event Timeline

How many Gigabytes do you need?

Hmm... let's see.

This is going to be primarily for building MW images. In the worst-case scenario, where there is no efficient layer caching, we're looking at around 1.2G per MW version, with two versions for each deployable image (plus a little for config). That's 2.4G per image, which could potentially be generated per commit on the wmf/* branches. Let's call it 2.5G per image with config.

Here are the merge counts for past wmf/1.36.* branches:

$ gerrit query 'branch:^wmf/1\.36.* is:merged' --format=json | jq -s --raw-output 'group_by(.branch)[] | "\(.[0].branch)\t\(length)"'
wmf/1.36.0-wmf.1	13
wmf/1.36.0-wmf.10	22
wmf/1.36.0-wmf.11	13
wmf/1.36.0-wmf.12	3
wmf/1.36.0-wmf.13	18
wmf/1.36.0-wmf.14	18
wmf/1.36.0-wmf.16	27
wmf/1.36.0-wmf.18	12
wmf/1.36.0-wmf.2	11
wmf/1.36.0-wmf.20	12
wmf/1.36.0-wmf.21	14
wmf/1.36.0-wmf.22	15
wmf/1.36.0-wmf.25	8
wmf/1.36.0-wmf.26	6
wmf/1.36.0-wmf.3	13
wmf/1.36.0-wmf.4	8
wmf/1.36.0-wmf.5	8
wmf/1.36.0-wmf.6	6
wmf/1.36.0-wmf.8	11
wmf/1.36.0-wmf.9	17

The 90th percentile, which might be a good figure for capacity estimation, is 18. We'll want to keep image/layer caches around for at least two weeks to match the train cadence and to allow quickly rebuilding images for rollback (i.e. the previous week's branch).

So 2.5G * (18 * 2) = 90G + [nebulous amount of space needed for running container filesystems]... maybe 150G is sufficient? This is a bit hand wavy. Sorry. :)
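
For what it's worth, the nearest-rank 90th percentile can be pulled straight from the same query; a rough one-liner, assuming the same gerrit alias and a jq recent enough to have ceil:

$ gerrit query 'branch:^wmf/1\.36.* is:merged' --format=json \
    | jq -s '[group_by(.branch)[] | length] | sort | .[(length * 0.9 | ceil) - 1]'
18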

@Dzahn, is that an accurate enough figure?

Yes, it is. Thank you, I just didn't get to it. Let me take the ticket so I don't forget. I'll try to add it tomorrow and assign it back.

Mentioned in SAL (#wikimedia-operations) [2021-01-20T18:22:10Z] <mutante> ganeti - creating 105G virtual harddisk and adding to releases1002 for T272092

Mentioned in SAL (#wikimedia-operations) [2021-01-20T18:24:34Z] <mutante> ganeti - creating 150G virtual hard disk and adding it to releases2002 for T272092

New disks have been created as above. Now we need to restart the VMs and mount them (manually, unless it's worth puppetizing, given that this is becoming the new default setup and there will be new releases* machines in the future).
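
On the Ganeti side that is roughly the following (a sketch rather than the exact commands behind the SAL entries above; the instance name and the need to run this on the cluster master are assumptions):

$ sudo gnt-instance modify --disk add:size=150g releases2002.codfw.wmnet   # add the new virtual disk
$ sudo gnt-instance reboot releases2002.codfw.wmnet                        # pick it up on the next boot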

Puppetizing sounds good. Does it need an LVM group, or is that redundant in the case of Ganeti?

Unfortunately, T272555 happened and releases2002 is currently down.

Thanks for working on this, @Dzahn and @akosiaris! Should I add some puppet to ensure /dev/vdb1 is mounted at /var/lib/docker?

Thanks to Alex for fixing the subtask!

I rebooted releases1002 as well and... it had the exact same issue, but this time I knew the fix and applied it (ens5 -> ens6 in /etc/network/interfaces).

Then I created new partitions with fdisk, created an ext4 filesystem, and mounted it on /srv/docker.

Finally, I edited /etc/fstab to make sure the mount survives reboots and confirmed by rebooting releases2002 one more time. The networking issue is gone and the disk got auto-mounted.
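
For anyone repeating this on a future releases* VM, the manual steps boil down to roughly the following (a sketch, assuming the new disk shows up as /dev/vdb as it did here; the real /etc/fstab entry may use a UUID instead of the device path):

$ sudo fdisk /dev/vdb        # create a single primary partition, /dev/vdb1
$ sudo mkfs.ext4 /dev/vdb1
$ sudo mkdir -p /srv/docker
$ sudo mount /dev/vdb1 /srv/docker
$ echo '/dev/vdb1 /srv/docker ext4 defaults 0 2' | sudo tee -a /etc/fstab
$ df -h /srv/docker          # confirm, then reboot once to verify it comes back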

Calling it resolved. No, you don't need to worry about Puppet: /etc/fstab isn't normally managed by it, this was a one-time action, and next time we replace these VMs we'll just create them with bigger disks from the start.

If this works for you we can call it resolved.

/dev/vdb1       147G   61M  140G   1% /srv/docker

same on both machines

Awesome. Thanks again!