In order to facilitate the building and running of MediaWiki images and containers on releases1002 and releases2002, we'll want an additional volume dedicated to Docker's data store (/var/lib/docker by default though this can be changed with profile::docker::settings in puppet/hiera). Otherwise, we risk filling or needing to resize / down the road.
Description
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Open | None | T198901 Migrate production services to kubernetes using the pipeline | |||
Open | None | T238770 Deploy MediaWiki to Wikimedia production in containers | |||
Open | None | T238771 Get production MW-land images built and published | |||
Resolved | • dduvall | T271472 Determine whether PipelineLib can run on releases-jenkins.wikimedia.org | |||
Resolved | • dduvall | T271477 Define a PipelineLib based MW image build job on releases-jenkins.wikimedia.org | |||
Resolved | • dduvall | T272092 Request volume for Docker images and container filesystems on releases machines |
Event Timeline
Hmm... let's see.
This is going to be primarily for building MW images, and we're looking at (worst case scenario where there is no efficient layer caching) around 1.2G per MW version with two versions for each deployable image (plus a little for config). That's 2.4G per image which could potentially be generated per commit on the wmf/* branches. Let's make it 2.5G per image w/ config.
Here are the number of branch merges for past wmf/1.36.* branches:
$ gerrit query 'branch:^wmf/1\.36.* is:merged' --format=json | jq -s --raw-output 'group_by(.branch)[] | "\(.[0].branch)\t\(length)"' wmf/1.36.0-wmf.1 13 wmf/1.36.0-wmf.10 22 wmf/1.36.0-wmf.11 13 wmf/1.36.0-wmf.12 3 wmf/1.36.0-wmf.13 18 wmf/1.36.0-wmf.14 18 wmf/1.36.0-wmf.16 27 wmf/1.36.0-wmf.18 12 wmf/1.36.0-wmf.2 11 wmf/1.36.0-wmf.20 12 wmf/1.36.0-wmf.21 14 wmf/1.36.0-wmf.22 15 wmf/1.36.0-wmf.25 8 wmf/1.36.0-wmf.26 6 wmf/1.36.0-wmf.3 13 wmf/1.36.0-wmf.4 8 wmf/1.36.0-wmf.5 8 wmf/1.36.0-wmf.6 6 wmf/1.36.0-wmf.8 11 wmf/1.36.0-wmf.9 17
The 90th percentile—which might be a good figure for capacity estimation—is 18, and we'll want to keep image/layer caches around for at least two weeks to match the train cadence and possibility for quickly building images for rollback (so previous week/branch).
So 2.5G * (18 * 2) = 90G + [nebulous amount of space needed for running container filesystems]... maybe 150G is sufficient? This is a bit hand wavy. Sorry. :)
Yes, it is. Thank you, I just didn't get to it. Let me take the ticket so I don't forget. I'll try to add it tomorrow and assign it back.
Mentioned in SAL (#wikimedia-operations) [2021-01-20T18:22:10Z] <mutante> ganeti - creating 105G virtual harddisk and adding to releases1002 for T272092
Mentioned in SAL (#wikimedia-operations) [2021-01-20T18:24:34Z] <mutante> ganeti - creating 150G virtual hard disk and adding it to releases2002 for T272092
New disks have been created as above. Now we need to restart the VMs and mount them (manually, unless it's worth puppetizing because this is becoming the new default setup and there will be new releases* machines in the future).
Puppetizing sounds good. Does it need an LVM group or is that redundant in the case of ganeti?
Thanks for working on this, @Dzahn and @akosiaris! Should I add some puppet to ensure /dev/vdb1 is mounted at /var/lib/docker?
Thank to Alex for fixing the subtask!
I rebooted releases1002 as well and .. it had the exact same issue but now I knew the fix and applied it (ens5->ens6 in /etc/network/interfaces).
Then created new partitions with fdisk, created ext4 file system, mounted on /srv/docker.
Finally edited /etc/fstab to make sure it survives reboots and confirmed by rebooting releases2002 one more time. Both networking issue gone and disk got auto-mounted.
Calling it resolved. No, you don't need to worry about puppet, /etc/fstab isn't managed by it normally and this was a one-time action and next time we replace these VMs we will just make bigger disks when creating them.
If this works for you we can call it resolved.
/dev/vdb1 147G 61M 140G 1% /srv/docker
same on both machines