contint1001 store docker images on separate partition or disk
Closed, Resolved · Public

Description

Currently contint1001 stores all Docker images on the root partition (which is only 50GB). It seems like this could cause trouble if we build one too many Docker images and fill the whole disk, which would be easy to do in the current setup.

I'm not sure how difficult it would be to add storage to this machine or to reallocate the existing storage; I'm filing this task to find out.

On T178663#3699074 @hashar wrote:

Looks like profile::docker::storage has the logic to set up a partition with these parameters:

# list of physical volumes to use.
$physical_volumes = hiera('profile::docker::storage::physical_volumes'),
# Volume group to substitute.
$vg_to_remove = hiera('profile::docker::storage::vg_to_remove'),

It seems to create a new volume group docker with logical volumes data and metadata.

profile::docker::storage::physical_volumes would be the physical volume.
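If we went that route, the hiera overrides might look something like the sketch below; the device name and the vg_to_remove value are assumptions for illustration only, not verified values for contint1001:

cat <<'EOF' >> hieradata/hosts/contint1001.yaml
# Illustrative values only -- not verified for contint1001.
profile::docker::storage::physical_volumes:
  - '/dev/md3'
profile::docker::storage::vg_to_remove: ''
EOF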

contint1001 has a 1TB disk and all the physical volume / volume group is allocated:

# pvdisplay
  --- Physical volume ---
  PV Name               /dev/md2
  VG Name               contint1001-vg
  PV Size               883.89 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
# vgdisplay 
  --- Volume group ---
  VG Name               contint1001-vg
  Format                lvm2
  VG Status             resizable
  Cur LV                1
  VG Size               883.89 GiB
root@contint1001:~# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/contint1001-vg/data
  LV Name                data
  VG Name                contint1001-vg
  LV Size                883.89 GiB

Seems to me we would have to shrink the logical volume /dev/contint1001-vg/data and the volume group contint1001-vg.


Usage as of January 25th 2019

$ df -h / /srv
Filesystem                        Size  Used Avail Use% Mounted on
/dev/md0                           46G   39G  5.0G  89% /
/dev/mapper/contint1001--vg-data  870G  544G  283G  66% /srv

/ has Docker images (via /var/lib/docker) which is the concern: docker can fill the root partition.

/srv has Jenkins build results (large), zuul-merger repositories (~25GB), and the integration.wikimedia.org docroot (small).

We would want to shrink the volume group and create a new one for Docker images which would be at /srv/docker.
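For illustration, a rough sketch of that shrink-and-split approach (sizes are made up, nothing here has been run, and ext4 has to be shrunk before the LV, with /srv unmounted, i.e. CI stopped):

umount /srv
e2fsck -f /dev/contint1001-vg/data
resize2fs /dev/contint1001-vg/data 600G      # shrink the filesystem first
lvreduce -L 600G /dev/contint1001-vg/data    # then the logical volume
mount /srv
lvcreate -L 250G -n docker contint1001-vg    # carve a new LV out of the freed space
mkfs.ext4 /dev/contint1001-vg/docker
mkdir -p /srv/docker
mount /dev/contint1001-vg/docker /srv/docker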

Event Timeline

contint1001 was purchased on T130738, and has dual 1TB SATA disks.

We generally don't store anything in the / root partition, but toss all our things in /srv, which is a larger partition.

robh@contint1001:~$ df -h
Filesystem                        Size  Used Avail Use% Mounted on
udev                               10M     0   10M   0% /dev
tmpfs                              13G  1.4G   12G  11% /run
/dev/md0                           46G   33G   11G  76% /
tmpfs                              32G     0   32G   0% /dev/shm
tmpfs                             5.0M     0  5.0M   0% /run/lock
tmpfs                              32G     0   32G   0% /sys/fs/cgroup

@thcipriani: I'd recommend changing your Docker storage configuration to store images in the /srv directory (the larger partition).
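For reference, a minimal sketch of what that could look like if done by hand; on contint1001 the Docker daemon configuration is managed by puppet, so the file and path below are illustrative rather than the actual change:

# Illustrative only -- the real change would go through operations/puppet.
# "data-root" is the upstream daemon option for relocating Docker's data directory.
mkdir -p /srv/docker
cat <<'EOF' > /etc/docker/daemon.json
{
  "data-root": "/srv/docker"
}
EOF
systemctl restart docker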

This happened today. Was about to make a ticket for it and found this.

17:38 < icinga-wm> PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: / 2510 MB (5% inode=62%)
17:39 < mutante> !log contint1001 - apt-get clean - disk space low

17:42 < Krinkle> mutante: thanks, I guess my docker-pgk rebuilding is contributing somehow
17:42 < Krinkle> it's rebuilding a lot of them, due to a change in the ci-stretch base image.
17:42 < Krinkle> actually, no it isn't. Nevermind.

17:52 < mutante> Krinkle: /var/lib/docker/overlay2 is like 31G of 46 total

17:53 < Krinkle> mutante: The change was pretty small, so I guess it's just slow build up
17:53 * Krinkle tries to find graphs
17:54 < Krinkle> we should probably have a strategy for cleaning up older versions of CI images that aren't used.
17:54 < Krinkle> if we don't already have something for that - assuming that's where they are stored, I don't know.
17:55 < Krinkle> https://grafana.wikimedia.org/d/000000377/host-overview?panelId=12&fullscreen&orgId=1&var-server=contint1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=ci&from=1546904126553&to=1547852116992

17:57 < mutante> !log contint1001 - moved zuul logs from 2018 and gzipped zuul logs from /var/log/zuul to /srv/logs/zuul to free disk space on /

18:00 < icinga-wm> RECOVERY - Disk space on contint1001 is OK: DISK OK

18:00 < mutante> !log contint1001 - gzipping more files in /var/log/zuul/

18:00 < mutante> Krinkle: yep, slow build up. happened before i think
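For what it's worth, a minimal version of the cleanup Krinkle mentions above would be the standard Docker prune commands (nothing CI-specific is assumed here):

docker image prune -f                              # drop dangling (untagged) image layers
# docker image prune -af --filter "until=720h"     # more aggressive: also drop unused images created more than ~30 days ago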

updated to integrate my comments from T178663#3699074

Could /srv be shrunk a bit and a new partition created for Docker images at /srv/docker?

Quoting Tyler from the other ticket though "We don't want to resize /srv/ as that's already in use for zuul-merger (and is currently using > half the available disk space). Ideally we'd be able to add a disk to this machine just to store images."

Let's ask DC ops instead and request that a new disk be added?

@RobH (since you chimed in earlier) is it possible to add an additional disk to contint1001? Ideally, I'd like to avoid using /srv since zuul-merger is already there and using 65% of the storage, and we've only just started the pipeline project (building docker images on contint1001) -- probably more disk space usage in future for docker. Looks like the Dell PowerEdge R430 has 4 drive bays(?).

Since this keeps recurring, can we check whether we can add a couple of disks to the machine? I guess 256G would be sufficient.

An alternative is to shrink the existing volume group for /srv. It is reasonably busy, but we can look at optimizing the current disk usage (keep fewer artifacts, compress logs, etc.).
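A couple of read-only checks that show where the space actually goes (the paths are the ones already discussed on this task):

docker system df      # Docker's own breakdown: images, containers, volumes, build cache
du -xsh /var/lib/docker/overlay2 /var/log/zuul /srv/* 2>/dev/null | sort -h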

just assigning for the question in the 2 comments above

Icinga alerting again:

contint1001 - Disk space
CRITICAL 2019-04-24 17:29:39 0d 1h 10m 17s 3/3 DISK CRITICAL - free space: / 2644 MB (5% inode=65%):

[#wikimedia-oper] !log contint1001 - apt-get clean for 1% more disk space

Dzahn raised the priority of this task from Medium to High. Apr 24 2019, 5:38 PM

Mentioned in SAL (#wikimedia-operations) [2019-04-24T17:52:38Z] <mutante> contint1001 - for logfile in $(find /var/log/zuul/ ! -name "*.gz"); do gzip $logfile; done to get more disk space (T207707)

Dzahn lowered the priority of this task from High to Medium. Apr 24 2019, 5:57 PM

Gzipping all files in /var/log/zuul that were not already gzipped saved almost 10G. Usage of / is back to 79% from 95%.

I have since unzipped them: log rotation is handled by Python logging rather than by logrotate, so when we gzip the files ourselves, logging never deletes the old .gz files :-/

Mentioned in SAL (#wikimedia-operations) [2019-05-14T17:32:46Z] <mutante> contint1001 - mkdir /srv/zuul-logs ; mv /var/log/zuul/debug.log* /srv/zuul-logs/ to prevent CI running out of disk again (T207707)

As before, /var/log/zuul is many gigabytes and a large percentage of /, and debug logging is enabled.

Linked previous duplicate tickets. Raising priority.

Pinging @RobH for T207707#4937008

@hashar Because of T207707#5159292, this time I made /srv/zuul-logs and moved the debug logs there (see above). Also, do we really need (that many) debug logs on a constant basis?

Dzahn raised the priority of this task from Medium to High. May 14 2019, 5:36 PM

Let's ask DC ops instead and request that a new disk be added?

@RobH (since you chimed in earlier) is it possible to add an additional disk to contint1001? Ideally, I'd like to avoid using /srv since zuul-merger is already there and using 65% of the storage, and we've only just started the pipeline project (building docker images on contint1001) -- probably more disk space usage in future for docker. Looks like the Dell PowerEdge R430 has 4 drive bays(?).

Cost of this has to be discussed, and it cannot be done on this public task.

I'll create a private subtask for the pricing discussion for adding disks to this system.

RobH mentioned this in Unknown Object (Task). May 14 2019, 6:24 PM

Mentioned in SAL (#wikimedia-operations) [2019-05-20T10:25:24Z] <hashar> contint1001: docker image prune -f | Total reclaimed space: 7.115GB | T207707

greg changed the task status from Open to Stalled. May 27 2019, 5:18 PM
greg subscribed.

stalled until the disks are installed

Cmjohnson closed subtask Unknown Object (Task) as Resolved. May 30 2019, 2:36 PM
Cmjohnson subscribed.

@greg @RobH I am just plugging these disks into the server correct? nothing else? this will not require downtime afaik.

Mentioned in SAL (#wikimedia-operations) [2019-05-30T22:53:41Z] <marxarelli> deleting stale docker images from contint1001, cc: T207707 T219850

@greg @RobH I am just plugging these disks into the server correct? nothing else? this will not require downtime afaik.

right, the repartitioning will be done separately afaik.

@greg the disks have been added and assigned to you

greg changed the task status from Stalled to Open. May 31 2019, 6:15 PM
greg removed greg as the assignee of this task.
greg removed a project: User-greg.

The new disks show up as sdc and sdd.

Currently I think we have 3 RAID 1 arrays, with LVM on the largest one (md2):

Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors

Disk /dev/md0: 46.5 GiB, 49965694976 bytes, 97589248 sectors
Disk /dev/md1: 953.4 MiB, 999751680 bytes, 1952640 sectors
Disk /dev/md2: 883.9 GiB, 949069283328 bytes, 1853650944 sectors

Disk /dev/mapper/contint1001--vg-data: 883.9 GiB, 949066137600 bytes, 1853644800 sectors

Disk /dev/sdc: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk /dev/sdd: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors

I don't know what the ops best practices for disks are. I guess we will need SRE to create a new RAID 1 array over the two disks and a new LVM volume group, and then we can do some partitioning.

For Docker I guess we can start with a 500GB partition on the new disks? Then mount that to /srv/docker and change its config to point there.

Sounds like we need serviceops help here.

/me adds them to task.

From a discussion with @mmodell, we might need an extra partition soonish as well. So maybe for the Docker images we can go with:

  • RAID1 over the new disks /dev/sdc and /dev/sdd
  • a LVM volume group
  • a 250 G logical volume for the Docker images

Then format the partition with whatever filesystem best fits Docker, if any.

To repopulate the images, we can probably just pull them from the docker-registry, though docker-pkg is supposedly doing that automatically on build now.
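Concretely, repopulating would just be a docker pull per image from the registry; the image name and tag below are an illustrative example, not a list of what would actually be needed:

# Illustrative: re-pull a CI base image from the WMF registry after wiping the local store.
docker pull docker-registry.wikimedia.org/releng/ci-stretch:latest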

I had a quick chat this morning with various SRE people. Theoretically disk setup is done by DC ops, with SRE infrastructure foundation designing the actual layout. But CI falls under serviceops since, well, it has a lot of legacy stuff, the timezone, and there is Docker :]

Giuseppe kindly offered to review the task and give some advice. We will proceed from that.

The task is not so urgent that it needs to be dealt with immediately. It would be nice to have some progress next week though.

Update: it needs more discussion among SRE team :-]

Mentioned in SAL (#wikimedia-operations) [2019-07-02T20:20:04Z] <mutante> contint1001 - temp installing parted for labeling new disks sdc and sdd for raid for docker images (T207707)

RAID1 over the new disks /dev/sdc and /dev/sdd

apt-get install parted
parted /dev/sdc mklabel msdos
parted /dev/sdd mklabel msdos
# in fdisk: n (new partition) -> p (primary) -> 1 -> t (change type) -> fd (Linux raid autodetect) -> p (print) -> w (write)
fdisk /dev/sdc (n -> p -> 1 -> t -> fd -> p -> w)
fdisk /dev/sdd (n -> p -> 1 -> t -> fd -> p -> w)
mdadm --examine /dev/sdc /dev/sdd

/dev/sdc:
   MBR Magic : aa55
Partition[0] :   1953523120 sectors at         2048 (type fd)
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   1953523120 sectors at         2048 (type fd)

mdadm --create /dev/md3 --level=mirror --raid-devices=2 /dev/sdc1 /dev/sdd1
(mdadm: array /dev/md3 started.)

root@contint1001:~# cat /proc/mdstat 
Personalities : [raid1] 
md3 : active raid1 sdd1[1] sdc1[0]
      976630464 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  1.2% (11930816/976630464) finish=119.9min speed=134060K/sec
      bitmap: 8/8 pages [32KB], 65536KB chunk
 mdadm --detail /dev/md3
/dev/md3:
        Version : 1.2
  Creation Time : Tue Jul  2 20:32:04 2019
     Raid Level : raid1
     Array Size : 976630464 (931.39 GiB 1000.07 GB)
  Used Dev Size : 976630464 (931.39 GiB 1000.07 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Jul  2 20:36:10 2019
          State : clean, resyncing 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

  Resync Status : 3% complete

           Name : contint1001:3  (local to host contint1001)
           UUID : 90d5d855:2778ce20:de3aa86e:9dfe6119
         Events : 50

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
  • a LVM volume group

pvcreate /dev/md3
vgcreate images /dev/md3

vgrename images contint1001-data

root@contint1001:~# vgdisplay
  --- Volume group ---
  VG Name               contint1001-data
  System ID             
  Format                lvm2
..
  VG Size               931.38 GiB
  PE Size               4.00 MiB
  Total PE              238434
  Alloc PE / Size       0 / 0   
  Free  PE / Size       238434 / 931.38 GiB
  VG UUID               gGJ8bQ-KTqh-HJWC-DF4X-BPOC-EArF-o1ep56

a 250 G logical volume for the Docker images

lvcreate -L 250G -n docker contint1001-data

Logical volume "docker" created

Then format the partition with whatever filesystem best fits Docker, if any.

mkfs.ext4 /dev/mapper/contint1001--data-docker

Then mount that to /srv/docker

I didn't really want to do that because the existing separate device is mounted on /srv.

mkdir /mnt/docker
mount /dev/mapper/contint1001--data-docker /mnt/docker/

@hashar see the above and check out /mnt/docker. What do you think?

root@contint1001:/mnt/docker# df -h
Filesystem                            Size  Used Avail Use% Mounted on
..
/dev/md0                               46G   39G  5.3G  88% /
..
/dev/mapper/contint1001--vg-data      870G  484G  342G  59% /srv
/dev/mapper/contint1001--data-docker  246G   60M  234G   1% /mnt/docker
Dzahn subscribed.

Hi Hashar, at this point I think it makes sense to assign this back to you, to check that it seems sane and then for your next step:

and change its config to point there.

Looks like we have a couple of options there.

According to docker info we're using the overlay2 driver currently, which (per the Docker overlayfs docs) means there is no overlay2-specific storage option to point it at the new volume. We would just bind mount /mnt/docker to /var/lib/docker, and (I'm guessing) rsyncing the data from the old /var/lib/docker would probably suffice.
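A sketch of that bind-mount option, assuming Docker can be stopped while the data is copied (the puppet change merged further down ends up pointing Docker's data directory at the new partition instead):

systemctl stop docker
rsync -aHAX --numeric-ids /var/lib/docker/ /mnt/docker/   # preserve hardlinks and xattrs (overlay2 relies on xattrs)
mount --bind /mnt/docker /var/lib/docker
systemctl start docker
# the bind mount would also need an fstab entry to survive reboots, e.g.:
# /mnt/docker  /var/lib/docker  none  bind  0  0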

The other option would be to move towards using the devicemapper driver. In this case we would just use the volume group created and modify the Docker configuration to create a thin pool there. It seems that the devicemapper driver may give us better performance than a driver working at the filesystem level. The caveat seems to be that existing images would have to be saved and imported. T178663: Switch CI Docker Storage Driver to its own partition and to use devicemapper is related to this work.

@thcipriani why do we even need to save the images? We don't really care about locally-saved images, do we?

On the other hand, given we're not running production containers on that server, it seems to me it's ok if we use the overlay for now.

@thcipriani why do we even need to save the images? We don't really care about locally-saved images, do we?

We did due to docker-pkg, though it is nowadays supposed to attempt to pull images before attempting to build them.

There are also images from the deployment pipeline, but I think they are rebuilt from scratch anyway before publishing. So we should be able to just remove them.

On the other hand, given we're not running production containers on that server, it seems to me it's ok if we use the overlay for now.

I had filed T178663 to switch to devicemapper, following a discussion about aligning contint1001 with production. But https://docs.docker.com/storage/storagedriver/select-storage-driver/ states overlay2 is the default, and we already have it. So we can probably save ourselves the migration to devicemapper, yes.


So I think we can just:

  • stick to overlay2
  • reconfigure Docker to use /mnt/docker
  • restart Docker
  • run docker-pkg and verify it actually pulls all images / does not rebuild any
  • archive /var/lib/docker and ultimately delete it

The other option would be to move towards using the devicemapper driver.

https://docs.docker.com/v17.09/engine/userguide/storagedriver/overlayfs-driver/#how-container-reads-and-writes-work-with-overlay-or-overlay2 claims that

"Both overlay2 and overlay drivers are more performant than aufs and devicemapper."

and there is "The devicemapper storage driver is deprecated in Docker Engine 18.09, and will be removed in a future release. It is recommended that users of the devicemapper storage driver migrate to overlay2."

Change 520738 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] contint1001: point Docker data to a different partition

https://gerrit.wikimedia.org/r/520738

Change 520738 merged by Dzahn:
[operations/puppet@production] contint1001: point Docker data to a different partition

https://gerrit.wikimedia.org/r/520738

So I think we can just:

  • stick to overlay2
  • reconfigure Docker to use /mnt/docker
  • restart Docker
  • run docker-pkg and verify it actually pulls all images / does not rebuild any
  • archive /var/lib/docker and ultimately delete it

@Dzahn merged your config patch; I restarted Docker.

I'm able to confirm that Deployment Pipeline jobs run fine -- I ran a test job for mobileapps and it was a success.

I ran docker-pkg. There seems to be one broken image in our integration/config repo, composer-php56:0.1.5; however, that was evidently the case before moving mount points.

I applied a temporary patch to /etc/zuul/wikimedia:

From 0b4374f1246b451eaf3d9b155156b6fc38a60d74 Mon Sep 17 00:00:00 2001
From: Tyler Cipriani <tcipriani@wikimedia.org>
Date: Mon, 8 Jul 2019 16:00:01 -0600
Subject: [PATCH] thcipriani:test docker-pkg

partially reverts I4109ff5c2ee3f5f04d275cf91fa12d799b70cfdc

only for testing

Change-Id: I0a5e3942d957d579b935307d269b8c2a2ec2b2cc
---
 dockerfiles/composer-php56/changelog | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/dockerfiles/composer-php56/changelog b/dockerfiles/composer-php56/changelog
index b7bfbb1a..9a1d2780 100644
--- a/dockerfiles/composer-php56/changelog
+++ b/dockerfiles/composer-php56/changelog
@@ -1,9 +1,3 @@
-composer-php56 (0.1.5) wikimedia; urgency=medium
-
-  * Rebuild on new version of composer image which installs php-gmp
-
- -- James D. Forrester <jforrester@wikimedia.org>  Thu, 27 Jun 2019 12:39:55 -0700
-
 composer-php56 (0.1.4) wikimedia; urgency=high

   * Rebuild for apt security update
--
2.20.1

And re-ran docker-pkg:

[thcipriani@contint1001 tmp]$ /srv/deployment/docker-pkg/venv/bin/docker-pkg -c /etc/docker-pkg/integration.yaml build /etc/zuul/wikimedia/dockerfiles 
== Step 0: scanning /etc/zuul/wikimedia/dockerfiles == 
Will build the following images:
== Step 1: building images ==
== Step 2: publishing ==
== Build done! ==

Didn't pull any images, but didn't build any either. I think this is the expected behavior.

Will leave /var/lib/docker for the time being in case @hashar wants to have a look around.

Perfect, thank you @thcipriani, those tests were exactly the ones I had in mind :-]

For docker-pkg, since the images are already published, there is indeed no need to pull any of them.

I will have a look at composer-php56:0.1.5 failure.

Mentioned in SAL (#wikimedia-releng) [2019-07-09T09:48:58Z] <hashar> contint1001: removing /var/lib/docker , no more needed since we now use /mnt/docker # T207707

On contint1001: added an entry to /etc/fstab so that /mnt/docker survives reboots (/dev/mapper/contint1001--data-docker /mnt/docker ext4 defaults 0 2).
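A quick, read-only sanity check that the new entry is in place and the mount is active:

grep /mnt/docker /etc/fstab
findmnt /mnt/docker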