
Modify conda-analytics CI pipeline to use a custom gitlab runner that can run docker
Closed, Resolved | Public | 3 Estimated Story Points

Description

As part of T321088, we have modified the development Dockerfile of conda-analytics.

In this task we should change the way CI is done for conda-analytics. @Antoine_Quhen suggests we add:

  • a script to generate+validate the conda env lock file
  • a Dockerfile in charge of generating the deb
  • a custom GitLab CI runner that can run docker in docker
  • a very simple GitLab CI pipeline in workflow_utils, responsible only for:
    • launching the docker build
    • passing the basic params (package name and version)
    • fetching the resulting deb file from the docker container
    • uploading the deb to the registry
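A minimal sketch of the job script such a pipeline could run, written as Bash functions. Everything specific here is an assumption for illustration, not part of the task: the build-arg names `PACKAGE_NAME` and `PACKAGE_VERSION`, the `/srv/<name>_<version>_amd64.deb` path inside the image, and treating "the registry" as GitLab's generic package registry (uploaded with the predefined `CI_JOB_TOKEN`, `CI_API_V4_URL` and `CI_PROJECT_ID` job variables).

```shell
#!/usr/bin/env bash
# Sketch of a CI job script that builds a deb in Docker and publishes it.
# Hypothetical: the PACKAGE_NAME/PACKAGE_VERSION build args, the .deb path
# inside the image, and GitLab's generic package registry as the target.
set -euo pipefail

# Build the GitLab generic package registry upload URL.
generic_package_url() {
  local api_url="$1" project_id="$2" name="$3" version="$4" file="$5"
  printf '%s/projects/%s/packages/generic/%s/%s/%s\n' \
    "$api_url" "$project_id" "$name" "$version" "$file"
}

build_and_publish_deb() {
  local name="$1" version="$2"
  local image="${name}-deb-build:${version}"
  local deb="${name}_${version}_amd64.deb"
  local cid

  # 1. Launch the docker build, passing only the basic params.
  docker build \
    --build-arg "PACKAGE_NAME=${name}" \
    --build-arg "PACKAGE_VERSION=${version}" \
    --tag "$image" .

  # 2. Fetch the deb from a created (never started) container. A command
  #    is given because `docker create` fails if the image defines none.
  cid="$(docker create "$image" true)"
  docker cp "${cid}:/srv/${deb}" "./${deb}"
  docker rm "$cid" > /dev/null

  # 3. Upload the deb using the job token GitLab injects into CI jobs.
  curl --fail --header "JOB-TOKEN: ${CI_JOB_TOKEN}" \
    --upload-file "./${deb}" \
    "$(generic_package_url "$CI_API_V4_URL" "$CI_PROJECT_ID" "$name" "$version" "$deb")"
}
```

A job would then call something like `build_and_publish_deb conda-analytics 0.0.1` (the version is a placeholder). Copying the file out with `docker cp` rather than bind-mounting an output directory matters if the runner talks to the host daemon through a mounted socket: bind-mount paths are then resolved on the host, not inside the job container.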

Event Timeline

Thanks for taking over the unmerged MR.

After merge, this is where to update:
https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/blob/main/.gitlab-ci.yml#L3

Also, after working on the next version of the Airflow deb package, I no longer see this CI as the way to go. I would suggest:

  • a script to generate+validate the conda env lock file
  • a Dockerfile in charge of generating the deb
  • a custom GitLab CI runner that can run docker in docker
  • a very simple GitLab CI pipeline in workflow_utils, responsible only for:
    • launching the docker build
    • passing the basic params (package name and version)
    • fetching the resulting deb file from the docker container
    • uploading the deb to the registry

I think a little inter-repo duplication within the Dockerfiles would be better than over-parameterizing the CI.

What do you think?

@Antoine_Quhen your points above all make sense to me, especially keeping all the deb build complexity inside the Dockerfile, since those builds always seem to be custom per project.

However, since T295045 seems to be far away, it looks like I will have to go ahead and modify these scripts. Or am I wrong to assume T295045 is far away? (@BTullis?)

xcollazo changed the task status from Open to In Progress (Oct 27 2022, 7:29 PM).

OK, I was confused because I thought we needed T295045 before I could do what you suggest, @Antoine_Quhen, but @BTullis just cleared this up in Slack:

As I understand it, the GitLab team would be ok for us to run our own gitlab-runner, which would be able to run docker-in-docker and would therefore be able to build packages and make them available.
However, what we can't currently do is run this new runner in the production network, because it's a security smell.
We could run a gitlab-runner in WMCS for this purpose, because that's kind of running it in a sandbox (i.e. it's a bit like running it in AWS or similar).
So we can just go ahead and do this if we want to. The ticket that I created about 'allow a shared, protected runner...' is about running it in production, and there are several ideas there of how we might get around the security issue, but we don't need to do this in order to get the Airflow packages, or any packages, building using docker in docker.

So I'll definitely follow your suggested approach. Will update this ticket description.

xcollazo renamed this task from "Migrate changes to conda-analytics' Dockerfile to CI pipeline at workflow_utils" to "Modify conda-analytics CI pipeline to use a custom gitlab runner that can run docker" (Oct 27 2022, 8:40 PM).
xcollazo updated the task description.

Had some fun with https://horizon.wikimedia.org/ and its Puppet integration. We now have an instance with Docker where we can test out this whole idea:

xcollazo@gitlab-docker-runner:~$ hostname -f
gitlab-docker-runner.analytics.eqiad1.wikimedia.cloud
xcollazo@gitlab-docker-runner:~$ docker --version
Docker version 20.10.5+dfsg1, build 55c4c88

Runner now registered at https://gitlab.wikimedia.org/repos/data-engineering/conda-analytics/-/runners/338

We will probably want to puppetize this later, but it's good enough for now:

# To set up gitlab-runner:

curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash

apt-cache madison gitlab-runner

# 15.2.2 is the closest available match for our GitLab version, 15.2.5
sudo apt-get install gitlab-runner=15.2.2
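Picking the runner version can be scripted. This sketch assumes, as the comment above does, that the goal is the newest runner release sharing the GitLab server's MAJOR.MINOR; it parses the pipe-separated output of `apt-cache madison` and needs GNU `sort -V`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read `apt-cache madison gitlab-runner` output on stdin and print the newest
# version that shares SERVER_VERSION's MAJOR.MINOR (e.g. 15.2.x for 15.2.5).
closest_runner_version() {
  local server_version="$1"
  local major_minor="${server_version%.*}"   # 15.2.5 -> 15.2
  awk -F'|' -v mm="$major_minor" '{
    v = $2
    gsub(/[ \t]+/, "", v)               # strip column padding
    if (index(v, mm ".") == 1) print v  # keep only MAJOR.MINOR.* versions
  }' | sort -V | tail -n 1
}
```

For example: `sudo apt-get install "gitlab-runner=$(apt-cache madison gitlab-runner | closest_runner_version 15.2.5)"`. Running `sudo apt-mark hold gitlab-runner` afterwards would keep routine upgrades from moving the runner past the server's version.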


sudo gitlab-runner register

Enter the GitLab instance URL (for example, https://gitlab.com/):
https://gitlab.wikimedia.org/
Enter the registration token:
***********
Enter a description for the runner:
[gitlab-docker-runner]: gitlab-docker-runner.analytics.eqiad1.wikimedia.cloud
Enter tags for the runner (comma-separated):
docker,analytics,data-engineering
Enter optional maintenance note for the runner:
First attempt at a custom runner for building with docker on docker.
Registering runner... succeeded                     runner=******
Enter an executor: parallels, docker+machine, kubernetes, ssh, virtualbox, docker-ssh+machine, custom, docker, docker-ssh, shell:
docker
Enter the default Docker image (for example, ruby:2.7):
docker-registry.wikimedia.org/buster:latest
Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!
 
Configuration (with the authentication token) was saved in "/etc/gitlab-runner/config.toml"
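The interactive session above can also be expressed non-interactively, which would help once this gets puppetized. This sketch prints `gitlab-runner register` arguments mirroring the answers given at the prompts; it uses the `wikimedia-buster` image and the Docker socket volume from the final config.toml further down instead of the `buster` image typed at the prompt. The helper name is hypothetical, and the registration token is passed in rather than stored.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print, one per line, `gitlab-runner register` arguments mirroring the
# interactive answers above. The registration token is passed in as $1
# so it is never written into this script.
runner_register_args() {
  local token="$1"
  printf '%s\n' \
    --non-interactive \
    --url "https://gitlab.wikimedia.org/" \
    --registration-token "$token" \
    --description "gitlab-docker-runner.analytics.eqiad1.wikimedia.cloud" \
    --tag-list "docker,analytics,data-engineering" \
    --executor docker \
    --docker-image "docker-registry.wikimedia.org/wikimedia-buster:latest" \
    --docker-volumes "/var/run/docker.sock:/var/run/docker.sock"
}
```

Usage, with a hypothetical `REGISTRATION_TOKEN` environment variable: `mapfile -t args < <(runner_register_args "$REGISTRATION_TOKEN")`, then `sudo gitlab-runner register "${args[@]}"`.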

A manual conda-analytics build failed because the instance ran out of disk space. The instance default of 20GB is not going to cut it:

 ---> 1a6fce9fa544
Step 42/58 : RUN mv "${WORK_DIR}/pkgs" "${DEBIAN_CONDA_ENV_PATH}/pkgs"
 ---> Running in d5303aaf11cf
Error processing tar file(exit status 1): write /srv/conda-analytics/debian/conda-analytics/opt/conda-analytics/pkgs/python-3.9.12-h12debd9_0/bin/python3.9: no space left on device

So I created a 60GB volume, gitlab-docker-runner-workspace, and attached it to gitlab-docker-runner.analytics.eqiad1.wikimedia.cloud.

Followed https://phoenixnap.com/kb/linux-format-disk to format and mount the extra volume:

lsblk -f
sudo mkfs -t ext4 /dev/sda
sudo mkdir -p /mnt/docker-scratch
sudo mount -t auto /dev/sda /mnt/docker-scratch
lsblk -f
NAME    FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINT
sda     ext4   1.0         87a49e46-b1d5-4f45-98f8-94fc2847c9a9   55.7G     0% /mnt/docker-scratch
sdb                                                                            
├─sdb1  ext4   1.0         755f2444-3861-4645-bf30-1ef4863ace90   15.7G    17% /
├─sdb14                                                                        
└─sdb15 vfat   FAT16       FCD5-F6A0                             117.8M     5% /boot/efi
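A mount made with `mount` alone does not survive a reboot. One option is an `/etc/fstab` entry keyed on the filesystem UUID rather than the device name, since device names can change when a volume is re-attached. This is a sketch with a hypothetical helper name; `nofail` is there so the instance still boots if the volume is missing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print an /etc/fstab line that mounts a filesystem by UUID, so the entry
# keeps working if the disk is re-enumerated (e.g. sda becomes sdb).
# "nofail" lets the instance boot even if the volume is detached.
fstab_entry() {
  local uuid="$1" mountpoint="$2" fstype="${3:-ext4}"
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$uuid" "$mountpoint" "$fstype"
}
```

With the UUID read via `sudo blkid -s UUID -o value /dev/sda`, the line can be appended with `fstab_entry "$uuid" /mnt/docker-scratch | sudo tee -a /etc/fstab` and checked with `sudo mount -a`. Note that `mkfs` assigns a new UUID every time it runs, so reformatting the volume means regenerating the entry.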

Now prune everything:

docker system prune --all

Then tell Docker to use the new volume from now on:

sudo systemctl stop docker
sudo systemctl stop docker.socket
sudo systemctl stop containerd

sudo mv /var/lib/docker /mnt/docker-scratch/

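sudo vim /etc/docker/daemon.json
# with contents:
{
  "data-root": "/mnt/docker-scratch/docker"
}

# start the daemon again so it picks up the new data root
sudo systemctl start docker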

This worked, so I committed this change to the instance's Hiera config:

docker::configuration::settings:
  data-root: /mnt/docker-scratch/docker
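After moving the data root, a quick check that the daemon picked it up and has room for a build could look like this sketch. It relies on `docker info --format '{{.DockerRootDir}}'` and GNU `df`; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fail (non-zero) if the filesystem holding PATH has fewer than MIN_GB
# gigabytes available. Relies on GNU df's --output and -B options.
require_free_gb() {
  local path="$1" min_gb="$2" avail_gb
  avail_gb="$(df --output=avail -BG "$path" | tail -n 1 | tr -dc '0-9')"
  if (( avail_gb < min_gb )); then
    echo "only ${avail_gb}G free under ${path}; need ${min_gb}G" >&2
    return 1
  fi
}

# Confirm the daemon uses EXPECTED as its data root, then check free space
# there before starting a large build.
check_docker_root() {
  local expected="$1" min_gb="$2" actual
  actual="$(docker info --format '{{.DockerRootDir}}')"
  if [[ "$actual" != "$expected" ]]; then
    echo "docker data root is ${actual}, expected ${expected}" >&2
    return 1
  fi
  require_free_gb "$actual" "$min_gb"
}
```

For example, `check_docker_root /mnt/docker-scratch/docker 40`, where 40 is an arbitrary threshold rather than a measured requirement of the conda-analytics build.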

This is the specific docker configuration that we want the runner to have:

[[runners]]
  executor = "docker"
  [runners.docker]
    image = "docker-registry.wikimedia.org/wikimedia-buster:latest"
    # instead of docker-in-docker, which creates "child" containers that require privileged mode,
    # here we mount the docker socket so that we can launch "sibling" docker containers
    # see here for why this is more sound: https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/
    volumes = ["/var/run/docker.sock:/var/run/docker.sock"]

After merging it with the existing settings at /etc/gitlab-runner/config.toml, we have:

concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "gitlab-docker-runner"
  url = "https://gitlab.wikimedia.org/"
  token = "*************"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "docker-registry.wikimedia.org/wikimedia-buster:latest"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    # instead of docker-in-docker, which creates "child" containers that require privileged mode,
    # here we mount the docker socket so that we can launch "sibling" docker containers
    # see here for why this is more sound: https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/
    volumes = ["/cache", "/var/run/docker.sock:/var/run/docker.sock"]
    shm_size = 0

TODO: This should probably be in Puppet.

EChetty set the point value for this task to 3 (Nov 2 2022, 4:54 PM).

(Created T322251 as an eventual follow-up to properly puppetize the manual steps we took in this task.)

The attached volume gitlab-docker-runner-workspace got mangled, perhaps because of a recent live migration.

Detached and reattached it via the UI. It came back as /dev/sdb (unmounted), and the root disk is now sda:

lsblk -f
NAME    FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINT
sda                                                                            
├─sda1  ext4   1.0         755f2444-3861-4645-bf30-1ef4863ace90   14.1G    25% /
├─sda14                                                                        
└─sda15 vfat   FAT16       FCD5-F6A0                             113.1M     9% /boot/efi
sdb     ext4   1.0         7df078fa-904e-4914-a4d7-8eebed2ff942

So now:

sudo mkfs -t ext4 /dev/sdb
sudo mkdir -p /mnt/docker-scratch
sudo mount -t auto /dev/sdb /mnt/docker-scratch

lsblk -f
NAME    FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINT
sda                                                                            
├─sda1  ext4   1.0         755f2444-3861-4645-bf30-1ef4863ace90   14.1G    25% /
├─sda14                                                                        
└─sda15 vfat   FAT16       FCD5-F6A0                             113.1M     9% /boot/efi
sdb     ext4   1.0         70f094ce-c43d-46bd-81e3-d474e1d33829   55.7G     0% /mnt/docker-scratch