
Set up mirror of the docker hub registry for gitlab-runners
Closed, Resolved (Public)

Description

As demonstrated in T329216 it is easy to unintentionally cause too many requests to be sent to the official docker registry, resulting in failed CI jobs for many users.

Proposal:

  • In all places where gitlab-runners live (wmcs, trusted, releng cloud runners), set up a registry proxy to cache access to the Docker Hub registry.
  • The standard docker registry implementation already has built-in support for this type of proxy caching. https://docs.docker.com/registry/configuration/#proxy
  • This proxy can be equipped with credentials to isolate its Docker Hub pull activity accounting.
  • Configure (or provide advice on how to configure) image-downloading programs to check the mirror before accessing the official registry.
    • buildkitd
      • gitlab-cloud-runners
      • wmcs/trusted runners
    • Kubernetes runtime (using a Pod admission controller) (gitlab-cloud-runners)
    • dockerd
    • ??
  • Block direct access to the docker.io registry from all runners. (This is not necessary if we ensure that the mirror uses its own set of credentials.)
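
For the buildkitd item above, the mirror can be declared in buildkitd.toml. The following is a minimal sketch, not the configuration that was deployed: the helper name, the output path and the mirror address are hypothetical, while the `[registry."docker.io"]` table and its `mirrors` key are BuildKit's own registry settings.

```shell
# Write a buildkitd.toml fragment that sends docker.io pulls to a mirror first.
# $1 = output file, $2 = mirror host:port (both are placeholders chosen by the caller).
write_buildkitd_mirror_config() {
  out_file=$1
  mirror=$2
  cat > "$out_file" <<EOF
[registry."docker.io"]
  mirrors = ["$mirror"]
EOF
}
```

Usage would look like `write_buildkitd_mirror_config ./buildkitd.toml docker-hub-mirror.example:5000`. If the mirror is served over plain HTTP, BuildKit additionally needs a `[registry."<mirror>"]` table with `http = true`.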

Notes:

Docker Hub limits the number of Docker image downloads (“pulls”) based on the account type of the user pulling the image. Pull rate limits are based on individual IP address. For anonymous users, the rate limit is set to 100 pulls per 6 hours per IP address. For authenticated users, it’s 200 pulls per 6-hour period. Users with a paid Docker subscription get up to 5000 pulls per day. If you require a higher number of pulls, you can also purchase an Enhanced Service Account add-on.
...
A pull request is defined as up to two GET requests on registry manifest URLs (/v2/*/manifests/*).
A normal image pull makes a single manifest request.
A pull request for a multi-arch image makes two manifest requests.
HEAD requests aren’t counted.

xref: https://docs.docker.com/docker-hub/download-rate-limit
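
To see how much of the limit an IP address has left, Docker Hub returns `ratelimit-limit` and `ratelimit-remaining` headers with values such as `100;w=21600` (pulls, then window in seconds). The sketch below assumes that header format and uses the `ratelimitpreview/test` repository from Docker's rate-limit documentation; the function names are made up for illustration, and the network function needs curl and jq.

```shell
# Split a rate-limit header value such as "100;w=21600"
# into "<pulls> <window-seconds>".
parse_ratelimit() {
  value=$1
  pulls=${value%%;*}     # everything before the first ';'
  window=${value##*w=}   # everything after 'w='
  echo "$pulls $window"
}

# Print the ratelimit-* headers for the calling IP (anonymous token).
# Uses a HEAD request, which is not counted as a pull.
show_dockerhub_ratelimit() {
  token=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
  curl -s --head -H "Authorization: Bearer $token" \
    https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest |
    grep -i '^ratelimit-'
}
```

For example, `parse_ratelimit "100;w=21600"` prints `100 21600`, i.e. 100 pulls per 21600 seconds (6 hours).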

Event Timeline

In practice I have never used GitLab's registry proxy feature, and it appears that you may need to address the cached images differently?
This could be annoying, and a regular registry pull-through mirror might be more appealing?

This is fairly easy to run:

docker run -d -p 6000:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    -e REGISTRY_PROXY_USERNAME=<TODO-USERNAME> \
    -e REGISTRY_PROXY_PASSWORD=<TODO-PASSWORD/KEY> \
    --restart always \
    --name registry registry:2

And configure the Docker daemon to use it:

# NOTE: the redirect has to run as root, so pipe through `sudo tee` instead of prefixing echo with sudo.
echo '{"registry-mirrors": ["http://IP:6000"]}' | sudo tee /etc/docker/daemon.json
sudo service docker restart
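
After the restart it is worth confirming that dockerd picked up the mirror. A small sketch, assuming the usual `docker info` text output with a "Registry Mirrors:" section; the helper name is hypothetical.

```shell
# Succeeds if the `docker info` text on stdin lists the given mirror URL.
# dockerd may print the URL with a trailing slash, so match it as a substring.
info_lists_mirror() {
  grep -A 5 'Registry Mirrors:' | grep -qF "$1"
}
```

Usage: `sudo docker info | info_lists_mirror "http://IP:6000" && echo "mirror configured"`.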

If possible, the proxy should be configured to use a named Docker account when pulling from the Docker upstream repo. This will do two things: 1) raise the pull limit to 200 per 6 hours, and 2) isolate the proxy's pulls from the default 100 per 6 hours rate limit on the Cloud VPS egress IP.

Thanks for the research, @bd808 and @Addshore. @bd808, I may hit you up for those credentials soon.

Dzahn renamed this task from Disallow direct access to the docker.io registry from gitlab runners to Disallow direct access to the docker.io registry from gitlab runners (setup a mirror of docker hub). Feb 22 2023, 4:09 PM
Dzahn added a project: collaboration-services.
Dzahn added a project: serviceops-radar.
dancy renamed this task from Disallow direct access to the docker.io registry from gitlab runners (setup a mirror of docker hub) to Set up mirror of the docker hub registry for gitlab-runners. Feb 27 2023, 9:11 PM
dancy claimed this task.
dancy triaged this task as Medium priority.
dancy updated the task description.

One of the WMCS Shared Runners, runner-29, now has a dedicated disk for storing the mirrored Docker Hub images:

/dev/sdb1       492G   28K  467G   1% /var/lib/docker-registry

I used the unused volumes from T328283 and created a single bigger volume for that. That should give us some space for storing images.

The next step is to add some Puppet code to run the registry proxy container. This should be configurable by a feature/ensure flag, as the proxy is not needed on the Trusted Runners and not on all WMCS runners.

Change 894100 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab_runner: add optional docker registry proxy to runners

https://gerrit.wikimedia.org/r/894100

Change 894100 merged by Jelto:

[operations/puppet@production] gitlab_runner: add optional docker registry proxy to runners

https://gerrit.wikimedia.org/r/894100

Change 896316 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab_runner: fix docker run command in registry service, fix hiera

https://gerrit.wikimedia.org/r/896316

Change 896316 merged by Jelto:

[operations/puppet@production] gitlab_runner: fix docker run command in registry service, fix hiera

https://gerrit.wikimedia.org/r/896316

A Docker registry container is now running on runner-1029 in WMCS. A quick test with the pull-through cache enabled seemed to work: after a docker pull, the image data ends up in /var/lib/docker-registry.

runner-1029:~$ sudo docker ps
CONTAINER ID   IMAGE        COMMAND                  CREATED          STATUS          PORTS                    NAMES
c56583379804   registry:2   "/entrypoint.sh /etc…"   7 minutes ago    Up 7 minutes    0.0.0.0:5000->5000/tcp   registry

runner-1029:~$ sudo docker pull golang:bullseye

runner-1029:~$ ls /var/lib/docker-registry/
docker  lost+found  scheduler-state.json

runner-1029:~$ cat /var/lib/docker-registry/scheduler-state.json 
{"library/golang@sha256:...":}
...
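
Another way to check the cache is the registry's HTTP API: `GET /v2/_catalog` returns a JSON list of repositories. This is a sketch only; it assumes the catalog endpoint reflects repositories cached by the proxy and that the registry listens on port 5000 as in the `docker ps` output above. The helper name is hypothetical.

```shell
# Succeeds if the /v2/_catalog JSON on stdin names the given repository.
# Matches the quoted name, e.g. "library/golang".
catalog_has_repo() {
  grep -qF "\"$1\""
}
```

Usage: `curl -s http://localhost:5000/v2/_catalog | catalog_has_repo library/golang && echo cached`.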

Before enabling the proxy/pull-through cache for all WMCS runners, we need to configure TLS properly so that other runners can speak to the proxy over HTTPS.
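
For reference, the stock registry image can terminate TLS itself through the `REGISTRY_HTTP_TLS_CERTIFICATE` and `REGISTRY_HTTP_TLS_KEY` environment variables. The sketch below only prints a candidate command and does not run it; the certificate directory and the file names `domain.crt` / `domain.key` are assumptions, and the actual Puppet-managed service may be set up differently.

```shell
# Print (do not run) a docker command that serves the registry proxy over TLS.
# $1 = host directory holding the certificate and key (hypothetical layout).
print_registry_tls_cmd() {
  certs_dir=$1
  cat <<EOF
docker run -d -p 5000:5000 \\
  -v $certs_dir:/certs:ro \\
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \\
  -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \\
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \\
  --restart always \\
  --name registry registry:2
EOF
}
```

Clients would then list the mirror with an `https://` URL in their `registry-mirrors` setting.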

I'm resolving this task as "done enough for now". docker-hub-mirror has been deployed to gitlab-cloud-runners along with a pod admission webhook and buildkitd configuration, and it has been working fine there.