
Define access to external resources for GitLab CI Runners
Closed, Resolved, Public

Description

Across multiple tasks and discussions (T312961, T291978, T295481), the question came up of which external resources GitLab CI Runners should be allowed to access. "External resources" is quite generic and covers multiple areas, which are split up into the sections below. It could make sense to spin off subtasks for individual resources.

General access to public resources (egress traffic)

This is about which outgoing traffic is allowed for Runners: should Runners be allowed to access resources on the internet?
Options include fully unrestricted egress, egress only via the webproxy, or disabling egress completely.
Currently, Shared Runners in WMCS have mostly unrestricted access, while Trusted Runners have egress access via the webproxy.

Public package repositories

CI builds sometimes need additional packages, either to perform CI tasks or to build the artifact. Should Runners be allowed to use common package registries, like pip or npm, in CI jobs? Some sources are present or mirrored in WMF infrastructure (like the apt repository), others aren't.
Currently all Runners can install packages from public repositories (as long as they are reachable over http/https, via the webproxy where applicable).
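For Runners whose egress goes through the webproxy, package managers generally need no special setup: pip, npm, curl and Python's urllib all honor the conventional http_proxy, https_proxy and no_proxy environment variables. A minimal sketch of what a job environment might set is below; the proxy URL and the no_proxy entries are hypothetical placeholders, not the actual webproxy configuration.

```python
from __future__ import annotations

import urllib.request

# Hypothetical placeholder: the real webproxy host and port are not part of this task.
WEBPROXY_URL = "http://webproxy.example.wmnet:8080"


def proxy_env(proxy_url: str, no_proxy: list[str]) -> dict[str, str]:
    """Return the environment variables that pip, npm, curl and urllib read
    to route HTTP(S) traffic through a proxy.

    Both lower- and upper-case names are set because tools differ in which
    spelling they check.
    """
    env: dict[str, str] = {}
    for name in ("http_proxy", "https_proxy"):
        env[name] = proxy_url
        env[name.upper()] = proxy_url
    # Hosts that should be reached directly, e.g. internal mirrors.
    env["no_proxy"] = env["NO_PROXY"] = ",".join(no_proxy)
    return env


def opener_via_proxy(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build a urllib opener that sends http:// and https:// requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    for key, value in sorted(proxy_env(WEBPROXY_URL, ["localhost", ".wmnet"]).items()):
        print(f"{key}={value}")
```

In practice these variables would be set in the Runner or job environment rather than computed in code; the function only makes the full set of variables explicit.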

Docker images for CI purposes

Certain CI jobs use pre-built images to perform common tasks like linting, testing or code scans. Should Runners be allowed to run external images for such CI jobs? Note that this is not about base images (see the next section); it's only about which images can be executed during CI jobs to perform these tasks.

Currently we restrict what images can be executed. The current list contains:

allowed_images = [
  # Everything in Wikimedia registry:
  "docker-registry.wikimedia.org/**/*",
  "docker-registry.discovery.wmnet/**/*",

  # Distributions:
  "centos/*:*",
  "debian:*",
  "fedora:*",
  "opensuse/*:*",
  "ubuntu:*",

  # Language-specific:
  "python:*",
  "ruby:*",
  "rust:*",
  "rustlang/rust:nightly",

  # GitLab upstream - includes security analyzers and terraform images:
  "registry.gitlab.com/gitlab-org/**/*",
]

See config.toml.
This list is used for both Shared and Trusted Runners. A discussion in T312961 about adding more security scanners prompted this broader discussion and this task.
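The entries above are glob patterns. Assuming GitLab Runner applies doublestar-style rules (a * does not cross a /, while **/ spans any number of path segments), the following is a minimal sketch of those semantics for predicting whether a job's image will be accepted. It is an approximation checked against an excerpt of the list, not the Runner's actual matching code.

```python
from __future__ import annotations

import re

# Excerpt of the allowed_images list above.
ALLOWED_IMAGES = [
    "docker-registry.wikimedia.org/**/*",
    "docker-registry.discovery.wmnet/**/*",
    "debian:*",
    "python:*",
    "rustlang/rust:nightly",
    "registry.gitlab.com/gitlab-org/**/*",
]


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob into a regex, assuming doublestar-style rules:
    '**/' matches zero or more whole path segments, '*' matches any run of
    characters except '/', and '?' matches one non-'/' character."""
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]*/)*")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


COMPILED = [glob_to_regex(p) for p in ALLOWED_IMAGES]


def is_allowed(image: str) -> bool:
    """True if the whole image reference matches at least one pattern."""
    return any(rx.fullmatch(image) for rx in COMPILED)
```

Whether the Runner normalizes references (for example by adding a default tag) before matching is outside this sketch; under these rules a bare debian with no tag does not match debian:*.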

Docker base images for building images

Which base images are allowed when building images for the wmf/production registry? In other words, which sources may be used directly to build artifacts that run in production (the base in Blubber or the FROM field)?
Other open questions regarding base images:
Is it possible to restrict these base images in buildkitd?
Related docs: https://wikitech.wikimedia.org/wiki/Kubernetes/Images

Difference between Shared and Trusted Runners

Furthermore, access to some of these resources may differ between the Runner tiers. Shared Runners could, for example, execute a wider range of images or build non-production images from a wider range of base images.
Currently Shared and Trusted Runners have the same access to external resources and Docker images, apart from the webproxy. We should discuss whether this is reasonable going forward or whether separate allow-lists and policies are needed.

Event Timeline

Follow-up from the meeting regarding:

Images that BuildKit uses for internal operations

Our Blubber BuildKit frontend currently uses BuildKit's dockerfile2llb package to convert its Dockerfile output to LLB instructions. This is a current dependency, but one we're planning to refactor out once Blubber becomes exclusively a BuildKit interface rather than a blubberfile-to-Dockerfile transpiler. Looking at the implementation of dockerfile2llb, I can see just one internal image:

  • docker/dockerfile-copy:v0.1.9@sha256:e8f159d3f00786604b93c675ee2783f8dc194bb565e61ca5788f6a6e9d304061

This image, used to dispatch copy operations, can be overridden via the OverrideCopyImage field of dockerfile2llb.ConvertOpt, meaning we could potentially vendor our own version of this image and enforce its use.

However, I think that may be overkill given:

  • The latest version of this image is 4 years old and it's very minimal.
  • The image reference uses a specific digest (the sha256:), and...
  • Container image layers are content addressable and verifiable. Docker verifies the digests when it pulls images, so pinning a digest in a reference is as good as using sha256sum to verify whatever binaries the image holds.
  • We're planning on removing the dockerfile2llb dependency in blubber so this isn't going to be a long-term concern.
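To illustrate the content-addressing argument: a reference pinned with @sha256:<digest> identifies the manifest by the SHA-256 of its bytes, and the manifest in turn lists each layer by its own digest, so checking every hop amounts to running sha256sum over everything the image contains. The sketch below models that chain with in-memory bytes; it is not Docker's implementation, and the tiny manifest only carries the "layers"/"digest" fields needed here.

```python
from __future__ import annotations

import hashlib
import json
import string


def sha256_digest(blob: bytes) -> str:
    """Digest string in the 'sha256:<hex>' form used in image references."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def verify_blob(blob: bytes, digest: str) -> bool:
    """Check that blob hashes to digest; only well-formed sha256 digests are accepted."""
    algo, sep, expected = digest.partition(":")
    if (
        algo != "sha256"
        or not sep
        or len(expected) != 64
        or not all(c in string.hexdigits for c in expected)
    ):
        raise ValueError(f"unsupported or malformed digest: {digest!r}")
    return hashlib.sha256(blob).hexdigest() == expected.lower()


def verify_image(manifest: bytes, layers: dict[str, bytes], pinned: str) -> bool:
    """Verify the manifest against the pinned digest, then every layer it lists."""
    if not verify_blob(manifest, pinned):
        return False
    for descriptor in json.loads(manifest)["layers"]:
        blob = layers.get(descriptor["digest"])
        if blob is None or not verify_blob(blob, descriptor["digest"]):
            return False
    return True
```

Tampering with the manifest or any layer breaks the chain; what a pinned digest cannot do is remove vulnerabilities already present in the pinned content.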

Thoughts?

Current contents of docker/dockerfile-copy:v0.1.9@sha256:e8f159d3f00786604b93c675ee2783f8dc194bb565e61ca5788f6a6e9d304061:

$ docker save docker/dockerfile-copy:v0.1.9@sha256:e8f159d3f00786604b93c675ee2783f8dc194bb565e61ca5788f6a6e9d304061 | tar Oxf - '*/layer.tar' | tar tf -
bin/
bin/gunzip
bin/gzip
bin/tar
dev/
dev/null
etc/
lib/
lib/ld-musl-x86_64.so.1
lib/libc.musl-x86_64.so.1
proc/
tmp/
usr/
usr/bin/
usr/bin/bunzip2
usr/bin/bzcat
usr/bin/bzcmp
usr/bin/bzdiff
usr/bin/bzegrep
usr/bin/bzfgrep
usr/bin/bzgrep
usr/bin/bzip2
usr/bin/bzip2recover
usr/bin/bzless
usr/bin/bzmore
usr/bin/gunzip
usr/bin/gzexe
usr/bin/gzip
usr/bin/lzcat
usr/bin/lzcmp
usr/bin/lzdiff
usr/bin/lzegrep
usr/bin/lzfgrep
usr/bin/lzgrep
usr/bin/lzless
usr/bin/lzma
usr/bin/lzmadec
usr/bin/lzmainfo
usr/bin/lzmore
usr/bin/tar
usr/bin/uncompress
usr/bin/unlzma
usr/bin/unxz
usr/bin/xz
usr/bin/xzcat
usr/bin/xzcmp
usr/bin/xzdec
usr/bin/xzdiff
usr/bin/xzegrep
usr/bin/xzfgrep
usr/bin/xzgrep
usr/bin/xzless
usr/bin/xzmore
usr/bin/zcat
usr/bin/zcmp
usr/bin/zdiff
usr/bin/zegrep
usr/bin/zfgrep
usr/bin/zforce
usr/bin/zgrep
usr/bin/zless
usr/bin/zmore
usr/bin/znew
usr/lib/
usr/lib/liblzma.so.5
usr/lib/liblzma.so.5.2.4
usr/libexec/
usr/libexec/rmt
var/
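The same listing can be produced without relying on tar's wildcard handling by reading the manifest.json that docker save writes at the top of its archive; its Layers entries give the path of each layer tarball inside the archive. A minimal sketch, assuming the archive was first written to a file (e.g. with docker save -o image.tar IMAGE); the script and file names are hypothetical.

```python
from __future__ import annotations

import io
import json
import sys
import tarfile


def list_layer_files(saved: bytes) -> dict[str, list[str]]:
    """Map each layer path listed in a docker save archive's manifest.json
    to the names of the files inside that layer."""
    result: dict[str, list[str]] = {}
    with tarfile.open(fileobj=io.BytesIO(saved), mode="r:*") as outer:
        manifest_file = outer.extractfile("manifest.json")
        if manifest_file is None:
            raise ValueError("manifest.json is not a regular file")
        for image in json.load(manifest_file):
            for layer_path in image["Layers"]:
                layer_file = outer.extractfile(layer_path)
                if layer_file is None:
                    raise ValueError(f"layer {layer_path!r} is not a regular file")
                # Each layer is itself a tar archive (compression is auto-detected).
                with tarfile.open(fileobj=layer_file, mode="r:*") as layer:
                    result[layer_path] = layer.getnames()
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python list_layers.py image.tar
    with open(sys.argv[1], "rb") as fh:
        for layer_path, names in list_layer_files(fh.read()).items():
            print(f"# {layer_path}")
            print("\n".join(names))
```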

Change 844434 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab_runner: make allowed_images list configurable in hiera

https://gerrit.wikimedia.org/r/844434

Jelto triaged this task as High priority.


Thanks Dan for the follow-up and for looking into the BuildKit internals. My suggestion would be to create a non-blocking subtask for building the docker/dockerfile-copy image ourselves. I don't think we strictly need that immediately, and we can continue with the digest-pinned public image. But if it's not too complicated, I'd like to try building that single image ourselves and setting the override. I'm somewhat concerned that docker/dockerfile-copy may not get much attention from Docker Inc. given it hasn't been updated in 4 years. We may have to patch this image for security issues in the future anyway.

I can also take a look at that if you like (once I find the actual code/repo for the dockerfile-copy image).

Different allowed_images for Trusted and Shared Runners

Another follow-up from yesterday's discussion is to use two different allowed_images lists for Shared and Trusted Runners. Trusted Runners should only be allowed to execute images we control, so that production builds do not depend on images and code provided by third parties. Shared Runners can also execute external images, like common Docker Hub images, popular language base images and images provided by gitlab-org. This should reduce friction for initial development and non-production projects, and give us more flexibility in which jobs can be executed outside the Trusted Build process (code scans, linting, ...).

The change above implements two separate allowed_images lists. Once it is merged, T312961 and T320825 should be unblocked.
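A minimal sketch of the two-list idea: the Runner tier selects which pattern list an image is checked against, and an unknown tier is rejected rather than silently falling back to the more permissive list. The list contents are illustrative, not the deployed hiera values, and fnmatchcase is a coarser stand-in for the Runner's own matching (its * also matches /).

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Illustrative lists only; the deployed values are configured in hiera.
TRUSTED_IMAGES = [
    "docker-registry.wikimedia.org/*",
    "docker-registry.discovery.wmnet/*",
]
SHARED_IMAGES = TRUSTED_IMAGES + [
    "debian:*",
    "python:*",
    "registry.gitlab.com/gitlab-org/*",
]
POLICIES: dict[str, list[str]] = {"trusted": TRUSTED_IMAGES, "shared": SHARED_IMAGES}


def image_allowed(tier: str, image: str) -> bool:
    """Check image against the allow-list for the given Runner tier.

    Unknown tiers raise ValueError instead of defaulting to a list (fail closed).
    fnmatchcase's '*' also matches '/', so these patterns are broader than
    the '**/*' style used in config.toml.
    """
    try:
        patterns = POLICIES[tier]
    except KeyError:
        raise ValueError(f"unknown runner tier: {tier!r}") from None
    return any(fnmatchcase(image, pattern) for pattern in patterns)
```

With this split, a gitlab-org image such as release-cli runs on Shared Runners but would need a copy in the WMF registry to run on Trusted Runners.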

Restrict base images for buildkit

We also agreed to keep the policy of allowing only wmf base images for production builds. So public images, such as those from Docker Hub, should be forbidden as base images in the Trusted Runner buildkit configuration. This means buildkit/blubber(?) needs the same policy we use for the current production builds. @dduvall I guess you can explain that better :)
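Until buildkitd itself can enforce this, a pre-build check along these lines could flag disallowed bases early. This is a hypothetical sketch, not part of Blubber or BuildKit: it scans FROM instructions, skips references to earlier build stages and scratch, and reports any base image outside an assumed WMF registry prefix. It deliberately ignores line continuations and ARG substitution in FROM.

```python
from __future__ import annotations

import re

# Assumed allowed prefix for production base images.
ALLOWED_BASE_PREFIXES = ("docker-registry.wikimedia.org/",)

# Dockerfile instructions are case-insensitive.
FROM_LINE = re.compile(r"^\s*FROM\s+(.+)$", re.IGNORECASE)


def disallowed_bases(
    dockerfile: str, allowed: tuple[str, ...] = ALLOWED_BASE_PREFIXES
) -> list[str]:
    """Return the base images in dockerfile that are not under an allowed prefix."""
    stages: set[str] = set()  # names introduced with "FROM ... AS <name>"
    bad: list[str] = []
    for line in dockerfile.splitlines():
        match = FROM_LINE.match(line)
        if not match:
            continue
        # Drop flags such as --platform=linux/amd64.
        tokens = [t for t in match.group(1).split() if not t.startswith("--")]
        if not tokens:
            continue
        image = tokens[0]
        is_stage = image.lower() in stages
        if not is_stage and image != "scratch" and not image.startswith(allowed):
            bad.append(image)
        if len(tokens) >= 3 and tokens[1].lower() == "as":
            stages.add(tokens[2].lower())
    return bad
```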

Thanks again all for all the input and the discussion!

Just my 2 cents. I think the attack vectors from that image (I'm glad it appears to be the only image) are few and the attack surface is small (for now).

That being said, that image uses musl as its libc implementation, and for a 4-year-old musl there are 1 DoS and 2 overflow CVEs [1]. Another package, xz-utils (which ships liblzma), has had one this year in its xzgrep script [2], and it probably applies to that image as well (it goes back at least to stretch in Debian terms, which matches up with those 4 years).

[1] https://www.cvedetails.com/product/39652/Musl-libc-Musl.html?vendor_id=16859
[2] https://security-tracker.debian.org/tracker/CVE-2022-1271

So overall, while the attack surface might not be particularly large and exploiting the above would probably take some effort, that attack surface will grow over time. So, while I wouldn't prioritize this as High, I +1 Jelto's prudent plan to rebuild that image and ship it internally.


Thank you for pointing out those CVEs. In light of that, I heartily +1 the approach as well: building our own version of dockerfile-copy. It should be simple to enforce in the Blubber buildkit frontend, and maybe we won't even need it after refactoring Blubber to use LLB directly.

Change 844434 merged by Jelto:

[operations/puppet@production] gitlab_runner: make allowed_images list configurable in hiera

https://gerrit.wikimedia.org/r/844434

Jelto lowered the priority of this task from High to Medium (Nov 22 2022, 3:39 PM).

A short update on the current state of this task. After the last meeting, some changes addressing egress traffic and allowed Docker images on Shared and Trusted Runners were deployed. The following things are now defined and implemented:

  • General access to public resources (egress traffic) is defined: Trusted and Shared Runners have egress firewall rules
  • Allowed Docker images for CI purposes are defined: Trusted and Shared Runners have individual allowed_images lists
  • Allowed Docker base images for building images are defined (only wmf registry base images)

What's still open:

  • Research and implement ways to enforce allowed Docker base images in buildkitd
  • Define how to deal with public package repositories on Shared and Trusted Runners
  • Define these policies for Cloud Runners. Are they similar to the Shared Runners, or do we offer different policies here?

I think the policy is now that Trusted Runners only use internal images; would it therefore be possible to start making an internal copy of registry.gitlab.com/gitlab-org/release-cli, please? It's really useful for making GitLab releases from CI (which would be handy as and when I start building .debs in CI); e.g. it is currently used for https://gitlab.wikimedia.org/repos/data_persistence/wmf-beamer-style releases.


Yes, that's right: Trusted Runners cannot run images from external registries. I opened T333161 to address this and import release-cli into our own registry.

Change 965157 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab_runner: block dockerhub on Trusted Runners

https://gerrit.wikimedia.org/r/965157

Change 965157 merged by Jelto:

[operations/puppet@production] gitlab_runner: block dockerhub on Trusted Runners

https://gerrit.wikimedia.org/r/965157

What's still open:

  • Research and implement ways to enforce allowed Docker base images in buildkitd

This is done in https://gerrit.wikimedia.org/r/965157. Unfortunately there is no clean way to restrict Docker or buildkit to a private registry (see issue). So all Docker traffic to Docker Hub is rejected by a firewall rule. If an image uses a Docker Hub base image on the Trusted Runners, buildkit fails with a connect: connection refused error.
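A single Docker Hub rule catches many cases because Docker treats any reference whose first path component does not look like a registry host (no "." or ":", not localhost, all lower case) as a Docker Hub image. A simplified sketch of that defaulting rule follows; the real reference parser also validates names and handles more edge cases.

```python
from __future__ import annotations

DOCKER_HUB = "docker.io"
LEGACY_DOCKER_HUB = "index.docker.io"


def split_reference(ref: str) -> tuple[str, str]:
    """Split an image reference into (registry host, repository path),
    applying Docker's defaulting rules in simplified form."""
    name = ref.split("@", 1)[0]  # drop any "@sha256:<digest>" suffix
    if name.rfind(":") > name.rfind("/"):  # ':' in the last component is a tag
        name = name[: name.rfind(":")]
    first, sep, rest = name.partition("/")
    looks_like_host = (
        "." in first or ":" in first or first == "localhost" or first != first.lower()
    )
    if sep and looks_like_host:
        host, path = first, rest
    else:
        host, path = DOCKER_HUB, name
    if host == LEGACY_DOCKER_HUB:
        host = DOCKER_HUB
    if host == DOCKER_HUB and "/" not in path:
        path = "library/" + path  # official images live under library/
    return host, path


def pulls_from_docker_hub(ref: str) -> bool:
    """True if pulling ref would contact Docker Hub."""
    return split_reference(ref)[0] == DOCKER_HUB
```

For example, both debian:bookworm and docker/dockerfile-copy from the BuildKit discussion above resolve to Docker Hub.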

  • Define these policies for Cloud Runners. Are they similar to the Shared Runners, or do we offer different policies here?

The Cloud Runners in DigitalOcean are deliberately configured to be open. They don't have an image restriction and accept a wider range of jobs (except the trusted build jobs for production). So this is also done.

  • Define how to deal with public package repositories on Shared and Trusted Runners

This is quite a complex topic. Public packages and code come from a wide variety of sources, like apt, pip, npm, GitHub or Debian upstream repos. As far as I know, we don't have a technical restriction that blocks certain sources in Gerrit/Jenkins CI; all public sources can be used. There is just a common understanding not to use unreviewed packages and to prefer more trusted sources like apt.
We could try to allow-list each source individually for the Trusted Runners, but I think that creates considerable overhead and makes the transition to GitLab harder for teams. So I'm leaning towards the same approach for the Trusted Runners: allow public packages (except for Docker Hub).

I'm happy to hear other opinions or feedback, either here in the task or in a separate task. If others don't see the need to restrict public packages on the Trusted Runners, I'll close the task soon.

I think we have mostly settled which Runners have which kind of access to WMF and external infrastructure. Permissions for these Runners also seem to work as expected (default access to Cloud Runners, opt-in access to Trusted Runners).

So I compiled a matrix which shows the status quo: https://wikitech.wikimedia.org/wiki/GitLab/Gitlab_Runner#Permission_matrix. This should cover the main differences. I may add one or two more lines.

I'll close the task.