Investigate buildkitd instances as image builders for GitLab
Closed, Resolved | Public

Description

Let's set up a small k8s deployment of buildkitd instances to evaluate their use with GitLab for building images.

buildkitd pros/cons:

Pros:

  • Already packaged into a container and exercised in k8s
  • Using a separate daemon container saves build clients from having to know registry credentials.
  • Supported by Docker
  • Has some GC options, not fully studied yet (check the buildkitd --help output).
  • Supports exporting/importing cache layers to/from a registry. FIXME: Figure out what this "inline" business is.
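
On the "inline" question: per the BuildKit README, the registry cache exporter pushes the cache as its own artifact, while the inline exporter embeds cache metadata in the image being pushed. Both are read back with --import-cache type=registry. A minimal sketch of the two flag combinations follows. The cache_flags helper and any ref passed to it are hypothetical and were not part of this evaluation.

```shell
#!/bin/sh
# Print the buildctl cache flags for a given cache mode.
# cache_flags is a hypothetical helper; the ref argument is whatever
# registry reference you choose to hold the cache (or the image itself).
cache_flags() {
  mode="$1"
  ref="$2"
  case "$mode" in
    registry)
      # Cache is pushed separately from the image, under its own ref.
      printf '%s\n' "--export-cache type=registry,ref=${ref} --import-cache type=registry,ref=${ref}"
      ;;
    inline)
      # Cache metadata travels inside the pushed image, so the build also
      # needs an image output with push enabled; the import side still
      # uses type=registry and points at the image ref.
      printf '%s\n' "--export-cache type=inline --import-cache type=registry,ref=${ref}"
      ;;
    *)
      echo "unknown cache mode: ${mode}" >&2
      return 1
      ;;
  esac
}
```

The printed flags would be appended to a buildctl build invocation; nothing here contacts a registry.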

Cons:

  • The daemon API needs to be studied to see what operations it allows (beyond what buildctl exposes)
  • For mTLS, clients (i.e. GitLab runners) will require a secret.
  • Build commands may not be isolated from the buildkitd container, because unprivileged rootless mode requires --oci-worker-no-process-sandbox:

    "Note that --oci-worker-no-process-sandbox allows build executor containers to kill (and potentially ptrace depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container." xref: https://github.com/moby/buildkit/blob/master/docs/rootless.md#about---oci-worker-no-process-sandbox

    I verified that I can add "RUN pkill buildkitd" to disrupt the buildkitd container.

    I was not able to strace buildkitd though: strace: attach: ptrace(PTRACE_SEIZE, 27): Operation not permitted

    Tested with kernel: 5.4.0-107-generic

    Test stuff is in https://gitlab.wikimedia.org/dancy/builtkitd-abuse

Event Timeline

dduvall changed the task status from Open to In Progress. May 6 2022, 5:20 PM
dduvall triaged this task as Medium priority.
dduvall created this task.

I was able to get a buildkitd cluster working in conjunction with a k8s gitlab runner on Digital Ocean today. Here's the TL;DR. I will update the task in more detail on Monday.

  1. Created a new DO project called "buildkitd-eval" and a new k8s cluster by the same name in that project.
  2. Used the rootless example k8s configuration for buildkit (with some tweaks) to spin up a two-pod deployment/replicaset and service in the k8s cluster.
  3. Provisioned a gitlab-runner in the same k8s cluster (separate namespace, however) to connect to our GitLab instance. (This is necessary to grant access to the buildkitd instances without making them publicly accessible. For a long-term solution, we'd probably want to have the buildkitd service endpoint accessible to all general runners.)
  4. Registered the runner in the releng group and tagged it "buildkitd" to restrict its use to specific pipelines.
  5. Moved the blubber project from my namespace into releng and added a .gitlab-ci.yaml file with the following contents:
stages:
  - build

build-image:
  stage: build
  image:
    name: moby/buildkit
    entrypoint: [ /bin/sh, -c ]
  tags: [ buildkitd ]
  script:
    - |-
      buildctl --addr tcp://buildkitd.default.svc.cluster.local:1234 \
        --tlscacert "$BUILDKITD_EVAL_CLIENT_CA" \
        --tlscert "$BUILDKITD_EVAL_CLIENT_CERT" \
        --tlskey "$BUILDKITD_EVAL_CLIENT_KEY" \
        build --frontend gateway.v0 \
        --opt source=docker-registry.wikimedia.org/wikimedia/blubber-buildkit:0.9.0 \
        --local context=. \
        --local dockerfile=. \
        --opt filename=.pipeline/blubber.yaml \
        --opt variant=test

As you can see, there are a few BUILDKITD_* variables referenced by the script. These are bound at the releng group level as File type variables and contain the client mTLS CA/cert/key generated while setting up the buildkitd instances.
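
Because GitLab File type variables expand to the path of a temporary file rather than to the file's contents, a job can cheaply verify them before calling buildctl. A minimal sketch, assuming the three variable names used in the script above. The check_tls_files helper is hypothetical and not part of the actual pipeline.

```shell
#!/bin/sh
# Verify that each mTLS variable is set and names a readable file.
# check_tls_files is a hypothetical helper for illustration only.
check_tls_files() {
  for name in BUILDKITD_EVAL_CLIENT_CA BUILDKITD_EVAL_CLIENT_CERT BUILDKITD_EVAL_CLIENT_KEY; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "path=\${$name:-}"
    if [ -z "$path" ]; then
      echo "missing variable: $name" >&2
      return 1
    fi
    if [ ! -r "$path" ]; then
      echo "unreadable file for $name: $path" >&2
      return 1
    fi
  done
}
```

Running check_tls_files as the first script step would make a misconfigured variable fail fast with a clear message instead of a TLS error from buildctl.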

There's more to go over here and much more to test (caching, registry access, etc.) so I'll continue with this next week. I just wanted to get a jump start on it since it's Friday and there are no interruptions... *cough* meetings.

I've set up a personal repo with the manifests, etc. used to set up my evaluation environment. They were applied against a fresh Digital Ocean k8s cluster.

https://gitlab.wikimedia.org/dduvall/gitlab-buildkitd-eval
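
For readers without access to that repo, the upstream rootless example can be applied in roughly the following order. This is a sketch under assumptions: the file names (create-certs.sh, .certs/buildkit-daemon-certs.yaml, deployment+service.rootless.yaml) are taken from the examples/kubernetes directory of moby/buildkit and should be checked against the checkout in use, the run and deploy_buildkitd helpers are hypothetical, and the tweaks mentioned above are not reproduced.

```shell
#!/bin/sh
# Print each step (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the upstream example steps.
# $1: name the clients will dial; it must appear in the daemon certificate.
# $2: number of buildkitd replicas.
deploy_buildkitd() {
  san="$1"
  replicas="$2"
  run ./create-certs.sh "$san"                            # generate CA, daemon and client certs
  run kubectl apply -f .certs/buildkit-daemon-certs.yaml  # daemon cert secret
  run kubectl apply -f deployment+service.rootless.yaml   # rootless deployment and service
  run kubectl scale --replicas="$replicas" deployment/buildkitd
}
```

Calling deploy_buildkitd buildkitd.default.svc.cluster.local 2 from the examples/kubernetes directory prints the four commands; setting DRY_RUN=0 executes them instead.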

Change 835162 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab_runner: enable unprivileged_userns_clone in WMCS

https://gerrit.wikimedia.org/r/835162

Change 835162 merged by Dzahn:

[operations/puppet@production] gitlab_runner: enable unprivileged_userns_clone in WMCS

https://gerrit.wikimedia.org/r/835162
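
The patch title refers to the kernel.unprivileged_userns_clone sysctl found on Debian-patched kernels. Rootless buildkitd depends on unprivileged user namespaces, so a runner host can be checked for it up front. A minimal sketch: the userns_clone_enabled helper is hypothetical, and treating a missing knob as "not restricted" is an assumption (other limits, such as user.max_user_namespaces, may still apply).

```shell
#!/bin/sh
# Succeed if unprivileged user namespace creation is not blocked by the
# Debian-specific knob. userns_clone_enabled is a hypothetical helper.
userns_clone_enabled() {
  # The path is a parameter so the check can be pointed at a test file.
  f="${1:-/proc/sys/kernel/unprivileged_userns_clone}"
  # Kernels without the Debian patch do not expose this file at all;
  # assume here that its absence means this knob imposes no restriction.
  [ -e "$f" ] || return 0
  [ "$(cat "$f")" = "1" ]
}
```

For example, userns_clone_enabled || echo "rootless buildkitd will likely fail here" >&2 gives an early warning on a misconfigured host.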