Refactor Blubber's BuildKit frontend gateway to use LLB directly
Closed, Resolved (Public)

Description

The BuildKit frontend is now the primary interface for using Blubber to build images. The success of this interface, together with the disuse of the blubber CLI and blubberoid microservice, has made Blubber more than a simple Dockerfile transpiler. In conjunction with BuildKit, Blubber is now an image build tool in its own right.

A logical next step is to refactor Blubber to use the LLB Go API directly and remove the intermediate Dockerfile representation/transpilation. There are several reasons for this.

  1. Dockerfile is solely an intermediate format at this point and a large/obtuse layer of indirection with implications for developers of Blubber, users, and SRE.
    1. Now it's YAML -> Dockerfile -> LLB. It could be YAML -> LLB with no loss of function.
    2. As a user, you're presented with messages and errors referencing Dockerfile instructions that are only used internally. With LLB, we have full control over progress messages and errors, allowing us to (potentially) display blubber.yaml file/line/xpath-ish information to the user.
    3. As a developer, you often have to understand how the user-centric Dockerfile format behaves (strangely at times) to get the result you want when implementing configuration. While the LLB docs do need improvement, they are at least developer-focused API docs.
    4. As a developer, you can neither check Dockerfile transpilation at compile time nor unit test it. LLB is a Go library, so we get compile-time checks, linting, and unit testing.
    5. The dockerfile2llb library on which we currently depend to turn Dockerfile into LLB is very large, difficult to grok, and changes frequently.
    6. The dockerfile2llb library uses an external helper image, dockerfile-copy, to perform edge-case Dockerfile COPY/ADD functionality (e.g. archive extraction) which we don't use. There have been a number of issues with this helper image related to:
      1. Security (see T321316: Self-build and publish buildkit helper images)
      2. Cross-arch compatibility (see T318866: "qemu: uncaught target signal 11" building local dev container on M1 Mac with Docker Desktop)
  2. LLB is more flexible and capable than Dockerfile, and LLB features from which we would benefit are developed more quickly than their Dockerfile counterparts.
    1. Its graph structure can represent chains of dependency between operations with greater complexity and clarity than Dockerfile. (While a Dockerfile does become a graph, it's difficult to reason about how that graph is constructed.)
    2. Cache key computation can be controlled at a precise level for each operation.
    3. File operations (directory/file creation, copying, etc.) can be achieved through the LLB API without use of external programs like the dockerfile-copy helper image. (Direct file creation, as opposed to copying, is not even possible with Dockerfile.)
    4. DiffOp and MergeOp can be used in conjunction to create very minimal image layers that can in certain cases survive cache invalidation of ancestor layers.
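
To make the points about graph structure and per-operation caching concrete, here is a toy model in plain Go. It does not use BuildKit's LLB package: the `Op` type, `CacheKey` method, and `buildGraph` helper are hypothetical names invented for this sketch, and the hashing scheme is an assumption rather than how BuildKit actually computes cache keys. It only illustrates why an explicit graph lets an independent branch keep its cache key when a sibling branch changes, and how such logic can be unit tested like any other Go code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Op is a hypothetical stand-in for one build operation.
// It is NOT BuildKit's llb.State; it only models "a node with inputs".
type Op struct {
	Desc   string // what the operation does, e.g. "run: make"
	Inputs []*Op  // operations whose results this one depends on
}

// CacheKey hashes the op's own description together with the keys of its
// inputs, so a change only affects the op itself and whatever depends on it.
func (o *Op) CacheKey() string {
	h := sha256.New()
	h.Write([]byte(o.Desc))
	for _, in := range o.Inputs {
		h.Write([]byte{0}) // separator between fields
		h.Write([]byte(in.CacheKey()))
	}
	return hex.EncodeToString(h.Sum(nil))[:12]
}

// buildGraph models a build with two independent branches (dependency
// installation on a base image, and a copy of local source) that are
// combined by a final merge step.
func buildGraph(baseImage, depsCmd string) (*Op, *Op, *Op) {
	base := &Op{Desc: "image: " + baseImage}
	deps := &Op{Desc: "run: " + depsCmd, Inputs: []*Op{base}}
	src := &Op{Desc: "copy: local source"}
	final := &Op{Desc: "merge", Inputs: []*Op{deps, src}}
	return deps, src, final
}

func main() {
	// Same build twice, differing only in the dependency command.
	deps1, src1, final1 := buildGraph("debian:bookworm", "apt-get install -y nodejs")
	deps2, src2, final2 := buildGraph("debian:bookworm", "apt-get install -y nodejs npm")

	fmt.Println("deps key changed: ", deps1.CacheKey() != deps2.CacheKey())
	fmt.Println("src key changed:  ", src1.CacheKey() != src2.CacheKey())
	fmt.Println("final key changed:", final1.CacheKey() != final2.CacheKey())
}
```

In this model the source-copy branch keeps its key when the dependency command changes, while the dependency branch and the final merge do not. A linear sequence of instructions, where each step implicitly depends on everything before it, cannot express that independence as directly.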

A refactor this large will require extensive testing, so let's first merge the changes to a protected experimental branch. That way we can publish an image (tagged experimental or similar) and have people try it out.

Event Timeline

An experimental native-LLB build of Blubber's BuildKit frontend has been published as docker-registry.wikimedia.org/repos/releng/blubber/buildkit:experimental-native-llb.