
Create a dbt Docker container
Closed, Resolved, Public

Description

As part of the broader goal of introducing dbt-core into our standard toolbox, we will need a Docker container with a working dbt-core installation for at least two usage scenarios:

  1. Implementation of an Airflow operator that runs dbt-core
  2. Support for GitLab CI/CD testing of dbt projects

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
[T406636] Blubber build | repos/data-engineering/dbt-jobs!4 | javiermonton | feature/blubber-poetry | main

Event Timeline

The merge request https://gitlab.wikimedia.org/repos/data-engineering/dbt/-/merge_requests/2 includes a Blubber file that builds a basic dbt Docker image. dbt is run through uv, which creates the virtual environment; we can change this if we need a different approach.

$ docker buildx build -f docker/blubber.yml --target lint --tag dbt-lint --load .

[+] Building 47.1s (14/14) FINISHED                                                    docker:default
 => [internal] load build definition from blubber.yml                                            0.0s
 => => transferring dockerfile: 998B                                                             0.0s
 => resolve image config for docker-image://docker-registry.wikimedia.org/repos/releng/blubber/  0.5s
 => CACHED docker-image://docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v1.5.0@sh  0.0s
 => [lint] resolving image metadata for docker-registry.wikimedia.org/bookworm:20251016          1.5s
 => [internal] load .dockerignore                                                                0.0s
 => => transferring context: 2B                                                                  0.0s
 => [lint] 🌐 docker-registry.wikimedia.org/bookworm:20251016@sha256:c2a7fc13a12c189f42a2c2aa     0.0s
 => => resolve docker-registry.wikimedia.org/bookworm:20251016@sha256:c2a7fc13a12c189f42a2c2aa8  0.0s
 => => sha256:c2a7fc13a12c189f42a2c2aa8774e9390432ec421e42aad35c9976bf208148ac 529B / 529B       0.0s
 => => sha256:9b5ca5a1ed53729dd1abf6585d1b4f11d8e81b537a69a1cf7ec2d44691aecc6b 1.63kB / 1.63kB   0.0s
 => [internal] load build context                                                                1.0s
 => => transferring context: 137.79MB                                                            1.0s
 => [lint] 🖥️ # apt-get update && apt-get install -y "krb5-config" "libkrb5-dev" "curl" "       33.6s
 => [lint] 🖥️ # (getent group "65533" || groupadd -o -g "65533" -r "somebody") && (getent        0.2s  
 => [lint] 🖥️ # (getent group "900" || groupadd -o -g "900" -r "runuser") && (getent passw       0.3s  
 => [lint] 📂 [pyproject.toml uv.lock Makefile packages.yml package-lock.yml profiles.yml .sq     0.0s
 => [lint] 🖥️ @65533 $ make "install"                                                            7.5s  
 => [lint] 📂 [.] -> .                                                                            1.4s
 => exporting to image                                                                           1.8s 
 => => exporting layers                                                                          1.8s 
 => => writing image sha256:255cf52bf127db8747b270e4b9ab96451013c8aec9eb451ab71d5910bed1b9fb     0.0s 
 => => naming to docker.io/library/dbt-lint


$ docker run dbt-lint:latest uv run dbt --version
warning: Ignoring existing virtual environment linked to non-existent Python interpreter: .venv/bin/python3 -> python
Using CPython 3.11.14
Removed virtual environment at: .venv
Creating virtual environment at: .venv
Installed 70 packages in 1.30s
INFO:trino.auth:keyring module not found. OAuth2 token will not be stored in keyring.
INFO:trino.auth:keyring module not found. OAuth2 token will not be stored in keyring.
Core:
  - installed: 1.10.13
  - latest:    1.10.13 - Up to date!

Plugins:
  - trino: 1.9.3 - Up to date!
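
For the GitLab CI scenario, a smoke test could check the versions reported by the container instead of reading the output by eye. The following is a minimal sketch, assuming the output keeps the "Core:" / "Plugins:" layout shown above; `parse_dbt_version` is a hypothetical helper, not part of dbt or of the merge request.

```python
import re


def parse_dbt_version(output: str) -> dict:
    """Extract the installed dbt-core and plugin versions from `dbt --version` text.

    Hypothetical helper: it assumes the "Core:" / "Plugins:" layout seen in
    the log above and ignores any other lines (warnings, INFO messages).
    """
    versions = {"core": None, "plugins": {}}
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        if line == "Core:":
            section = "core"
        elif line == "Plugins:":
            section = "plugins"
        elif section == "core":
            # Only the "installed" entry is recorded; "latest" is skipped.
            match = re.match(r"-\s*installed:\s*(\S+)", line)
            if match:
                versions["core"] = match.group(1)
        elif section == "plugins":
            # Plugin lines look like "- trino: 1.9.3 - Up to date!".
            match = re.match(r"-\s*([\w-]+):\s*(\S+)", line)
            if match:
                versions["plugins"][match.group(1)] = match.group(2)
    return versions
```

Fed the output above, this would report core 1.10.13 and the trino plugin at 1.9.3, which a CI job could compare against the versions pinned in the lock file.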

We have changed the approach: instead of installing uv manually, we now use the provided docker-registry.wikimedia.org/python3-build-bookworm image, which includes Python and the option to use Poetry. dbt is now installed with Poetry. The code is in the MR.

$ docker buildx build -f docker/blubber.yml --tag dbt-jobs --load .

[+] Building 1.7s (18/18) FINISHED                                                                                                               docker:default
 => [internal] load build definition from blubber.yml                                                                                                      0.0s
 => => transferring dockerfile: 701B                                                                                                                       0.0s
 => resolve image config for docker-image://docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v1.5.0                                             0.6s
 => CACHED docker-image://docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v1.5.0@sha256:60fec7dd023d6c7be4e318e0224942d424c43b1fc9d5783370e81  0.0s
 => [test] resolving image metadata for docker-registry.wikimedia.org/python3-build-bookworm:latest                                                        0.5s
 => [internal] load .dockerignore                                                                                                                          0.0s
 => => transferring context: 2B                                                                                                                            0.0s
 => [test] 🌐 docker-registry.wikimedia.org/python3-build-bookworm:latest@sha256:a9504cf90254b55ef146606afe902693fddfb1f62c32ff73e006df93bdd2629b           0.0s
 => [internal] load build context                                                                                                                          0.0s
 => => transferring context: 198.02kB                                                                                                                      0.0s
 => CACHED [test] 🖥️ # apt-get update && apt-get install -y "git" && rm -rf /var/lib/apt/lists/*                                                           0.0s
 => CACHED [test] 🖥️ # (getent group "65533" || groupadd -o -g "65533" -r "somebody") && (getent passwd "65533" || useradd -l -o -m -d "/home/somebo       0.0s
 => CACHED [test] 🖥️ # (getent group "900" || groupadd -o -g "900" -r "runuser") && (getent passwd "900" || useradd -l -o -m -d "/home/runuser" -r -       0.0s
 => CACHED [test] 📂 [pyproject.toml poetry.lock .sqlfluff] -> ./                                                                                           0.0s
 => CACHED [test] 🖥️ @65533 $ python3 "-m" "venv" "/opt/lib/venv"                                                                                          0.0s
 => CACHED [test] 🖥️ @65533 $ python3 "-m" "pip" "install" "-U" "setuptools!=60.9.0" && python3 "-m" "pip" "install" "-U" "wheel" "tox" "pip"              0.0s
 => CACHED [test] 🖥️ @65533 $ python3 "-m" "pip" "install" "-U" "poetry==2.2.1"                                                                            0.0s
 => CACHED [test] 🖥️ @65533 $ mkdir -p "/opt/lib/poetry"                                                                                                   0.0s
 => CACHED [test] 🖥️ @65533 $ poetry "install" "--no-root"                                                                                                 0.0s
 => [test] 📂 [.] -> .                                                                                                                                      0.1s
 => exporting to image                                                                                                                                     0.1s
 => => exporting layers                                                                                                                                    0.1s
 => => writing image sha256:ce24f7ca32413509985a2c33569e9d8e778111b9459bf04b74061c90ff870d12                                                               0.0s
 => => naming to docker.io/library/dbt-jobs
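
Both usage scenarios from the description (an Airflow operator and GitLab CI) come down to running a dbt command inside this image. The following is a rough sketch of how such a call could be assembled, not the operator itself. `dbt_docker_command` and `run_dbt` are hypothetical names, and the `poetry run` prefix is an assumption about how dbt is exposed in the Poetry-based image; the earlier uv-based image used `uv run` instead.

```python
import subprocess


def dbt_docker_command(image, dbt_args, runner=("poetry", "run")):
    """Build the argv for running dbt inside a container.

    Hypothetical helper. `runner` is the wrapper that activates the
    environment; ("poetry", "run") is an assumption for the Poetry image,
    ("uv", "run") matches the earlier uv-based image.
    """
    # --rm removes the container after the command exits.
    return ["docker", "run", "--rm", image, *runner, "dbt", *dbt_args]


def run_dbt(image, dbt_args, runner=("poetry", "run")):
    """Run dbt in the container and return its captured stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    command = dbt_docker_command(image, dbt_args, runner)
    completed = subprocess.run(command, check=True, capture_output=True, text=True)
    return completed.stdout
```

For example, `dbt_docker_command("dbt-lint:latest", ["--version"], runner=("uv", "run"))` yields the same invocation as the `docker run` command used earlier in this thread, plus `--rm`.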