https://gitlab.wikimedia.org/repos/test-platform/catalyst/ci-charts buildkit failure
Closed, Resolved (Public)

Description

From https://gitlab.wikimedia.org/repos/test-platform/catalyst/ci-charts/-/jobs/446970:

#1 resolve image config for docker-image://docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v1.0.1
#1 DONE 0.1s
#2 docker-image://docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v1.0.1@sha256:924e44fe234ba287c36d2bb51529af2cfdfd88ecf7725561c7548eef0d18b75c
#2 resolve docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v1.0.1@sha256:924e44fe234ba287c36d2bb51529af2cfdfd88ecf7725561c7548eef0d18b75c done
#2 CACHED
#3 [internal] load build definition from .pipeline/blubber.yaml
#3 transferring dockerfile: 555B done
#3 DONE 0.0s
error: failed to solve: exit code: 2

And the corresponding buildkit log on the runner:

time="2025-02-20T15:52:36Z" level=info msg="trying next host" error="failed to do request: Head \"https://registry-1.docker.io/v2/library/node/manifests/18-buster\": dial tcp 44.208.254.194:443: connect: connection refused" host=registry-1.docker.io
time="2025-02-20T15:52:36Z" level=info msg="trying next host" error="failed to do request: Head \"https://docker-registry.wikimedia.org/v2/dev/buster-php74-fpm/manifests/1.0.0-s7\": context canceled" host=docker-registry.wikimedia.org
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xaff253]

goroutine 1 [running]:
github.com/moby/buildkit/frontend/dockerui.(*ResultBuilder).Finalize(0x0)
        /go/pkg/mod/github.com/moby/buildkit@v0.14.1/frontend/dockerui/build.go:110 +0x13
gitlab.wikimedia.org/repos/releng/blubber/buildkit.Build({0xe8bbb8, 0xc0000e27d0}, {0xe8f048?, 0xc000638300})
        /srv/app/buildkit/build.go:202 +0x7ab
github.com/moby/buildkit/frontend/gateway/grpcclient.(*grpcClient).Run(0xc000638300, {0xe8bbb8?, 0xc0000e27d0}, 0xdb41a8)
        /go/pkg/mod/github.com/moby/buildkit@v0.14.1/frontend/gateway/grpcclient/client.go:218 +0x1ae
github.com/moby/buildkit/frontend/gateway/grpcclient.RunFromEnvironment({0xe8bbb8, 0xc0000e27d0}, 0x0?)
        /go/pkg/mod/github.com/moby/buildkit@v0.14.1/frontend/gateway/grpcclient/client.go:107 +0x67
main.main()
        /srv/app/cmd/blubber-buildkit/main.go:35 +0xf1
time="2025-02-20T15:52:36Z" level=error msg="/moby.buildkit.v1.frontend.LLBBridge/Solve returned error: rpc error: code = Unknown desc = exit code: 2"
time="2025-02-20T15:52:36Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = exit code: 2"

Details

Related Changes in GitLab:
Title: buildkit: Update BuildKit dependency to v0.20
Reference: repos/releng/blubber!120
Author: dduvall
Source Branch: review/buildkit-v0.20-3a8f
Dest Branch: main

Event Timeline

So it looks like buildkitd is trying to access https://registry-1.docker.io directly rather than honoring the proxy settings that are passed in from the client via '--opt', 'build-arg:http_proxy=http://webproxy:8080', '--opt', 'build-arg:https_proxy=http://webproxy:8080'. This is probably a long-standing problem that wasn't noticed before because most/all trusted runner build jobs reference images at docker-registry.wikimedia.org by policy.

@jnuche If you feel up to it, you might retry this experiment to see if buildkit/buildctl 0.20.0 has better error reporting for this situation.

@dancy tried again with this job: https://gitlab.wikimedia.org/repos/test-platform/catalyst/ci-charts/-/jobs/447546

Looks like the log entries are similar:

time="2025-02-21T16:57:34Z" level=info msg="trying next host" error="failed to do request: Head \"https://registry-1.docker.io/v2/library/node/manifests/18-buster\": dial tcp 44.208.254.194:443: connect: connection refused" host=registry-1.docker.io
time="2025-02-21T16:57:34Z" level=info msg="trying next host" error="failed to do request: Head \"https://docker-registry.wikimedia.org/v2/dev/buster-php74-fpm/manifests/1.0.0-s7\": context canceled" host=docker-registry.wikimedia.org
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xaff253]

goroutine 1 [running]:
github.com/moby/buildkit/frontend/dockerui.(*ResultBuilder).Finalize(0x0)
	/go/pkg/mod/github.com/moby/buildkit@v0.14.1/frontend/dockerui/build.go:110 +0x13
gitlab.wikimedia.org/repos/releng/blubber/buildkit.Build({0xe8bbb8, 0xc0000e20a0}, {0xe8f048?, 0xc0001c2c00})
	/srv/app/buildkit/build.go:202 +0x7ab
github.com/moby/buildkit/frontend/gateway/grpcclient.(*grpcClient).Run(0xc0001c2c00, {0xe8bbb8?, 0xc0000e20a0}, 0xdb41a8)
	/go/pkg/mod/github.com/moby/buildkit@v0.14.1/frontend/gateway/grpcclient/client.go:218 +0x1ae
github.com/moby/buildkit/frontend/gateway/grpcclient.RunFromEnvironment({0xe8bbb8, 0xc0000e20a0}, 0x0?)
	/go/pkg/mod/github.com/moby/buildkit@v0.14.1/frontend/gateway/grpcclient/client.go:107 +0x67
main.main()
	/srv/app/cmd/blubber-buildkit/main.go:35 +0xf1
time="2025-02-21T16:57:34Z" level=warning msg="failed to read oom_kill event" error="open /sys/fs/cgroup/buildkit/p0khjgs65deu4sc3csp41ynkq/memory.events: no such file or directory" spanID=b09d899de2b366c7 traceID=170c4e0ee1825e67a39587f977ab6b07
time="2025-02-21T16:57:34Z" level=error msg="/moby.buildkit.v1.frontend.LLBBridge/Solve returned error: rpc error: code = Unknown desc = exit code: 2" spanID=b09d899de2b366c7 traceID=170c4e0ee1825e67a39587f977ab6b07
time="2025-02-21T16:57:34Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = exit code: 2" spanID=56e1c133993b2602 traceID=170c4e0ee1825e67a39587f977ab6b07

This error seems to emanate from Blubber's BuildKit frontend, not buildkitd.

The buildkit dependency is woefully out of date in Blubber, so I will update that to see if that solves the problem.

Thanks Dan!

@jnuche This is ready for another round of testing. In this case you'll need to update the syntax line in .pipeline/blubber.yaml to # syntax=docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v1.1.0

Used https://gitlab.wikimedia.org/repos/test-platform/catalyst/ci-charts/-/commit/fea7ab56a5336582d581e836bb7f4c271b78c967 to trigger https://gitlab.wikimedia.org/repos/test-platform/catalyst/ci-charts/-/jobs/447546

Then I checked the buildkitd logs on the runner for that job (gitlab-runner1003.eqiad.wmnet). Strangely enough, the last log entry was from a week ago:

time="2025-02-24T15:27:22Z" level=warning msg="exec: \"buildkit-qemu-aarch64\": executable file not found in $PATH" span="[linux/arm64 add-javascript-all-wasm-executor-fresh 12/14] RUN chmod +x /usr/local/bin/install-rust /usr/local/bin/install-wasmedge-quickjs /usr/local/bin/compile-wasm-binary /usr/local/bin/delete-rust-cruft && install-rust && install-wasmedge-quickjs && compile-wasm-binary && delete-rust-cruft" spanID=a549b327d71024ad traceID=8a3d48ba87839dd331e38dce1816d4c5

JFYI I worked around the whole issue by creating the image using docker-pkg. The lack of recent logs on the runner is a bit eyebrow-raising, but from my side there's no need to continue investigating this.

dancy claimed this task.

Thanks for the notes @jnuche. Closing this ticket.