
"qemu: uncaught target signal 11" building local dev container on M1 Mac with Docker Desktop
Closed, Resolved | Public | BUG REPORT

Description

Found by @KBach while testing https://gerrit.wikimedia.org/r/c/wikimedia/developer-portal/+/835289.

=> [ 7/10] COPY --chown=501:20 [pyproject.toml, poetry.lock, ./]
=> => # qemu: uncaught target signal 11 (Segmentation fault) - core dumped

Similar to https://github.com/docker/for-mac/issues/5123, this is likely a qemu bug that Docker incidentally triggers when emulating amd64 on arm64 hosts.
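
As a rough illustration of when emulation comes into play, the sketch below compares an image's target platform with the host CPU architecture. The function names and the alias table are hypothetical helpers written for this note, not part of Docker, qemu, or Blubber; the only assumption about real tooling is that Docker names the architectures `amd64` and `arm64`, while the host may report `x86_64`, `aarch64`, or `arm64`.

```python
import platform
from typing import Optional

# Map common host machine names onto Docker-style architecture names.
# This table is an illustrative assumption and is not exhaustive.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(machine: str) -> str:
    """Return the Docker-style architecture name for a machine string."""
    key = machine.strip().lower()
    return _ARCH_ALIASES.get(key, key)


def needs_emulation(image_platform: str, host_machine: Optional[str] = None) -> bool:
    """Return True if an image for image_platform (e.g. 'linux/amd64')
    does not match the host CPU architecture and so would be emulated."""
    if host_machine is None:
        host_machine = platform.machine()
    parts = image_platform.split("/")
    if len(parts) < 2:
        raise ValueError("expected a platform such as 'linux/amd64'")
    return normalize_arch(parts[1]) != normalize_arch(host_machine)
```

On an Apple Silicon host, `needs_emulation("linux/amd64", "arm64")` is true, which is the situation in which the qemu crash above was observed.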

Event Timeline

[17:01]  <    bd808> dduvall: I have a "fun" blubber-buildkit bug. https://gerrit.wikimedia.org/r/c/wikimedia/developer-portal/+/835289 causes a "qemu: uncaught target signal 11 (Segmentation fault) - core dumped" crash during a COPY instruction on two different M1 Macs. The static Dockerfile removed by the same patch works fine on those exact same macs.
[17:02]  <    bd808> dduvall: any idea where/how I should try to figure out WTF is causing that?
[17:07]  <    dancy> Sounds like you found a bug in qemu
[17:07]  <  dduvall> bd808: oh that does sound "fun"
[17:08]  <   hashar> we got a Phabricator tag dedicated to Apple M1 / ARM issues https://phabricator.wikimedia.org/project/view/6112/
[17:08]  <   hashar> and yeah that sounds like a qemu maybe you get a core file to be inspected?
[17:08]  <    bd808> yeah, it seems to be similar to https://github.com/docker/for-mac/issues/5123 which was basically closed as "upstream"
[17:09]  <   hashar> on x86 we had issue with nodejs/qemu as well
[17:10]  <    bd808> emulating amd64 on arm64 is going to keep being a pita
[17:10]  <    dancy> nod
[17:11]  <    bd808> "Please encourage the author of this container to supply an arm64 or multi-arch image, not just an Intel one. Now that M1 is a mainstream platform, we think that most container authors will be keen to do this." -- https://github.com/docker/for-mac/issues/5123#issuecomment-784992589
[17:11] bd808 has seen the wontfix close on this question in phab
[17:12]  <   hashar> and the parent is https://phabricator.wikimedia.org/T272500
[17:12]  <   hashar> which boils down to if we have some of our custom packages required in the images we need to build them for arm as well
[17:12]  <   hashar> so the status quo is requiring x86
[17:13]  <   hashar> anyway it might be worth filing a task for us with the qemu version used and some details 
[17:13]  <   hashar> and if there is a core file maybe a stack trace can be retrieved from it?
[17:14]  <   hashar> or attempt to use a newer qemu if that is at all possible, maybe the issue got fixed
[17:15]  <    bd808> patching Docker Desktop sounds un-fun
[17:15]  <    dancy> haha
[17:15]  <   hashar> https://phabricator.wikimedia.org/T284696 was Qemu preventing us from updating from nodejs 10 to 12 
[17:15]  <   hashar> solved by upgrading Qemu
[17:15]  <   hashar> but yeah sorry for Docker Desktop :-\
[17:26]  <    bd808> Thanks for the ideas about chasing into qemu folks. I'll see if I can make time to poke that deeply. I might also try building an arm64 native blubber-buildkit image just to see if that makes any difference.
[17:54]  <  dduvall> bd808: oh, so we'd need an arm64 blubber-buildkit image?
[17:55]  <  dduvall> that might not be so difficult since it's based on a scratch image
[17:55]  <  dduvall> so no dependency on wmf having an arm64 debian base image or anything like that
[18:11]  < wikibugs> (PS1) Dduvall: [EXPERIMENTAL] Build darwin/arm64 buildkit frontend image [blubber] - https://gerrit.wikimedia.org/r/836264
[18:11]  <  dduvall> bd808: ^
bd808 triaged this task as Medium priority. Sep 28 2022, 8:31 PM

Change 836307 had a related patch set uploaded (by Dduvall; author: Dduvall):

[blubber@master] buildkit: Support builds for specific target platforms

https://gerrit.wikimedia.org/r/836307

Change 836264 had a related patch set uploaded (by Dduvall; author: Dduvall):

[blubber@master] buildkit: Support target platform in Makefile

https://gerrit.wikimedia.org/r/836264

Change 836264 merged by jenkins-bot:

[blubber@master] buildkit: Support target platform in Makefile

https://gerrit.wikimedia.org/r/836264

Change 836307 merged by jenkins-bot:

[blubber@master] buildkit: Support builds for specific target platforms

https://gerrit.wikimedia.org/r/836307

I've released docker-registry.wikimedia.org/wikimedia/blubber-buildkit:v0.11.0 which can build images for different target platforms. However, we don't yet have a linux/arm64 image for blubber-buildkit. That will require some modifications to our pipeline code to build for specific platforms (hopefully we can do multi-platform). I'm going to hack on it today because... it's Friday.
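
For readers who want to try a platform-specific build, here is a minimal sketch that assembles a `docker buildx build` command line for a given target platform. The helper name `buildx_command`, the `.pipeline/blubber.yaml` path, and the `test` variant name are assumptions for illustration; it also assumes the Blubber file selects the blubber-buildkit frontend through its `# syntax=` line. The `--platform`, `--target`, and `--file` flags are standard `docker buildx build` options.

```python
import shlex
from typing import List


def buildx_command(
    platform: str,
    target: str,
    blubber_file: str = ".pipeline/blubber.yaml",  # hypothetical default path
    context: str = ".",
) -> List[str]:
    """Assemble (but do not run) a docker buildx invocation for one platform."""
    return [
        "docker", "buildx", "build",
        "--platform", platform,  # e.g. linux/arm64 or linux/amd64
        "--target", target,      # the variant to build
        "--file", blubber_file,
        context,
    ]


# Example: print a shell-quoted command for a native arm64 build.
print(shlex.join(buildx_command("linux/arm64", "test")))
```

Requesting `linux/arm64` only avoids emulation of the final image; as noted above, the frontend image itself was still amd64-only at this point.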

Change 842881 had a related patch set uploaded (by Dduvall; author: Dduvall):

[integration/pipelinelib@master] Support `build.platform` to target specific platforms

https://gerrit.wikimedia.org/r/842881

Reading T321316: Self-build and publish buildkit helper images clued me in that the copy action where the crash happens is very likely performed via the docker/dockerfile-copy image, which buildkit uses as a helper.

Just for clarity: It's used as a helper by buildkit's dockerfile frontend, specifically the dockerfile2llb package, to implement Dockerfile ADD and COPY operations.

Once we fully deprecate transcoding to Dockerfile syntax in Blubber (blubberoid and CLI transcoding) and refactor the build instructions to use native LLB operations, Blubber will no longer depend on dockerfile2llb or the docker/dockerfile-copy helper image it uses.

dduvall merged https://gitlab.wikimedia.org/repos/releng/blubber/-/merge_requests/36

ci: Build for both linux/amd64 and linux/arm64

That change was released with Blubber 0.18.0 but was then reverted in the branch, citing: "omit linux/arm64 until T322453 is sorted".

Change 988121 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[labs/striker@master] dev(docker): Update blubber buildkit to support Apple Silicon

https://gerrit.wikimedia.org/r/988121

bd808 changed the task status from Open to In Progress. Jan 6 2024, 1:35 AM
bd808 claimed this task.

In my current testing with Striker on an M3 Mac, the newer docker-registry.wikimedia.org/repos/releng/blubber/buildkit:v0.21.1 builder is working without problems. Now to update things to use it!

Change 988121 merged by jenkins-bot:

[labs/striker@master] dev(docker): Update blubber buildkit to support Apple Silicon

https://gerrit.wikimedia.org/r/988121

bd808 edited projects, added Upstream; removed User-bd808, Patch-For-Review.