
ECONNREFUSED error when running Selenium tests on M1 Mac
Open, Needs Triage · Public · BUG REPORT

Description

What happens?:

I used what I believe to be the latest fresh:

# fresh: 22.05.1
# image: docker-registry.wikimedia.org/releng/node14-test-browser:0.0.2-s4
# software: Debian GNU/Linux 11 (bullseye)
#           Node.js v14.17.5 (npm 7.21.0)
#           Chromium 97.0.4692.99
#           Mozilla Firefox 91.5.0esr
#           JSDuck 5.3.4 (Ruby 2.7.4) ruby 2.7.4p191
# mount: /mediawiki      ➟ /Users/montehurd/mediawiki-test/mediawiki      (read-write)
#        /mediawiki/.git ➟ /Users/montehurd/mediawiki-test/mediawiki/.git (read-only)

I followed mediawiki setup instructions here:

https://gerrit.wikimedia.org/g/mediawiki/core/+/HEAD/DEVELOPERS.md

Running npm run selenium-test, I see the following:

nobody@docker-desktop:/mediawiki$ npm run selenium-test

> selenium-test
> wdio ./tests/selenium/wdio.conf.js


Execution of 5 workers started at 2022-05-20T22:15:19.540Z

[0-2] RUNNING in chrome - /tests/selenium/specs/user.js
[0-1] RUNNING in chrome - /tests/selenium/specs/recentchanges.js
[0-0] RUNNING in chrome - /tests/selenium/specs/page.js
[0-3] RUNNING in chrome - /tests/selenium/specs/watchlist.js
[0-0] 2022-05-20T22:15:59.605Z ERROR @wdio/runner: Error: connect ECONNREFUSED 127.0.0.1:59207
[0-0]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)
[0-3] 2022-05-20T22:15:59.625Z ERROR @wdio/runner: Error: connect ECONNREFUSED 127.0.0.1:56115
[0-3]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)
[0-2] 2022-05-20T22:15:59.624Z ERROR @wdio/runner: Error: connect ECONNREFUSED 127.0.0.1:56655
[0-2]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)
[0-1] 2022-05-20T22:15:59.639Z ERROR @wdio/runner: Error: connect ECONNREFUSED 127.0.0.1:60141
[0-1]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)
[0-0] RETRYING in chrome - /tests/selenium/specs/page.js
[0-1] RETRYING in chrome - /tests/selenium/specs/recentchanges.js
[0-2] RETRYING in chrome - /tests/selenium/specs/user.js
[0-3] RETRYING in chrome - /tests/selenium/specs/watchlist.js
[0-0] RUNNING in chrome - /tests/selenium/specs/page.js
[0-2] RUNNING in chrome - /tests/selenium/specs/user.js
[0-3] RUNNING in chrome - /tests/selenium/specs/watchlist.js
[0-1] RUNNING in chrome - /tests/selenium/specs/recentchanges.js
[0-0] 2022-05-20T22:16:38.766Z ERROR @wdio/runner: Error: connect ECONNREFUSED 127.0.0.1:58619
[0-0]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)
[0-2] 2022-05-20T22:16:38.914Z ERROR @wdio/runner: Error: connect ECONNREFUSED 127.0.0.1:56295
[0-2]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)
[0-3] 2022-05-20T22:16:38.918Z ERROR @wdio/runner: Error: connect ECONNREFUSED 127.0.0.1:55895
[0-3]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)
[0-0] FAILED in chrome - /tests/selenium/specs/page.js (1 retries)
[0-1] 2022-05-20T22:16:39.065Z ERROR @wdio/runner: Error: connect ECONNREFUSED 127.0.0.1:58169
[0-1]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)
[0-2] FAILED in chrome - /tests/selenium/specs/user.js (1 retries)
[0-3] FAILED in chrome - /tests/selenium/specs/watchlist.js (1 retries)
[0-1] FAILED in chrome - /tests/selenium/specs/recentchanges.js (1 retries)
[0-4] RUNNING in chrome - /tests/selenium/wdio-mediawiki/specs/BlankPage.js
[0-4] 2022-05-20T22:17:10.637Z ERROR @wdio/runner: Error: connect ECONNREFUSED 127.0.0.1:58063
[0-4]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)
[0-4] RETRYING in chrome - /tests/selenium/wdio-mediawiki/specs/BlankPage.js
[0-4] RUNNING in chrome - /tests/selenium/wdio-mediawiki/specs/BlankPage.js
[0-4] 2022-05-20T22:17:42.220Z ERROR @wdio/runner: Error: connect ECONNREFUSED 127.0.0.1:57439
[0-4]     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1148:16)
[0-4] FAILED in chrome - /tests/selenium/wdio-mediawiki/specs/BlankPage.js (1 retries)

Spec Files:	 0 passed, 5 retries, 5 failed, 5 total (100% completed) in 00:02:22

What should have happened instead?:

@zeljkofilipin Did the same steps on his Intel Mac and everything works ok.

I suspect there's some chromium configuration which isn't playing nice with arm64 hosts. If I understand correctly, since it doesn't look like we're building a fresh image for arm64, the amd64 image is used. My inclination was to see about building a fresh image for arm64, but I haven't tracked down specifics for how to go about this with fresh.

I know headless chrome can play nice on M1 from using browserless-chrome, which has an arm64 build:

https://hub.docker.com/r/browserless/chrome/tags

I'm using it elsewhere (though I do have an unrelated bug using it). Here's some info on what they use spinning up headless chrome:

https://docs.browserless.io/blog/2018/06/04/puppeteer-best-practices.html#7-use-docker-to-contain-it-all

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:

  • MacOS Monterey 12.3.1
  • Apple M1
  • Docker Desktop 4.8.2, Engine 20.10.14, Compose 2.5.1

Any help is appreciated. I may be missing something glaringly obvious...


Upstream issue https://github.com/docker/for-mac/issues/5766 got declined

Closing as documented. https://docs.docker.com/docker-for-mac/apple-silicon/#known-issues

We don't have control over qemu and it's "best effort" only. We have documented both the occasional crashes and the lack of inotify support.

https://docs.docker.com/desktop/troubleshoot/known-issues/ has a list of issues under "For Mac with Apple Silicon", including:

Some images do not support the ARM64 architecture. You can add --platform linux/amd64 to run (or build) an Intel image using emulation.
However, attempts to run Intel-based containers on Apple silicon machines under emulation can crash as qemu sometimes fails to run the container. In addition, filesystem change notification APIs (inotify) do not work under qemu emulation. Even when the containers do run correctly under emulation, they will be slower and use more memory than the native equivalent.
In summary, running Intel-based containers on Arm-based machines should be regarded as "best effort" only. We recommend running arm64 containers on Apple silicon machines whenever possible, and encouraging container authors to produce arm64, or multi-arch, versions of their containers.

There is also https://gitlab.com/qemu-project/qemu/-/issues/324 ("chrome based apps can not be run under qemu user mode"), which refers to a patch. In theory, that patch could be applied to the QEMU Debian package to see whether it improves things; someone would then need to commit time to reviewing and testing it in order to get it merged upstream. That needs some commitment and knowledge about system calls / OS development.

Event Timeline


@Mhurd Are you using the proprietary "Docker Desktop for Mac" app to host your Linux VM in which the containers run, or are you using a different docker-machine server? If the former, can you confirm that you've installed anew (not carried over from an OS upgrade) the Apple Silicon version from https://docs.docker.com/desktop/mac/install/, including the rosetta command which installs the transparent layer for running Intel applications?

From their manual (link): Another thing that might work is to run export DOCKER_DEFAULT_PLATFORM=linux/amd64 on your outer shell before the fresh-node command. If that works, we can add that to Fresh. For reasons I don't understand, upstream Docker is refusing to let macOS engage Rosetta unless you specifically set this.
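A minimal sketch of trying that workaround from the host shell. The platform_for_arch helper is hypothetical (it is not part of Fresh or Docker); it only maps a uname -m value to a Docker platform string, so you can see what the host would run natively before forcing amd64. DOCKER_DEFAULT_PLATFORM is the variable named above.

```shell
# Hypothetical helper (not part of Fresh): map a `uname -m` value
# to the corresponding Docker platform string.
platform_for_arch() {
    case "$1" in
        x86_64|amd64)  echo "linux/amd64" ;;
        arm64|aarch64) echo "linux/arm64" ;;
        *)             echo "unknown" ;;
    esac
}

# Show what this host would run natively.
echo "native platform: $(platform_for_arch "$(uname -m)")"

# Force amd64 images for docker commands started from this shell.
export DOCKER_DEFAULT_PLATFORM=linux/amd64

# Then start Fresh as usual from the same shell, e.g.:
#   fresh-node
```

Running uname -m inside the container afterwards should show which architecture the image was actually started as.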

Hi, I've been using M1 arm64 for a while and the workaround --platform linux/amd64 almost never works (at least for me). I think we need to have arm64 builds. I've rebuilt my projects that used Chrome and Firefox to build both amd/arm containers.

In a nutshell, Docker Desktop for Mac uses Qemu to run the Linux VM, which, either inherently or due to the specific way that Docker uses it, makes inotify not work when emulated. Chromium requires this during startup and thus refuses to start. This isn't configurable as far as I know, and affects all uses of Chromium inside Docker on Apple M1, so long as the container is run under emulation. There are dozens of upstream bug reports and other projects affected by this as well. In addition to that, Chromium also makes use of zygote which causes Qemu to crash.

See also:

In general, Chromium works on Apple M1 when run directly by macOS (transparently via Rosetta), e.g. when using Google Chrome as desktop mac app. It fails only when run with the indirection of Qemu/Linux/Docker.

I know headless chrome can play nice on M1 from using browserless-chrome, which has an arm64 build:

Chromium could be compiled directly for ARM-type chips (such as Apple M1) and run fine within Qemu/Docker then. However, as far as I know there are not yet any such official distributions by Google. It is my understanding that the way browserless/chrome works on M1 is by actually using the Microsoft Edge distribution of Chromium (from playwright instead of puppeteer). Unlike Google, Microsoft does support ARM-type processors. Note that Apple M1 is not the first chipset to use ARM-type processors; there are plenty of PC laptops with ARM-type chips instead of x86-type (amd64) chips like Intel's.

My inclination was to see about building a fresh image for arm64 […]

Fresh is mostly just an idea that, concretely, is merely a 10-line shell alias to run docker run --rm --interactive --tty --entrypoint /bin/bash docker-registry.wikimedia.org/releng/node12-test-browser. Its main purpose is to resemble and reflect CI and does so by literally using the same container, and CI in turn intends to generally share the base images and packages with production.

To create a native arm64 Linux base image, we'd need a lot more than Chromium.

See also:

@Krinkle Thanks for the great feedback!

Are you using the proprietary "Docker Desktop for Mac"

Yes

can you confirm that you've installed anew (not carried over from OS upgrade) the Apple Silicon version from https://docs.docker.com/desktop/mac/install/, including the rosetta command

Yes, but just to be extra sure I re-installed Docker Desktop and re-ran the Rosetta command

To create a native arm64 Linux base image, we'd need a lot more than Chromium.

I see, and that makes sense.


It may be helpful to explain what I'm trying to do...

I have a repo here with a makefile which lets you spin up mediawiki from scratch with basically a single command. Keep in mind I'm still relatively novice with Docker, but so far this has been a really fun learning experience.

My next goal was to add make commands for running tests, which was easy for parser and unit tests ( make runparsertests, make runphpunittests... forgive the all lower case, will tweak this in the future... ) They worked as expected.

So next I wanted to tackle selenium tests. Because I couldn't seem to get the tests to work with fresh, I tried using a browserless chrome container, as seen in my WIP selenium branch. Even though the tests begin, and can actually be seen running in a chrome window served up by the browserless chrome container, they get stuck for some reason after a little bit.

My suspicion is the problem is related to how I'm configuring browserless, and it's maybe spawning too many sessions, so when a test causes a new page to be loaded, browserless isn't just re-using the session in the same way it would if you were running the tests outside of a dockerized environment.

While debugging this we thought it would be instructive to circle back and see if running the tests hangs in the *same way* on my machine when using fresh, or if they actually work but I had just been using fresh incorrectly to run the selenium tests, but they seem to be hanging more immediately when run via fresh...

That was a lot of text :)

I'm still digesting your second comment...

One thing I did notice in fresh which I wasn't sure how to interpret was...

nobody@docker-desktop:/mediawiki$ chromium --version
Error: Can't open display: 
nobody@docker-desktop:/mediawiki$ /usr/lib/chromium/chromium --version
Chromium 97.0.4692.99 
nobody@docker-desktop:/mediawiki$

Edit: perhaps it's related to this:

And to avoid running into zombie processes (which commonly happen with Chrome), you'll want to use something like dumb-init to properly start-up:

ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init

I'm pretty out of my depth here haha.

I still think the problem is a mismatch between arm/amd containers. I remember I had the same problem when I tried to run Chrome in AMD containers on ARM. I think that the nodeX-test-browser needs to be built for ARM and then Fresh too.

I was able to hack Fresh so it worked for me. Changing the base image to a version that was built for linux/arm64 and then removing the --platform linux/amd64 in Fresh makes Selenium/Chromium work.

What's strange though is that I have other images where setting the platform to linux/amd64 makes Chrome work, but that has been on Ubuntu-based images.

With the help from @hashar I finally understand what's going on. Somehow chromium --headless does not work anymore without a display in our container (I also tried the new --headless=new flag). When we try to run the tests without an exported display, our webdriver.io config adds --headless, but that gives us that error. If I manually start xvfb and export the display, it works. What's strange is that this only happens on a Mac M1 when you force the image to run as amd.
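A rough sketch of the "start xvfb and export the display" step, assuming the Xvfb binary is available in the container. The ensure_display helper, the :99 display number and the screen geometry are all made up for illustration; they are not taken from Fresh or from our webdriver.io config.

```shell
# Hypothetical helper: reuse an already exported display if there is one,
# otherwise start Xvfb on the given display number (default :99).
ensure_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "$DISPLAY"
        return 0
    fi
    display="${1:-:99}"
    # Start a virtual framebuffer in the background.
    # Output is discarded so command substitution does not block on it.
    Xvfb "$display" -screen 0 1280x1024x24 >/dev/null 2>&1 &
    echo "$display"
}

# Usage inside the container (not run here):
#   DISPLAY="$(ensure_display)"; export DISPLAY
#   npm run selenium-test
```

With DISPLAY exported, the tests no longer depend on headless mode working without a display.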

I did try a couple of other things: I rolled back the docker container to the version that uses Chrome 103, and then headless works. I also tried to build a minimal test case with this Dockerfile:

FROM debian:11
ARG TARGETPLATFORM=linux/amd64
RUN apt-get update && apt-get install chromium -y

And then build it: docker buildx build --load --platform linux/amd64 -t debian/debianwithchromium .

And then, running it on my ARM machine with: docker run -it --platform linux/amd64 --entrypoint bash

I get:

root@801e3447542f:/# chromium --version
The hardware on this system lacks support for the sse3 instruction set.
The upstream chromium project no longer supports this configuration.
For more information, please read and possibly provide input to their
bug tracking system at http://crbug.com/112335

Also, I had not upgraded to the latest Fresh on my machine when I tried out xvfb. With the latest Fresh and xvfb started, my output is:

nobody@17aef58260e9:/$ chromium --version
Warning: Missing charsets in String to FontSet conversion

And then chromium just hangs.

root@801e3447542f:/# chromium --version
The hardware on this system lacks support for the sse3 instruction set.
The upstream chromium project no longer supports this configuration.
For more information, please read and possibly provide input to their
bug tracking system at http://crbug.com/112335

The proper bug link is http://crbug.com/1123353, which is about Chromium requiring the SSE3 instruction set since Chromium 89. To catch that before starting Chromium, the Debian shell script that launches Chromium was adjusted ( https://salsa.debian.org/chromium-team/chromium/-/commit/836b9da55c776a27d884d0405f385dcb7ef6f12e ):

case `uname -m` in
    i386|i586|i686|x86_64)
        # Check whether this system supports SSE3 (aka PNI)
        if ! grep -q 'sse3\|pni' /proc/cpuinfo; then
            xmessage "$nosse3"
            exit 1
        fi
        ;;
esac

uname -m ends up being the hardware machine type as compiled in by the Linux kernel, which gives x86_64 and is, I guess, "normal". So either Apple Rosetta or Qemu is not exposing the SSE3 instruction set, since it is not found in /proc/cpuinfo. Yet that code path works properly in your second output (using latest Fresh), which puzzles me :)
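To compare the two containers, something like the following could be run in each. It is only a sketch: has_sse3 is a hypothetical helper, and unlike the Debian wrapper it matches whole words only, so a flags line that contains ssse3 but neither sse3 nor pni does not count.

```shell
# Hypothetical helper: succeed if the given cpuinfo file advertises SSE3,
# listed either as "sse3" or as "pni". Whole-word match (-w), so "ssse3"
# alone is not enough. The path argument defaults to /proc/cpuinfo and
# exists mainly so the check can be tried on a sample file.
has_sse3() {
    grep -Eqw 'sse3|pni' "${1:-/proc/cpuinfo}"
}

# Report the machine type the wrapper script would see.
echo "machine: $(uname -m)"

if has_sse3 2>/dev/null; then
    echo "SSE3 advertised"
else
    echo "SSE3 not advertised (or /proc/cpuinfo unavailable)"
fi
```

If the first container prints "not advertised" and the second prints "advertised", that would explain why only one of them reaches the xmessage branch.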