Jenkins jobs for npm-test fail on project with deps on node-gyp which requires python2.7
Closed, ResolvedPublic

Description

I've just noticed that selenium-daily-beta-Newsletter is failing at least since December 18 (older jobs are deleted).

Full console output is available at P7997.

Failing jobs:

It might be similar to T210506: `npm install` fails for mediawiki/core with EPEERINVALID when running on Node 11, but I'm not sure.

Some console output that might be relevant.

...
+ node --version
v6.11.0
...
+ npm --version
3.8.3
...
+ npm install --no-progress
...
gyp WARN EACCES user "jenkins-deploy" does not have permission to access the dev dir "/nonexistent/.node-gyp/6.11.0"
gyp WARN EACCES attempting to reinstall using temporary dev dir "/tmp/.node-gyp"
Traceback (most recent call last):
  File "/usr/local/lib/node_modules/npm/node_modules/node-gyp/gyp/gyp_main.py", line 13, in <module>
    import gyp
  File "/usr/local/lib/node_modules/npm/node_modules/node-gyp/gyp/pylib/gyp/__init__.py", line 8, in <module>
    import gyp.input
  File "/usr/local/lib/node_modules/npm/node_modules/node-gyp/gyp/pylib/gyp/input.py", line 5, in <module>
    from compiler.ast import Const
ImportError: No module named compiler.ast
gyp ERR! configure error 
gyp ERR! stack Error: `gyp` failed with exit code: 1
gyp ERR! stack     at ChildProcess.onCpExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:305:16)
gyp ERR! stack     at emitTwo (events.js:106:13)
gyp ERR! stack     at ChildProcess.emit (events.js:191:7)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:215:12)
gyp ERR! System Linux 4.9.0-0.bpo.8-amd64
gyp ERR! command "/usr/bin/nodejs" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild" "--release"
gyp ERR! cwd /src/node_modules/fibers
gyp ERR! node -v v6.11.0
gyp ERR! node-gyp -v v3.3.1
gyp ERR! not ok 
node-gyp exited with code: 1
...
npm ERR! Linux 4.9.0-0.bpo.8-amd64
npm ERR! argv "/usr/bin/nodejs" "/usr/local/bin/npm" "install" "--no-progress"
npm ERR! node v6.11.0
npm ERR! npm  v3.8.3
npm ERR! code ELIFECYCLE

npm ERR! fibers@3.1.1 install: `node build.js || nodejs build.js`
npm ERR! Exit status 1
...

Event Timeline

Restricted Application added a subscriber: Aklapper. Jan 16 2019, 5:02 PM

fibers seems to be a native module which thus requires compilation. That is done using node-gyp which internally relies on python. The actual error:

> fibers@3.1.1 install /src/node_modules/fibers
	> node build.js || nodejs build.js
	
	gyp WARN EACCES user "jenkins-deploy" does not have permission to access the dev dir "/nonexistent/.node-gyp/6.11.0"
	gyp WARN EACCES attempting to reinstall using temporary dev dir "/tmp/.node-gyp"
	Traceback (most recent call last):
	  File "/usr/local/lib/node_modules/npm/node_modules/node-gyp/gyp/gyp_main.py", line 13, in <module>
	    import gyp
	  File "/usr/local/lib/node_modules/npm/node_modules/node-gyp/gyp/pylib/gyp/__init__.py", line 8, in <module>
	    import gyp.input
	  File "/usr/local/lib/node_modules/npm/node_modules/node-gyp/gyp/pylib/gyp/input.py", line 5, in <module>
	    from compiler.ast import Const
	ImportError: No module named compiler.ast
	gyp ERR! configure error 
	gyp ERR! stack Error: `gyp` failed with exit code: 1

The first problem is that the process appears to run as user jenkins-deploy, which is the Unix user on the host. Inside the container it should be running as the user nobody.

The other issue is failing to find compiler.ast. That module was removed in Python 3, so if our container only has python3, that would be the source of the error.

Note that the Docker containers are passed the environment from the host. Some variables are manually blacklisted; they are related to the login command. The environment is passed to the container with:

docker run ... --env-file <(/usr/bin/env|egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL|HHVM_REPO_CENTRAL_PATH)=')

I ran the container on my machine executing /usr/bin/env as an entrypoint and grepping for my username:

$ docker run --rm -it --entrypoint=/usr/bin/env --env-file <(/usr/bin/env|egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL|HHVM_REPO_CENTRAL_PATH)=')  docker-registry.wikimedia.org/releng/npm-browser-test:0.3.1|grep hashar
USERNAME=hashar
USER=hashar
PWD=/home/hashar
OLDPWD=/home/hashar
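
To see why USER and USERNAME still reach the container, the blacklist filter can be run on its own against a few sample variables. This is only a minimal sketch: `filter_env` is a hypothetical helper name, the sample values are made up, and `grep -E` stands in for `egrep`.

```shell
# filter_env: drop the blacklisted login-related variables from an
# env-style listing on stdin, using the same pattern as the docker run command.
filter_env() {
    grep -E -v '^(HOME|SHELL|PATH|LOGNAME|MAIL|HHVM_REPO_CENTRAL_PATH)='
}

# Sample input (made-up values): two blacklisted and two non-blacklisted variables.
printf '%s\n' 'HOME=/home/alice' 'PATH=/usr/bin' 'USER=alice' 'USERNAME=alice' \
    | filter_env
```

Only the USER and USERNAME lines pass through, since neither name is in the blacklist.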

Possibly node-gyp relies on USER and is messed up? Or maybe the container does not set USER nobody or maybe the message reported is using $USER when it is actually running as nobody.

Locally, the containerized process is nobody:

$ docker run --rm -it --entrypoint=id --env-file <(/usr/bin/env|egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL|HHVM_REPO_CENTRAL_PATH)=')  docker-registry.wikimedia.org/releng/npm-browser-test:0.3.1
uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)

:(

To reproduce:

$ docker run --rm -it --entrypoint=bash docker-registry.wikimedia.org/releng/npm-browser-test:0.3.1
$ cd /src
$ git init .
Initialized empty Git repository in /src/.git/
$ git fetch --quiet --depth=1 https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Newsletter
$ git checkout FETCH_HEAD
$ npm install
<error reproduced>

Then to check the installed python* packages:

$ dpkg -l python*|egrep ^ii
ii  python-minimal    2.7.13-2        amd64        minimal subset of the Python language (default version)
ii  python2.7-minimal 2.7.13-2+deb9u2 amd64        Minimal subset of the Python language (version 2.7)

python-minimal is Debian-specific: it is Python stripped of many modules, and I guess the compiler module is NOT included. We would want to change the container to install the whole Python instead: python-minimal -> python. That would solve it.
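
A quick way to confirm that guess inside a container is to ask the interpreter to import the module. A minimal sketch, assuming the interpreter is reachable as `python2.7`; `has_module` is a hypothetical helper, not something shipped in the image:

```shell
# has_module INTERPRETER MODULE
# Succeeds only if INTERPRETER can import MODULE. A missing interpreter
# also counts as failure, since the command cannot be run at all.
has_module() {
    "$1" -c "import $2" >/dev/null 2>&1
}

if has_module python2.7 compiler.ast; then
    echo "compiler.ast is importable"
else
    echo "compiler.ast is missing (or python2.7 is not installed)"
fi
```

Running the same check before and after swapping the package should show whether the change actually provides the module node-gyp needs.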

I had spotted that exact same issue while doing T190032.

We have python-minimal in a bunch of npm containers:

$ git grep -h python.*minimal dockerfiles/*/Dockerfile.template
# python-minimal for node-gyp
RUN {{ "ruby ruby2.3 ruby2.3-dev rubygems-integration python-minimal build-essential" | apt_install }} \
# python-minimal for node-gyp
RUN {{ "nodejs-legacy python-minimal ruby ruby-dev rubygems-integration build-essential" | apt_install }}
# python-minimal for node-gyp
RUN {{ "nodejs-legacy npm ruby ruby2.1 ruby2.1-dev rubygems-integration python-minimal build-essential" | apt_install }} \
# python-minimal for node-gyp
RUN {{ "nodejs nodejs-legacy ruby ruby2.3 ruby2.3-dev rubygems-integration python-minimal build-essential" | apt_install }} \

python-minimal would need to be replaced by python2.7. Then we need to create an entry in the debian/changelog file of each of those four containers AS WELL AS FOR ALL DESCENDANT CONTAINERS (which is the bad part).
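
The package swap itself is a one-line text substitution per template. As a rough sketch, assuming a plain substring replacement is acceptable (it would also rewrite the `# python-minimal for node-gyp` comments) and assuming the Debian package name `python2.7`; `swap_pkg` is a hypothetical helper, and the changelog entries still have to be written by hand:

```shell
# swap_pkg OLD NEW: replace every occurrence of OLD with NEW on stdin.
# OLD is used as a sed pattern, so it must not contain slashes or
# regex metacharacters; NEW must not contain slashes or ampersands.
swap_pkg() {
    sed "s/$1/$2/g"
}

# Demo on one of the template lines quoted above.
echo 'RUN {{ "nodejs-legacy python-minimal ruby ruby-dev rubygems-integration build-essential" | apt_install }}' \
    | swap_pkg python-minimal python2.7
```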

For the other issue (wrong username) node-gyp source code has:

./lib/install.js:    log.warn('EACCES', 'user "%s" does not have permission to access the dev dir "%s"', osenv.user(), devDir)

So surely we should unset USER / USERNAME somehow and then we would have:

gyp WARN EACCES user "undefined" does not have permission to access the dev dir "/nonexistent/.node-gyp/6.11.0"

@hashar recommend replacing all python-minimal with python:

integration/config/dockerfiles$ git grep -l python.*minimal */Dockerfile.template
Krinkle renamed this task from selenium-daily-beta-Newsletter failing during `npm install` to Jenkins job for npm-test fail due to node-gyp requiring python2.7. Jan 22 2019, 1:07 AM
Krinkle renamed this task from Jenkins job for npm-test fail due to node-gyp requiring python2.7 to Jenkins jobs for npm-test fail on project with deps on node-gyp which requires python2.7.

I'm marking this as a blocker of T211784. Various projects for which I attempted to upgrade the Jenkins job from node6 (debian-8-jessie) to node10 (debian-9-stretch) fail because dev dependencies involving node-gyp are unable to compile.

$ npm install
..
> .. node-expat@2.3.17 install
> node-gyp rebuild

Traceback (most recent call last):
  File "/srv/npm/node_modules/node-gyp/gyp/gyp_main.py", line 13, in <module>
    import gyp
  File "/srv/npm/node_modules/node-gyp/gyp/pylib/gyp/__init__.py", line 8, in <module>
    import gyp.input
  File "/srv/npm/node_modules/node-gyp/gyp/pylib/gyp/input.py", line 5, in <module>
    from compiler.ast import Const
ImportError: No module named compiler.ast

Upstream Node.js is well aware of this being a problem, with python2.7 reaching end-of-life after December 31, 2019; but as of today it remains a dependency and must be installed.

More info at https://github.com/nodejs/node-gyp/issues/1337.

Krinkle raised the priority of this task from Medium to High. Jan 22 2019, 1:13 AM

So the fix is straightforward: replace python-minimal with python2.7. I did that previously for the exact same issue but solely for releng/java8-xgboost with T190032.

My trouble right now is identifying which images to update, as I have found no good way to list the images affected by a change. Once that is figured out, I have to craft the changelog entries and review the modifications introduced by the parent images since the last build, which is a bit tedious :-/ Not to mention that other packages eventually get upgraded :-/

There is a similar issue when the base images get upgraded. We do not have a good way to track what has changed. An idea I had a few days ago is to entirely rethink how the containers are built and published, i.e. to just blindly rebuild all of them on a daily basis and switch the CI jobs to use :latest. But that will be the subject of another task.

My trouble right now is identifying which images to update, [..]

I've done this recently for T212602, which you may be able to re-use, assuming not much has changed meanwhile.

  • 93937a5c1846: changed dockerfiles/ci-stretch, and bumped node10, node10-test, and node10-test-browser.
  • 9697356ce844: changed jjb to grep for node10-test:{N} and node10-test-browser:{N}, and replace with N+1.
  • use the CI job to see which JJB jobs were affected, and generate those locally with jenkins-job utility and push to Jenkins.
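
The N to N+1 bump in the second step can be sketched as below. This assumes image tags of the form `X.Y.Z` with a plain decimal last component (as in `0.3.1`); `bump_patch` is a hypothetical helper and the version values are examples only, not taken from the referenced commit:

```shell
# bump_patch VERSION: print VERSION with its last dot-separated component
# incremented by one. Runs in a subshell so its variables do not leak.
bump_patch() (
    prefix=${1%.*}      # everything before the last dot
    patch=${1##*.}      # the last component, assumed decimal without leading zeros
    echo "$prefix.$((patch + 1))"
)

old=0.2.2               # example value
new=$(bump_patch "$old")
echo "node10-test-browser:$old -> node10-test-browser:$new"
```

The resulting old/new pair is what would then be searched for and replaced in the jjb definitions.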

The referenced commits also updated ci-jessie and the old npm/node6 jobs, but I'm recommending we not add python2.7 there.

Mainly because most jessie/node6 jobs already have py27 (we've been using node-gyp in dev dependencies for years without issue). But also because, if a job is currently missing it, that job has presumably not worked for at least a year, so it is probably not blocking or high priority. Once we add py27 for node-gyp to the Node 10 images, these jobs will have a reason to migrate per T211784, which will reduce maintenance cost for you :)

Change 486195 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Replace python-minimal with python (full) in node10 dockerfiles

https://gerrit.wikimedia.org/r/486195

Change 486195 merged by Krinkle:
[integration/config@master] Replace python-minimal with python (full) in node10 dockerfiles

https://gerrit.wikimedia.org/r/486195

Krinkle claimed this task.
Krinkle added a project: Performance-Team.

Fixed for node10 images. The vast majority of npm3 and npm6 jobs were unaffected and already had the fuller Python version.

If still found on an npm6 job, probably wontfix, in favour of node10. Report at T211784.

Both selenium-daily-beta-Newsletter and selenium-daily-beta-ORES jobs still fail.

integration/config/dockerfiles$ git grep -l python.*minimal */Dockerfile.template
npm-stretch/Dockerfile.template
npm/Dockerfile.template
npm6/Dockerfile.template

The 'selenium-daily-beta-{project}' template uses docker-registry.wikimedia.org/releng/npm-browser-test:0.3.1. That image has not been updated; the template should be switched to node10 by using docker-registry.wikimedia.org/releng/node10-test-browser 0.2.3 or later. This can be done as a subtask of T211784 ;)

Good point.

I've created this task as selenium-daily-beta-Newsletter failing during npm install and the job is still failing, so I've reopened it. In the meantime, the topic of the task became more general, so the new topic is resolved.

I'll create a new task.

Change 494199 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Default selenium-daily-beta-{project} to node10/npm6

https://gerrit.wikimedia.org/r/494199

Change 494199 merged by jenkins-bot:
[integration/config@master] Default selenium-daily-beta-{project} to node10/npm6

https://gerrit.wikimedia.org/r/494199