
Wikibase npm postinstall script hangs CI forever if Bridge install fails
Closed, Declined, Public

Description

In I476dfa4f1c, PS3, I committed to the Data Bridge package-lock.json a GitHub reference (i. e., download this package from a GitHub branch instead of npmjs.com) with a stale commit hash (I had pushed a new commit for the pull request in question in the meantime). Very recent npm versions (e. g. 6.13.6) can apparently install this (as long as the commit hasn’t been garbage collected on GitHub, presumably), but the version used in CI can’t:

npm ERR! code 128
npm ERR! Command failed: git checkout fec07b3c5c0dd43406441afb45901247d48f2ccc
npm ERR! fatal: reference is not a tree: fec07b3c5c0dd43406441afb45901247d48f2ccc
npm ERR!

So far, that’s not too bad: it was my fault for putting a bad hash in the package-lock.json (I fixed this in PS4). However, it also caused two CI jobs, quibble-vendor-selenium-docker #13325 and wmf-quibble-selenium-php72-docker #42961, to get stuck, apparently forever. After printing the following, they kept running for over forty minutes without doing anything more, until I aborted them:

15:29:44 > core-js@3.6.4 postinstall /workspace/src/extensions/Wikibase/view/lib/wikibase-tainted-ref/node_modules/@storybook/addon-a11y/node_modules/core-js
15:29:44 > node -e "try{require('./postinstall')}catch(e){}"
15:29:44 
15:29:44 
15:29:44 > tainted-ref@0.1.0 prepare /workspace/src/extensions/Wikibase/view/lib/wikibase-tainted-ref
15:29:44 > node build/wikimedia-ui-base.js
15:29:44 
15:29:45 added 2995 packages in 33.329s

The data-bridge install error appears a bit further up, as an unhandled promise rejection:

15:29:13 Unhandled rejection Error: Command failed: /usr/bin/git checkout fec07b3c5c0dd43406441afb45901247d48f2ccc
15:29:13 fatal: reference is not a tree: fec07b3c5c0dd43406441afb45901247d48f2ccc
15:29:13 
15:29:13     at ChildProcess.exithandler (child_process.js:294:12)
15:29:13     at ChildProcess.emit (events.js:189:13)
15:29:13     at maybeClose (internal/child_process.js:970:16)
15:29:13     at Process.ChildProcess._handle.onexit (internal/child_process.js:259:5)

The command flow appears to be:

  • CI runs npm ci in Wikibase
  • npm installs dependencies
  • npm runs Wikibase’s postinstall command, which is npm-run-all -p install:*
  • npm-run-all runs install:bridge and install:tainted-ref in parallel
  • install:bridge fails with “reference is not a tree”
  • install:tainted-ref finishes
  • For some reason, npm-run-all doesn’t finish
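For contrast, the behaviour one would expect from a parallel runner here is fail-fast: as soon as one task fails, stop the others and exit non-zero, so that CI reports a failure instead of hanging. The following is a minimal Node.js sketch of that idea, not npm-run-all’s actual implementation and not a diagnosis of why it hung in this case. The `runParallel` function and its `[command, args]` task format are hypothetical.

```javascript
'use strict';
// Hypothetical fail-fast parallel runner: a sketch of the behaviour one would
// expect when running the install:* scripts in parallel. It is NOT
// npm-run-all's implementation.
const { spawn } = require('child_process');

/**
 * Runs every [command, args] task at the same time.
 * Resolves with 0 once all tasks exit with code 0. As soon as one task exits
 * non-zero, is killed by a signal, or fails to spawn, the remaining tasks are
 * killed and the promise resolves with a non-zero code instead of waiting.
 */
function runParallel(tasks) {
  return new Promise((resolve) => {
    if (tasks.length === 0) {
      resolve(0);
      return;
    }
    const children = [];
    let remaining = tasks.length;
    let settled = false;

    const finish = (code) => {
      if (settled) return;
      settled = true;
      for (const child of children) {
        // Only signal children that actually started and are still running.
        const running = child.exitCode === null && child.signalCode === null;
        if (typeof child.pid === 'number' && running) child.kill();
      }
      resolve(code);
    };

    for (const [command, args] of tasks) {
      const child = spawn(command, args, { stdio: 'inherit' });
      children.push(child);
      // A spawn failure (e.g. command not found) counts as a task failure.
      child.on('error', () => finish(1));
      child.on('exit', (code) => {
        if (code !== 0) {
          // code is null when the child was terminated by a signal.
          finish(code === null ? 1 : code);
          return;
        }
        remaining -= 1;
        if (remaining === 0) finish(0);
      });
    }
  });
}

module.exports = { runParallel };
```

Mirroring the scripts above, one might call `runParallel([['npm', ['run', 'install:bridge']], ['npm', ['run', 'install:tainted-ref']]])` and pass the resolved code to `process.exit`. The key design choice is that a single failure settles the whole run immediately, so a sibling that never exits cannot keep the parent waiting.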

@hashar reports being able to reproduce this on npm 6.5.0 with nodejs v10.19.0, after running cd Wikibase && git-review -d 599902,3 && npm install. On the other hand, I can’t reproduce it on npm 6.13.4 with the same nodejs v10.19.0 (nvm apparently gave me a newer npm): on my system, the install fails quickly instead of hanging forever.

It’s also curious that the mwgate-node10-docker build for the same change not only didn’t hang but actually succeeded, with nothing suspicious in the console output as far as I can see, even though it also reports npm 6.5.0.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Lucas_Werkmeister_WMDE triaged this task as Lowest priority.

This probably isn’t worth spending more time on; I mainly wrote this down to give others a clue in case they run into a similar issue. The fix is clear enough: make sure the commit hashes in your lockfiles are up to date, or more generally, make sure that npm install in each of the subprojects doesn’t fail. It would of course be better if the overall install didn’t hang forever because of this problem, but I don’t think it’s that important.
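One cheap way to act on the “keep lockfile hashes up to date” advice is to list every git-pinned dependency so that each commit can be checked against the branch it was taken from. The sketch below assumes the lockfileVersion 1 layout written by npm 6: nested `dependencies` objects whose `version` field holds the git URL (e.g. `github:user/repo#<sha>` or `git+https://…#<sha>`) with the full commit hash after `#`. `findGitPins` and `listGitPins` are hypothetical helpers, not part of npm.

```javascript
'use strict';
// Hypothetical helpers: list git-pinned dependencies in an npm lockfile
// (assumed lockfileVersion 1, as written by npm 6) so their commit hashes
// can be checked by hand. Not part of npm.
const fs = require('fs');

// Matches version strings such as "github:user/repo#<40 hex chars>" or
// "git+https://github.com/user/repo.git#<40 hex chars>".
const GIT_PIN = /^(?:git\+|git:|github:).*#([0-9a-f]{40})$/;

function findGitPins(deps, path = [], out = []) {
  for (const [name, info] of Object.entries(deps || {})) {
    const here = [...path, name];
    const match = typeof info.version === 'string' && info.version.match(GIT_PIN);
    if (match) {
      out.push({ path: here.join(' > '), version: info.version, commit: match[1] });
    }
    // Lockfile v1 nests conflicting versions under the parent's "dependencies".
    findGitPins(info.dependencies, here, out);
  }
  return out;
}

function listGitPins(lockfilePath) {
  const lock = JSON.parse(fs.readFileSync(lockfilePath, 'utf8'));
  return findGitPins(lock.dependencies);
}

module.exports = { findGitPins, listGitPins };
```

Calling `listGitPins('package-lock.json')` in a subproject such as the Data Bridge directory would return `{ path, version, commit }` entries. Deciding whether each commit is still the one you meant (for example, by comparing it with the current branch head shown by `git ls-remote`) is left as a manual step in this sketch.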