
Quibble jobs re-download npm packages every build (Castor not loading?)
Closed, ResolvedPublic

Description

It seems there might be an issue with Castor for the (new) quibble-selenium jobs (ref. T232759).

Defined: CASTOR_NAMESPACE="mediawiki-core/master/wmf-quibble-selenium-php72-docker"
00.584 Syncing...
rsync: failed to set times on "/cache/.": Operation not permitted (1)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1668) [generator=3.1.2]

From what I know, this error is not actually a problem (I've seen it hundreds of times). We could hide it if we bothered, but it just means the cache was empty. That in turn means nothing is saving data to the cache at the end of jobs, so it is always empty when this job starts.

The result is that it re-downloads wdio, grunt, chromedriver, etc. on every build.
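rsync reports exit code 23 ("partial transfer due to error") for the failed-to-set-times message whether or not any files were transferred, so the exit status alone does not show whether the cache was empty. A minimal sketch of checking the synced directory directly; `count_cached_files` is a hypothetical helper, and a temporary directory stands in for `/cache`:

```python
import os
import tempfile


def count_cached_files(cache_dir):
    """Return the number of regular files found anywhere under cache_dir."""
    total = 0
    for _root, _dirs, files in os.walk(cache_dir):
        total += len(files)
    return total


# Demonstration on a throwaway directory standing in for /cache (assumption).
with tempfile.TemporaryDirectory() as demo_cache:
    # Freshly created directory: nothing was restored.
    empty_count = count_cached_files(demo_cache)

    # Simulate a restored npm cache entry.
    os.makedirs(os.path.join(demo_cache, "npm"))
    with open(os.path.join(demo_cache, "npm", "index"), "w") as fh:
        fh.write("placeholder")
    filled_count = count_cached_files(demo_cache)

print(empty_count, filled_count)
```

A count of zero right after the "Syncing..." step would confirm an empty cache; a non-zero count would point at the install step not using what was restored.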

Event Timeline

Krinkle added a subscriber: hashar.

Adding to T225730 because this job and the phpunit wmf-quibble job are the two slowest jobs in the mediawiki test pipeline (about 12 min each). Any improvement to them, such as caching npm dependencies, would directly improve turnaround time for that pipeline.

Looks like the central store has some content:

integration-castor03:/srv/jenkins-workspace/caches$ du -m -d1 mediawiki-core/master/wmf-quibble-selenium-php72-docker
62	mediawiki-core/master/wmf-quibble-selenium-php72-docker/composer
175	mediawiki-core/master/wmf-quibble-selenium-php72-docker/npm
236	mediawiki-core/master/wmf-quibble-selenium-php72-docker

And from a recent build, it does take some time to transfer files, so at least there is definitely something available to the jobs:

00:00:00.599 Defined: CASTOR_NAMESPACE="castor-mw-ext-and-skins/master/wmf-quibble-selenium-php72-docker"
00:00:00.599 Syncing...
00:00:00.931 rsync: failed to set times on "/cache/.": Operation not permitted (1)
00:00:30.955 rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1668) [generator=3.1.2]
00:00:30.956 
00:00:30.956 Done

The job uses vendor.git and installs the PHP dev dependencies, which are loaded from cache:

00:01:19.145 INFO:quibble.commands:vendor.git used. Requiring composer dev dependencies
...
00:01:23.862   - Installing giorgiosironi/eris (0.10.0): Loading from cache
00:01:23.862  Extracting archive  - Installing psr/cache (1.0.1): Loading from cache
00:01:23.914  Extracting archive  - Installing cache/tag-interop (1.0.0): Loading from cache
00:01:23.958  Extracting archive  - Installing cache/integration-tests (0.16.0): Loading from cache
00:01:24.003  Extracting archive  - Installing seld/jsonlint (1.7.1): Loading from cache
...

So the cache is most probably there.

For npm, we serially run npm install in each of the extensions having the selenium-test run script, and some are slow:

MediaWiki core

00:01:45.340 added 1238 packages from 1489 contributors and audited 7530 packages in 17.114s

Wikibase

00:01:53.857 INFO:quibble.commands:Running webdriver test in /workspace/src/extensions/Wikibase
00:02:34.240 added 2527 packages in 29.212s

00:02:34.732 > Wikibase@0.1.0 install:tainted-ref /workspace/src/extensions/Wikibase
00:02:34.732 > npm --prefix view/lib/wikibase-tainted-ref ci

00:03:03.238 added 2483 packages in 27.656s
00:03:03.788 added 710 packages from 818 contributors and audited 2755 packages in 68.945s

Echo

00:08:13.205 added 1895 packages from 2065 contributors and audited 30549 packages in 43.659s
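To compare these runs across builds, the npm summary lines can be pulled out of the console log mechanically. A rough sketch, assuming the summary line keeps the "added N packages ... in X.XXXs" shape seen above; `parse_summary` is a hypothetical helper, and the durations are deliberately not summed because nested installs may overlap with their parent run:

```python
import re

# Matches npm's closing summary, with or without the "audited" part.
SUMMARY_RE = re.compile(r"added (\d+) packages.* in ([\d.]+)s$")

LOG_LINES = [
    "00:01:45.340 added 1238 packages from 1489 contributors and audited 7530 packages in 17.114s",
    "00:02:34.240 added 2527 packages in 29.212s",
    "00:03:03.238 added 2483 packages in 27.656s",
    "00:03:03.788 added 710 packages from 818 contributors and audited 2755 packages in 68.945s",
    "00:08:13.205 added 1895 packages from 2065 contributors and audited 30549 packages in 43.659s",
]


def parse_summary(line):
    """Return (packages_added, seconds) for an npm summary line, else None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


summaries = [parse_summary(line) for line in LOG_LINES]
# Slowest install first.
slowest_first = sorted(summaries, key=lambda item: item[1], reverse=True)

for packages, seconds in slowest_first:
    print(f"{seconds:8.3f}s  {packages} packages")
```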


I am not even sure what is stored in the cache with npm 5; maybe binary/native modules are stored now.

When running npm install for Wikibase, CPU usage skyrockets and the installation spends a long time on each of:

> core-js@3.2.1 postinstall xxx
> node scripts/postinstall || echo "ignore"

Even on a second installation (which should thus benefit from a warm cache), it takes a minute:

real	1m2,969s
user	0m52,410s
sys	0m8,571s

I guess due to:

client/data-bridge/package.json: "core-js": "^2.6.5",
view/lib/wikibase-tainted-ref/package.json: "core-js": "^2.6.5",

I vaguely remember us having switched to using npm ci instead of npm install. Did we lose that? That should make a noticeable difference.

The node10 container does run npm ci. Quibble does npm install and uses --prefer-offline since July 25th (70447e232890adfeecce0bf9bffc381b73755285)

Probably we can drop --prefer-offline and just switch to npm ci?
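For reference, npm ci needs an existing package-lock.json or npm-shrinkwrap.json and removes any existing node_modules before installing, so a caller has to fall back to npm install when no lockfile is present. A minimal sketch of that decision; `npm_install_command` is a hypothetical helper for illustration, not Quibble's actual code:

```python
import os
import tempfile

# Lockfiles that "npm ci" accepts.
LOCKFILES = ("package-lock.json", "npm-shrinkwrap.json")


def npm_install_command(project_dir):
    """Pick 'npm ci' when a lockfile exists, otherwise plain 'npm install'."""
    has_lock = any(
        os.path.isfile(os.path.join(project_dir, name)) for name in LOCKFILES
    )
    if has_lock:
        return ["npm", "ci"]
    return ["npm", "install"]


# Demonstration with a throwaway project directory.
with tempfile.TemporaryDirectory() as project:
    without_lock = npm_install_command(project)
    with open(os.path.join(project, "package-lock.json"), "w") as fh:
        fh.write("{}")
    with_lock = npm_install_command(project)

print(without_lock, with_lock)
```

The returned list could be handed to `subprocess.check_call(..., cwd=project_dir)` by whatever runs the install step.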

There are also the incredibly slow:

> core-js@3.2.1 postinstall xxx
> node scripts/postinstall || echo "ignore"

I haven't tracked down those though :-\

I've worked with upstream to make it store the chromedriver binary in XDG_CACHE_HOME instead of in a local temp directory. This way, Castor should be able to persist it between CI runs.

https://github.com/giggio/node-chromedriver/pull/232

This has been merged and released as node-chromedriver 78.0.1.
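The lookup convention behind this comes from the XDG Base Directory specification: use XDG_CACHE_HOME when it is set and non-empty, otherwise fall back to $HOME/.cache. A rough sketch of that rule; the helper `xdg_cache_dir`, the subdirectory name `chromedriver`, and the `/cache` value are assumptions for illustration and not necessarily what node-chromedriver or the CI images use:

```python
import os


def xdg_cache_dir(app_name, environ=None):
    """Resolve a per-application cache directory following the XDG rule."""
    env = os.environ if environ is None else environ
    base = env.get("XDG_CACHE_HOME")
    if not base:
        # Unset or empty: the spec's default is $HOME/.cache.
        home = env.get("HOME", os.path.expanduser("~"))
        base = os.path.join(home, ".cache")
    return os.path.join(base, app_name)


# With the variable pointing at a persisted directory (assumed value).
persisted = xdg_cache_dir("chromedriver", {"XDG_CACHE_HOME": "/cache"})
# Without it, the location falls back to the home directory.
fallback = xdg_cache_dir("chromedriver", {"HOME": "/home/ci"})

print(persisted)
print(fallback)
```

Only the first case lands in a directory that a cache service could save and restore between builds.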

It looks like Castor is still not working correctly for the npm cache. The main "mediawiki-quibble-vendor-mysql-php72-docker" jobs still seem to be re-installing dependencies on every build (instead of starting with the result of the last gate pipeline, or however it is supposed to work).

Krinkle renamed this task from quibble-selenium jobs re-downloading npm packages (castor not loading) to Quibble jobs re-download npm packages every build (Castor not loading?). Mar 25 2020, 11:41 PM

Change 583736 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/quibble@master] Prefer 'npm ci' instead of 'npm prune' + 'npm install'

https://gerrit.wikimedia.org/r/583736

Krinkle triaged this task as Medium priority.

Change 583736 merged by jenkins-bot:
[integration/quibble@master] Prefer 'npm ci' instead of 'npm prune' + 'npm install'

https://gerrit.wikimedia.org/r/583736

Keeping this open until the next release/deploy of Quibble.

Change 587580 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/quibble@master] Release Quibble 0.0.41

https://gerrit.wikimedia.org/r/587580

Change 587580 merged by jenkins-bot:
[integration/quibble@master] Release Quibble 0.0.41

https://gerrit.wikimedia.org/r/587580

Change 587588 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] dockerfiles: [quibble-stretch] Install quibble 0.0.41 and cascade.

https://gerrit.wikimedia.org/r/587588

Change 587589 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] jjb: [quibble*] Switch to quibble 0.0.41 images

https://gerrit.wikimedia.org/r/587589

Change 587588 merged by jenkins-bot:
[integration/config@master] dockerfiles: [quibble-stretch] Install quibble 0.0.41 and cascade.

https://gerrit.wikimedia.org/r/587588

Change 587589 merged by jenkins-bot:
[integration/config@master] jjb: [quibble*] Switch to quibble 0.0.41 images

https://gerrit.wikimedia.org/r/587589

This should now be better. Good enough to Resolve?

https://integration.wikimedia.org/ci/job/mediawiki-quibble-vendor-mysql-php72-docker/17940/consoleFull

Before (22s for preinstall+install+postinstall+audit)
INFO:quibble.cmd:>>> Start: npm install in /workspace/src

> fibers_node_v8@3.1.5 preinstall node_modules/fibers_node_v8
> node preinstall.js
> fibers@4.0.2 install node_modules/fibers
> node build.js || nodejs build.js
`linux-x64-64-glibc` exists; testing
Binary is fine; exiting

> fibers_node_v8@3.1.5 install node_modules/fibers_node_v8
> node build.js
ignore install

> puppeteer-core@1.20.0 install node_modules/puppeteer-core
> node install.js

> core-js@3.2.1 postinstall node_modules/core-js
> node scripts/postinstall || echo "ignore"

> ejs@2.7.4 postinstall node_modules/ejs
> node ./postinstall.js

> sauce-connect-launcher@1.3.1 postinstall node_modules/sauce-connect-launcher
> node scripts/install.js || nodejs scripts/install.js

npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@2.1.2 (node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for fsevents@2.1.2: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})

added 1059 packages from 1529 contributors and audited 4356 packages in 15.06s

found 21 low severity vulnerabilities
  run `npm audit fix` to fix them, or `npm audit` for details
npm WARN ws@7.2.1 requires a peer of bufferutil@^4.0.1 but none is installed. You must install peer dependencies yourself.
npm WARN ws@7.2.1 requires a peer of utf-8-validate@^5.0.2 but none is installed. You must install peer dependencies yourself.
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@2.1.2 (node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for fsevents@2.1.2: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})
audited 4356 packages in 5.955s
found 21 low severity vulnerabilities
  run `npm audit fix` to fix them, or `npm audit` for details

INFO:quibble.cmd:<<< Finish: npm install in /workspace/src, in 22.800 s

https://integration.wikimedia.org/ci/job/mediawiki-quibble-vendor-mysql-php72-docker/17940/consoleFull

After (10s for fetch+postinstall)
INFO:quibble.cmd:>>> Start: npm install in /workspace/src

> ejs@2.7.4 postinstall /workspace/src/node_modules/ejs
> node ./postinstall.js

> fibers@4.0.2 install /workspace/src/node_modules/fibers
> node build.js || nodejs build.js
`linux-x64-64-glibc` exists; testing
Binary is fine; exiting

> fibers_node_v8@3.1.5 preinstall /workspace/src/node_modules/fibers_node_v8
> node preinstall.js
> fibers_node_v8@3.1.5 install /workspace/src/node_modules/fibers_node_v8
> node build.js
ignore install

> sauce-connect-launcher@1.3.1 postinstall /workspace/src/node_modules/sauce-connect-launcher
> node scripts/install.js || nodejs scripts/install.js

> puppeteer-core@1.20.0 install /workspace/src/node_modules/puppeteer-core
> node install.js

> core-js@3.2.1 postinstall /workspace/src/node_modules/core-js
> node scripts/postinstall || echo "ignore"

added 1060 packages in 9.403s
INFO:quibble.cmd:<<< Finish: npm install in /workspace/src, in 10.209 s
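To put a number on the difference, the two "Finish" lines above can be compared directly. A small sketch, assuming the "<<< Finish: ..., in N s" format shown in these logs; `finish_seconds` is a hypothetical helper:

```python
import re

# Matches Quibble's closing line for a command and captures the duration.
FINISH_RE = re.compile(r"<<< Finish: (.+), in ([\d.]+) s$")


def finish_seconds(line):
    """Return the duration in seconds from a Finish line, else None."""
    match = FINISH_RE.search(line)
    return float(match.group(2)) if match else None


before = finish_seconds(
    "INFO:quibble.cmd:<<< Finish: npm install in /workspace/src, in 22.800 s"
)
after = finish_seconds(
    "INFO:quibble.cmd:<<< Finish: npm install in /workspace/src, in 10.209 s"
)

saved = before - after
reduction = saved / before * 100
print(f"saved {saved:.3f} s ({reduction:.1f}% less)")
# prints "saved 12.591 s (55.2% less)"
```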

This should now be better. Good enough to Resolve?

Yes :)

That is a nice improvement, thank you!

Is that because the compiled binary modules are now stored in XDG_CACHE_HOME due to npm ci?

I think so, yes; if it's the wrong platform npm will re-build, I think?