
`npm -s run test:size` is blocking merges to many extensions
Closed, Resolved | Public | BUG REPORT

Description

The Vector npm run test:size job appears to be run separately now by Quibble but runs with the incorrect path (//load.php rather than w/load.php) and is now blocking WikimediaEvents.

https://gerrit.wikimedia.org/r/q/project:mediawiki%252Fextensions%252FWikimediaEvents+status:open

This is blocking some key instrumentation that needs to be merged (and possibly backported this week) prior to the English Wikipedia community's decision to enable talk pages for anonymous users.

What happens?:

> selenium-test
16:35:51 > npm -s run test:size
16:35:51 
16:35:51 INFO:backend.PhpWebserver:[Tue Nov  9 00:35:52 2021] 127.0.0.1:44712 [200]: //load.php?lang=en&modules=skins.vector.styles.legacy
16:35:52 events.js:174
16:35:53       throw er; // Unhandled 'error' event
16:35:53       ^
16:35:53 
16:35:53 Error: spawn node_modules/.bin/bundlesize-pipe ENOENT
16:35:53     at Process.ChildProcess._handle.onexit (internal/child_process.js:240:19)
16:35:53     at onErrorNT (internal/child_process.js:415:16)
16:35:53     at process._tickCallback (internal/process/next_tick.js:63:19)
16:35:53 Emitted 'error' event at:
16:35:53     at Process.ChildProcess._handle.onexit (internal/child_process.js:246:12)
16:35:53     at onErrorNT (internal/child_process.js:415:16)
16:35:53     at process._tickCallback (internal/process/next_tick.js:63:19)
16:35:53 INFO:quibble.commands:<<< Finish: Browser tests for projects mediawiki/extensions/WikimediaEvents, mediawiki/core, mediawiki/extensions/EventBus, mediawiki/extensions/EventLogging, mediawiki/extensions/EventStreamConfig, mediawiki/skins/Vector, mediawiki/vendor, in 98.259 s
16:35:53 INFO:backend.ChromeWebDriver:Terminating ChromeWebDriver
16:35:53 INFO:backend.Xvfb:Terminating Xvfb
16:35:53 INFO:backend.PhpWebserver:Terminating PhpWebserver
16:35:53 INFO:backend.MySQL:Terminating MySQL
16:35:53 Traceback (most recent call last):

What should have happened instead?:

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc:

Event Timeline

Reedy renamed this task from `npm -s run test:size` is blocking WikimediaEvents merges to `npm -s run test:size` is blocking WikimediaEvents merges. Nov 9 2021, 2:23 AM

The Vector npm run test:size job appears to be run separately now by Quibble […]

What do you mean by "separately"? Separately from what?

The phrase "now" suggests something has changed. What do you think has changed? If you know how this npm script would/should/could be run instead (and possibly how it used to run), that'd be good to share.

I suppose @hashar could double check tomorrow, but I see no recent change in this area in Quibble or CI config.

Comparing to last week's build logs, the command was run the same way as today.
Comparing between Vector and other repos, the command is run the same way on both repos.

Also note you can see that the quibble-selenium job in the Vector repo (which is passing), and the ones that are failing everywhere, are both running the exact same version of Quibble. So the way the command is invoked, and Quibble more generally, are likely not relevant.

both
docker run ... /releng/quibble-buster-php72:1.2.0 ...

[…] but runs with the incorrect path (//load.php rather than w/load.php) is now blocking WikimediaEvents.

Define "incorrect"? What is incorrect about this path? It seems there is an extra slash indeed.

The double initial slash in build output of WMF CI jobs has been that way for the past 8+ years. Iirc, also in the Travis CI jobs when we still ran those. This is likely an artefact of a needless trailing slash somewhere in the many layers between CI config, Quibble, and mw installer/setup/defaultsettings.

This hasn't changed, and doesn't matter. E.g. https://en.wikipedia.org//static///images/////project-logos//////enwiki-2x.png is logically equivalent to https://en.wikipedia.org/static/images/project-logos/enwiki-2x.png. This is true for most web servers, especially anything macOS or Linux-based.
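To illustrate the equivalence, here is a minimal sketch of the kind of normalization many web servers apply to a request path. The function name `collapseSlashes` is hypothetical and this is not the code of any particular server; it only shows why the two URLs above point at the same resource.

```javascript
// Hypothetical helper: collapse every run of two or more slashes in a
// URL path into a single slash. Illustrative only.
function collapseSlashes(path) {
  return path.replace(/\/{2,}/g, '/');
}

// The path from the example URL above.
console.log(collapseSlashes('//static///images/////project-logos//////enwiki-2x.png'));
// The path logged by the CI web server.
console.log(collapseSlashes('//load.php'));
```

Under this normalization, `//load.php` and `/load.php` resolve to the same script.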

I don't know where you see w/load.php. Afaik our CI built-in server configuration has never used a /w directory. You can see, however, that the jobs that are failing today are invoking the Vector script the same way, and are logging the same exact URL.

Vector build 86790 - Passing
> npm -s run test:size
INFO:backend.PhpWebserver:[Thu Nov  4 17:51:49 2021] 127.0.0.1:58962 [200]: //load.php?lang=en&modules=skins.vector.styles.legacy
…
PASS  skins.vector.styles.legacy: 7.59KB < maxSize 7.9KB (gzip)
build 121433 - Failing
> npm -s run test:size
INFO:backend.PhpWebserver:[Tue Nov  9 00:44:07 2021] 127.0.0.1:53092 [200]: //load.php?lang=en&modules=skins.vector.styles.legacy
events.js:174
      throw er; // Unhandled 'error' event
Nikerabbit renamed this task from `npm -s run test:size` is blocking WikimediaEvents merges to `npm -s run test:size` is blocking merges to many extensions. Nov 9 2021, 6:56 AM

Will check. The test:size entrypoint is used in the MinervaNeue and Vector skins. The target URL comes from the MW_SCRIPT_PATH environment variable:

tests/resource-loader-bundlesize.js:
...
    MW_SERVER = process.env.MW_SERVER || 'http://127.0.0.1:8080',
    MW_SCRIPT_PATH = process.env.MW_SCRIPT_PATH || '/w',
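As a rough sketch of how these two defaults could produce the URL seen in the logs: the snippet below assumes the script simply concatenates the server, the script path, and `/load.php`. That concatenation, the function name `buildLoadUrl`, and the idea that CI sets `MW_SCRIPT_PATH` to `/` are assumptions for illustration, not taken from the actual file. The query string is the one from the build log.

```javascript
// Hypothetical reconstruction of the URL building. Only the two
// defaults come from the snippet above; the rest is assumed.
function buildLoadUrl(env) {
  const server = env.MW_SERVER || 'http://127.0.0.1:8080';
  const scriptPath = env.MW_SCRIPT_PATH || '/w';
  return `${server}${scriptPath}/load.php?lang=en&modules=skins.vector.styles.legacy`;
}

// Nothing set: the '/w' default applies.
console.log(buildLoadUrl({}));
// Assumed CI value of '/': the result contains '//load.php'.
console.log(buildLoadUrl({ MW_SCRIPT_PATH: '/' }));
```

Note that with `||` an empty `MW_SCRIPT_PATH` would fall back to `/w`, so under these assumptions a wiki served from the web root has to pass `/`, which would explain the doubled slash.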

The first thing is to find a good build and compare it to a bad build in an attempt to find what might have changed. I will do that right now.

From the Translate extension

Good

Around 13:30 UTC
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/736229
https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-selenium-docker/86652/console

Bad

Around 21:00 UTC
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Translate/+/736013
https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-selenium-docker/86732/console

The build uses /load.php which is how Quibble exposes MediaWiki in the web service. But the failing build has some issue with the package-lock.json and a cached file that is missing :-\
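For context on the `spawn ... ENOENT` error in the failing log: Node reports ENOENT when the file it is asked to spawn does not exist, which fits a `node_modules/.bin` entry that was never installed. The sketch below uses `spawnSync` rather than the asynchronous `spawn` seen in the stack trace, so that the error can be inspected as a return value. The helper name `trySpawn` and the binary path are hypothetical.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a command and report a spawn failure
// instead of letting it surface as an unhandled 'error' event.
function trySpawn(command, args) {
  const result = spawnSync(command, args, { stdio: 'ignore' });
  if (result.error) {
    // e.g. code 'ENOENT' when the executable is missing
    return { ok: false, code: result.error.code };
  }
  return { ok: result.status === 0, code: null };
}

// Assumed-missing path, standing in for node_modules/.bin/bundlesize-pipe.
console.log(trySpawn('node_modules/.bin/hypothetical-missing-binary', []));
```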

hashar claimed this task.

That is probably the same issue as T294426 or T293937: the npm cache ends up being corrupted for some reason.

I have dropped the cache on integration-castor03 by moving /srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php72-selenium-docker to /srv/corrupted-quibble-vendor-mysql-php72-selenium-docker, which can be investigated later. As a teaser, if I retrieve it and run npm cache verify:

$ mkdir  corrupted-npm-cache
$ cd corrupted-npm-cache
$ rsync -va -zz integration-castor03.integration.eqiad1.wikimedia.cloud:/srv/corrupted-quibble-vendor-mysql-php72-selenium-docker .
$ cd corrupted-quibble-vendor-mysql-php72-selenium-docker/npm
$ npm cache --cache "$(pwd)" verify
Cache verified and compressed (~/corrupted-npm-cache/corrupted-quibble-vendor-mysql-php72-selenium-docker/npm/_cacache)
Content verified: 0 (0 bytes)
Missing content: 6984
Index entries: 0
Finished in 1.613s

It has 0 entries in the index and a bunch of missing content. But that is to be investigated later.
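A check like this could be scripted. The sketch below parses the text printed by `npm cache verify` as shown above and flags a cache whose index is empty while content is reported missing. The function name `parseVerifyOutput` and the `looksCorrupted` heuristic are assumptions for illustration, and the output format may differ between npm versions.

```javascript
// Hypothetical parser for the `npm cache verify` summary shown above.
function parseVerifyOutput(text) {
  // Read the first integer that follows "<label>:" at the start of a line.
  const readCount = (label) => {
    const match = text.match(new RegExp(`^${label}:\\s*(\\d+)`, 'm'));
    return match ? Number(match[1]) : null;
  };
  const summary = {
    verified: readCount('Content verified'),
    missing: readCount('Missing content'),
    indexEntries: readCount('Index entries'),
  };
  // Assumed heuristic: empty index plus missing content means trouble.
  summary.looksCorrupted = summary.indexEntries === 0 && (summary.missing || 0) > 0;
  return summary;
}

const sample = [
  'Content verified: 0 (0 bytes)',
  'Missing content: 6984',
  'Index entries: 0',
  'Finished in 1.613s',
].join('\n');
console.log(parseVerifyOutput(sample));
```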


All affected changes can be rechecked, for example using:

recheck due to npm cache corruption - T295341

Thank you @hashar for making my day a little less stressful by unblocking these <3