
mwext-node20-rundoc failing post-merge on some repos
Closed, Resolved | Public | Bug Report

Description

Examples:

https://integration.wikimedia.org/ci/job/mwext-node20-rundoc/11258/console (MediaWiki-extensions-CodeMirror)

npm error enoent Invalid response body while trying to fetch https://registry.npmjs.org/@mdn%2fbrowser-compat-data: ENOENT: no such file or directory, stat '/cache/_cacache/content-v2/sha512/d0/7f/1fad72fdd3ac07108190d690ec0ea257affe65eabc11586fb47d3409de767efda7c08d815e5b72867496aff14b0fa226c0d0b25d14e1fef6256f4623ba64'

https://integration.wikimedia.org/ci/job/mwext-node20-rundoc/11255/console (Page-Previews)

npm error enoent Invalid response body while trying to fetch https://registry.npmjs.org/eslint-plugin-es-x: ENOENT: no such file or directory, stat '/cache/_cacache/content-v2/sha512/3e/ce/d44958d0453e0a19c29f934eb86e3a80be4922b3ad4ec122d3b21a5678ee5592a7708bb4fe6e3fc77741fa06a958f7e9933d7bded869a6da20aa351f21df'

I think this is the same issue as T373937: mwext-node18-docs-publish failing post-merge for CodeMirror. It has happened a few times since then as well, and each time manually clearing the cache seems to have been the fix.

But why does it keep happening? Is there something we can do to guard against this issue?

Event Timeline

Mentioned in SAL (#wikimedia-releng) [2025-06-10T11:35:49Z] <James_F> jforrester@integration-castor05:/srv/castor$ sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mwext-node20-rundoc/ # T396426

But why does it keep happening? Is there something we can do to guard against this issue?

We're not totally sure. I believe it might be happening when upstream drops the network connection in the middle of a fetch and we then cache the malformed item.
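If the truncated-download theory holds, one possible guard is to check the content-addressed files before the cache is saved. The sketch below is only an illustration, not something Castor or the CI jobs do today. It assumes the layout visible in the error messages above (`content-v2/sha512/<2 hex>/<2 hex>/<rest>`) and assumes that the joined path segments are the hex SHA-512 of the file content. The function name `find_corrupt_entries` is hypothetical.

```python
import hashlib
from pathlib import Path


def find_corrupt_entries(cache_root):
    """Return content files whose SHA-512 does not match their path.

    Assumption: files live under content-v2/sha512/aa/bb/rest and
    "aa" + "bb" + "rest" is the hex digest of the file content.
    """
    content_dir = Path(cache_root) / "content-v2" / "sha512"
    corrupt = []
    if not content_dir.is_dir():
        return corrupt
    for path in sorted(content_dir.glob("*/*/*")):
        if not path.is_file():
            continue
        # Rebuild the expected digest from the three path segments.
        expected = "".join(path.relative_to(content_dir).parts)
        actual = hashlib.sha512(path.read_bytes()).hexdigest()
        if actual != expected:
            corrupt.append(path)
    return corrupt
```

A job could call `find_corrupt_entries("/cache/_cacache")` and skip saving the cache when the list is non-empty. Note that this only catches truncated or altered content; the ENOENT errors above are about content files that are missing entirely, which the built-in `npm cache verify` command is better placed to detect.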

That is almost certainly the same issue as T295351: npm cache saved by castor get corrupted for unknown reason, which I strongly suspect is caused by a race condition between:

  • a job fetching the cache
  • another one writing to it and thus erasing some files in flight.
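A race like the one suspected above is commonly avoided by never modifying a cache that readers may be fetching: the writer builds a complete new snapshot next to the old one and then switches a pointer in a single atomic step. The sketch below illustrates that general technique on a POSIX filesystem. It is not how Castor works, and the names `publish_snapshot`, `open_snapshot` and the `current` symlink are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def publish_snapshot(source, store, link_name="current"):
    """Copy `source` into a fresh snapshot directory, then repoint the link."""
    store = Path(store)
    store.mkdir(parents=True, exist_ok=True)
    # Build the new snapshot in its own directory; readers never see it
    # until the symlink is switched below.
    snapshot = Path(tempfile.mkdtemp(prefix="snapshot-", dir=store))
    shutil.copytree(source, snapshot, dirs_exist_ok=True)
    # Create the new symlink under a unique temporary name, then rename it
    # over the old one. rename() replaces the link atomically on POSIX.
    tmp_link = store / f"{link_name}.{snapshot.name}.tmp"
    os.symlink(snapshot.name, tmp_link)
    os.replace(tmp_link, store / link_name)
    return snapshot


def open_snapshot(store, link_name="current"):
    """Resolve the link once so a reader keeps using one fixed snapshot."""
    return (Path(store) / link_name).resolve(strict=True)
```

A reader that calls `open_snapshot` once and then copies from the returned directory is unaffected by later writers, because existing snapshots are never edited. The sketch deliberately leaves out cleanup of old snapshots, which would need to wait until no reader is still using them.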

A better system than Castor would have to be found, probably as part of upgrading the infrastructure (Continuous-Integration-Infrastructure (Zuul upgrade)).

I am tempted to have this marked as a duplicate of T295351.

Yes, that sounds like the root problem. Feel free to mark this task as a duplicate, along with T373937: mwext-node18-docs-publish failing post-merge for CodeMirror.