Multiple extensions' CI blocked again by cache corruption error in mwgate-node16-docker
Closed, Duplicate (Public)

Description

Similar to T349986

Example output:

11:55:57 npm ERR! errno FETCH_ERROR
11:55:57 npm ERR! invalid json response body at https://registry.npmjs.org/zwitch reason: Invalid response body while trying to fetch https://registry.npmjs.org/zwitch: ENOENT: no such file or directory, stat '/cache/_cacache/content-v2/sha512/c4/13/4b9344a9b882cbfb7cb906102c2e604f34898d68810f1d66155b0e20d655acb63a5ff56f70e6e3ed49b8e9f386fe3ea02a636dd5a35d53ddaa35816fb787'

See https://integration.wikimedia.org/ci/job/mwgate-node16-docker/90347/console
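For readers trying to confirm which cache entry is missing, here is a minimal Python sketch that maps an integrity string to the file path that the `stat` call is looking for. The directory layout (`content-v2/<algorithm>/<first two hex chars>/<next two>/<rest>`) is an assumption inferred from the path in the error above, and the helper names `sri_sha512`, `content_path` and `is_missing` are hypothetical; this is not part of npm or of the CI tooling.

```python
import base64
import hashlib
import os


def sri_sha512(data: bytes) -> str:
    """Build an SRI-style integrity string ('sha512-<base64 digest>') for data."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def content_path(cache_dir: str, integrity: str) -> str:
    """Map an integrity string to the content file path under cache_dir.

    Assumed layout (inferred from the error message, not from npm docs):
    <cache_dir>/content-v2/<algorithm>/<hex[0:2]>/<hex[2:4]>/<hex[4:]>
    """
    algorithm, _, b64_digest = integrity.partition("-")
    hex_digest = base64.b64decode(b64_digest).hex()
    return os.path.join(
        cache_dir, "content-v2", algorithm,
        hex_digest[:2], hex_digest[2:4], hex_digest[4:],
    )


def is_missing(cache_dir: str, integrity: str) -> bool:
    """Return True when no content file exists for this integrity value."""
    return not os.path.isfile(content_path(cache_dir, integrity))
```

With the cache directory from the log, a call would look like `is_missing("/cache/_cacache", integrity)`, where `integrity` is the value recorded for the package in question.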

Event Timeline

Jdforrester-WMF renamed this task from Wikibase CI blocked again by cacache corruption error in mwgate-node16-docker to Multiple extensions' CI blocked again by cache corruption error in mwgate-node16-docker. Nov 29 2023, 3:32 PM

That is the same issue as described in T295351, which I think is most probably a race condition that shows up especially when a lot of patches are being merged (and thus updating the cache on the fly). That is currently the case: for the last 3 hours CI has been hammered by a series of changes for T352284. See the list at https://gerrit.wikimedia.org/r/q/topic:lsc-T352284
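To illustrate the kind of race suspected here: if one job rewrites a cache file in place while another reads it, the reader can see a missing or partial file. A generic way to avoid that class of problem is to write to a temporary file and rename it into place. The sketch below shows only that pattern; it is an assumption-laden illustration, not how Castor or npm handle their cache, and `publish_atomically` is a hypothetical helper name.

```python
import os
import tempfile


def publish_atomically(target_path: str, data: bytes) -> None:
    """Write data to a temp file in the target's directory, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        # os.replace is atomic on POSIX when source and target share a filesystem,
        # so readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The temporary file is created in the same directory as the target so that the rename stays within one filesystem.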

Mentioned in SAL (#wikimedia-releng) [2023-11-29T20:37:41Z] <James_F> Ran jforrester@integration-castor05:~$ sudo rm -fR /srv/castor/castor-mw-ext-and-skins/master/mw*-node16-* for T352305

It was working briefly, but now it's corrupted again it seems. :-(

I think it’s working better now? I haven’t encountered this error today (or, as far as I remember, yesterday).

Yes, so far it seems to have stabilised as working, but it seems that it's just an unfortunate event away from recurring.