
Wikibase CI failing due to npm cache corruption issue
Closed, Resolved, Public

Description

npm ERR! path /cache/_cacache/content-v2/sha512/65/9d/07d365d9a1cdde2aecf1e8e522d65a2f55b2845349805f0b2c13b6f26792c6805ab04dfdeacae752681a2eb2e8bb35b57c32150b4a3c018ddbcaaaf9c6eb
npm ERR! errno ENOENT
npm ERR! enoent Invalid response body while trying to fetch https://registry.npmjs.org/y18n: ENOENT: no such file or directory, stat '/cache/_cacache/content-v2/sha512/65/9d/07d365d9a1cdde2aecf1e8e522d65a2f55b2845349805f0b2c13b6f26792c6805ab04dfdeacae752681a2eb2e8bb35b57c32150b4a3c018ddbcaaaf9c6eb'

https://integration.wikimedia.org/ci/job/mwgate-node18/43108/consoleText

Similar to T358312, T349986, T352305, ...

Event Timeline

hashar triaged this task as Unbreak Now! priority.

Hi, I have deleted the faulty cache.

The process is to look in the Jenkins job console for the subpath used to restore and store the cache:

Defined: CASTOR_NAMESPACE="castor-mw-ext-and-skins/master/mwgate-node18"

Then delete it from Castor:

ssh integration-castor05.integration.eqiad1.wikimedia.cloud \
  sudo rm -fR /srv/castor/castor-mw-ext-and-skins/master/mwgate-node18
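
For anyone repeating this, the two steps above can be sketched as a small shell helper. This is only an illustration, not an established tool: the function names `castor_namespace` and `castor_delete_command` are hypothetical, and the sketch assumes the console log contains a `Defined: CASTOR_NAMESPACE="..."` line and that caches live under `/srv/castor/<namespace>`, as in the command above. It prints the delete command rather than running it.

```shell
#!/bin/sh
# Sketch only: derive the Castor cache directory from a Jenkins console log.
# Assumptions: the log has a line like  Defined: CASTOR_NAMESPACE="..."
# and the cache is stored under /srv/castor/<namespace>.

# Hypothetical helper: read console text on stdin, print the first namespace found.
castor_namespace() {
    sed -n 's/.*CASTOR_NAMESPACE="\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: print (do not run) the delete command for a namespace.
castor_delete_command() {
    ns=$1
    # Refuse empty, absolute, or parent-relative values so the printed
    # command can never target /srv/castor itself or a path outside it.
    case $ns in
        ''|/*|*..*)
            echo "refusing namespace: '$ns'" >&2
            return 1
            ;;
    esac
    printf 'sudo rm -fR /srv/castor/%s\n' "$ns"
}
```

One way to use it would be to pipe the job's consoleText into `castor_namespace`, pass the result to `castor_delete_command`, and review the printed command before running it on the Castor host.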

I guess we should add at least a few WMDE people to the integration WMCS project with sudo permission so that you can easily nuke faulty caches.

Thanks @hashar, I've requested the addition in T370766.

Do you think it makes sense to add those instructions to https://www.mediawiki.org/wiki/Continuous_integration/Architecture/Troubleshooting? (or some other page)
I'm not sure how well established that page is as a "CI troubleshooting cheatsheet". If one existed, we'd surely be advertising it here at WMDE once our staff get permissions for those machines.

Jayve20 changed the task status from Resolved to Declined. Jul 27 2024, 6:48 AM
Jayve20 removed hashar as the assignee of this task.