Jenkins jobs for MediaWiki failing with 'npm: shasum check failed'
Closed, Resolved · Public

Description

https://integration.wikimedia.org/ci/job/mediawiki-quibble-composertest-php70-docker/5461/console

00:01:09.070 npm ERR! node v6.11.0
00:01:09.070 npm ERR! npm  v3.8.3
00:01:09.074 
00:01:09.074 npm ERR! shasum check failed for /tmp/npm-394-6a81022a/registry.npmjs.org/socket.io-client/-/socket.io-client-2.0.4.tgz
00:01:09.075 npm ERR! Expected: 0918a552406dc5e540b380dcd97afc4a64332f8e
00:01:09.075 npm ERR! Actual:   9df1b7fc10b0a92a30aedd327383305d84e9ea06
00:01:09.075 npm ERR! From:     https://registry.npmjs.org/socket.io-client/-/socket.io-client-2.0.4.tgz
00:01:09.076 npm ERR! 
00:01:09.077 npm ERR! If you need help, you may report this error at:
00:01:09.077 npm ERR!     <https://github.com/npm/npm/issues>

This has been happening for about 2-3 days now. The error is for a different package each time, so I don't think it is specific to any individual package.

Seems particularly common in gate-and-submit.
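For anyone wanting to check a suspect tarball by hand: the "Expected" and "Actual" values in the log are 40-character SHA-1 digests. The following is a minimal sketch of the same comparison, not npm's own code. The helper names `sha1_of_file` and `matches_expected` are made up for illustration.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_sha1):
    """True if the file's SHA-1 equals the expected digest (case-insensitive)."""
    return sha1_of_file(path) == expected_sha1.lower()
```

Running `matches_expected` against a cached copy and against a fresh download of the same tarball would show which of the two is the corrupted one.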

Event Timeline

Krinkle created this task. Sep 4 2018, 8:19 PM
Restricted Application added a subscriber: Aklapper. Sep 4 2018, 8:19 PM
Krinkle triaged this task as High priority. Sep 4 2018, 8:20 PM
Krinkle updated the task description.
Krinkle added a comment. Sep 5 2018, 7:31 PM

And again https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/9469/console

00:01:23.145 npm ERR! argv "/usr/bin/nodejs" "/usr/local/bin/npm" "install" "--no-progress"
00:01:23.146 npm ERR! node v6.11.0
00:01:23.146 npm ERR! npm  v3.8.3
00:01:23.155 npm ERR! shasum check failed for /tmp/npm-550-af7312e6/registry.npmjs.org/eslint/-/eslint-4.19.1.tgz
00:01:23.155 npm ERR! Expected: 32d1d653e1d90408854bfb296f076ec7e186a300
00:01:23.156 npm ERR! Actual:   55f68fcc5d3ac0193aee91325c4d5c1f444710c6
00:01:23.157 npm ERR! From:     https://registry.npmjs.org/eslint/-/eslint-4.19.1.tgz
00:01:23.157 npm ERR! If you need help, you may report this error at:
00:01:23.158 npm ERR!     <https://github.com/npm/npm/issues>

Related: https://npm.community/t/shasum-check-or-integrity-eintegrity-errors/153

@Legoktm Yeah, I think this is probably due to cache corruption. Clearing castor for the affected key (repo+branch?) could help, although the problem is likely to come back eventually. It might be caused by aborted builds saving bad data to castor, or by some other scenario.

Newer npm versions are supposed to handle this better: as far as I understand, they only put files into the cache atomically, after verification.
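To illustrate what "atomically, after verification" could mean in practice, here is a minimal sketch. It is not npm's actual implementation; `store_verified` and its parameters are hypothetical. The data is hashed first, written to a temporary file in the same directory, and only then renamed into place, so a reader never sees a half-written cache entry.

```python
import hashlib
import os
import tempfile


def store_verified(data, expected_sha1, cache_path):
    """Write `data` to `cache_path` only if its SHA-1 matches `expected_sha1`."""
    actual = hashlib.sha1(data).hexdigest()
    if actual != expected_sha1:
        raise ValueError(
            "shasum check failed: expected %s, got %s" % (expected_sha1, actual)
        )
    cache_dir = os.path.dirname(os.path.abspath(cache_path))
    os.makedirs(cache_dir, exist_ok=True)
    # The temporary file lives in the target directory so the rename below
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the complete file into place in a single step.
        os.replace(tmp_path, cache_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

An aborted build would at worst leave a stray `.tmp-` file behind instead of a truncated tarball under the real cache key.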

So clearing castor once and upgrading our npm version might resolve this (v5.x or later, which first shipped with official Node 8).

There were some major changes when we moved from npm v1/v2 to v3. I'm not aware of any major changes in v5 that would complicate an upgrade, but it's worth looking into.

Krinkle changed the task status from Open to Stalled. Sep 25 2018, 5:49 PM

This is blocked on CI upgrading to npm 5 or later. The cache instability was a known issue in older versions of npm and has (allegedly) been resolved in 5.x and 6.0.

(Still seen at least a dozen times over the past 2 days, probably the most common kind of false negative at the moment.)

hashar changed the task status from Stalled to Open. Dec 11 2018, 4:58 PM
hashar added a subscriber: thcipriani.

shower thoughts:

We have jobs writing to the central cache constantly, so a job could potentially download a file while another job is still writing to it. The downloaded copy then ends up corrupted, since the file was in the process of being overwritten. I would then assume that npm 3.8 fails on the bad cached copy instead of downloading a fresh one from npmjs.org.
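The behaviour one would hope for instead is a fallback: if the cached copy fails the check, discard it and fetch a fresh one. A rough sketch of that idea, under the assumption that the expected SHA-1 is known up front. `load_package` is a made-up name and `download` is a hypothetical zero-argument callable standing in for the registry fetch; this does not describe what any npm version actually does.

```python
import hashlib
import os


def load_package(cache_path, expected_sha1, download):
    """Return package bytes, preferring the cache but re-downloading on mismatch."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            cached = handle.read()
        if hashlib.sha1(cached).hexdigest() == expected_sha1:
            return cached
        # The cached copy is corrupt or partial: drop it and fall through.
        os.unlink(cache_path)
    fresh = download()
    if hashlib.sha1(fresh).hexdigest() != expected_sha1:
        # Only a bad fresh download is treated as a hard failure.
        raise ValueError("shasum check failed for fresh download")
    return fresh
```

With this shape, a corrupted cache entry costs one extra download rather than a failed build.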

@thcipriani mentioned we might want to pass --delay-updates to rsync.

Change 479558 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[integration/config@master] castor: add --delay-updates to rsync commands

https://gerrit.wikimedia.org/r/479558
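For context, rsync's `--delay-updates` keeps each updated file in a holding directory until the end of the transfer and then renames them all into place in quick succession, which narrows the window in which another job can read a half-written file. The sketch below only shows how such an invocation could be assembled. The other flags and the source and destination arguments are assumptions for illustration, not castor's real command line.

```python
import subprocess


def build_rsync_command(source, destination):
    """Build an rsync argument list that uses --delay-updates."""
    return [
        "rsync",
        "--archive",        # assumed: preserve permissions, times, etc.
        "--compress",       # assumed: compress data during transfer
        "--delay-updates",  # rename updated files into place at the end
        source,
        destination,
    ]


def sync_cache(source, destination):
    """Run the sync; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(source, destination), check=True)
```

A call such as `sync_cache("cache/", "rsync://castor.example/caches/some-job/")` would use a hypothetical destination. Note that the renames still happen one file at a time, so this reduces the race rather than eliminating it.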

(tagging team given a patch is being worked on and in need of review.)

Change 479558 merged by jenkins-bot:
[integration/config@master] castor: add --delay-updates to rsync commands

https://gerrit.wikimedia.org/r/479558

Mentioned in SAL (#wikimedia-releng) [2019-03-12T18:38:03Z] <Krinkle> Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/479558 / T203506

Tentatively moving out of active issues until we see it again.

Still there.

npm ERR! Linux 4.9.0-0.bpo.8-amd64
npm ERR! argv "/usr/bin/nodejs" "/usr/local/bin/npm" "install"
npm ERR! node v6.11.0
npm ERR! npm  v3.8.3

npm ERR! shasum check failed for /tmp/npm-382-4886de3f/registry.npmjs.org/ajv/-/ajv-5.5.2.tgz
npm ERR! Expected: 73b5eeca3fab653e3d3f9422b341ad42205dc965
npm ERR! Actual:   562dbccf0db4461d2bdb4581ab00bbbebaad61eb
npm ERR! From:     https://registry.npmjs.org/ajv/-/ajv-5.5.2.tgz

We have not yet switched the jobs to use docker-registry.wikimedia.org/releng/castor:0.2.1, which includes the fix.

Change 511672 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Make castor use rsync --delay-updates

https://gerrit.wikimedia.org/r/511672

Mentioned in SAL (#wikimedia-releng) [2019-05-22T07:21:14Z] <hashar> Updating Jenkins job to have castor use rsync --delay-updates # T203506 | https://gerrit.wikimedia.org/r/#/c/integration/config/+/511672/

Change 511672 merged by jenkins-bot:
[integration/config@master] Make castor use rsync --delay-updates

https://gerrit.wikimedia.org/r/511672

Not sure whether the delay update will definitely fix the issue, but it should help.

greg added a subscriber: greg. Fri, Jul 26, 12:26 AM

Not sure whether the delay update will definitely fix the issue, but it should help.

It's been a while, are we good here?

I don't think I've seen this since the patch landed.

Jdforrester-WMF closed this task as Resolved. Fri, Jul 26, 1:13 AM
Jdforrester-WMF assigned this task to hashar.

Yes, boldly declaring this Resolved. (Not assigning into 201907 board as it was done last FY.)