
Jenkins jobs for MediaWiki failing with 'npm: shasum check failed'
Closed, Resolved · Public · PRODUCTION ERROR


00:01:09.070 npm ERR! node v6.11.0
00:01:09.070 npm ERR! npm  v3.8.3
00:01:09.074 npm ERR! shasum check failed for /tmp/npm-394-6a81022a/
00:01:09.075 npm ERR! Expected: 0918a552406dc5e540b380dcd97afc4a64332f8e
00:01:09.075 npm ERR! Actual:   9df1b7fc10b0a92a30aedd327383305d84e9ea06
00:01:09.075 npm ERR! From:
00:01:09.076 npm ERR! 
00:01:09.077 npm ERR! If you need help, you may report this error at:
00:01:09.077 npm ERR!     <>

This has been happening for about 2-3 days now. The error names a different package each time, so I don't think it is specific to any individual package.

Seems particularly common in gate-and-submit.
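For reference, the Expected/Actual values in these logs are 40-hex-digit SHA-1 digests. A minimal sketch of that kind of check follows. It assumes only that npm hashes the tarball it obtained and compares the result with the digest it expected. This is not npm's code: `sha1_of_file` and `check_shasum` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_shasum(path: Path, expected: str) -> None:
    """Hypothetical helper: raise if the file's SHA-1 differs from `expected`."""
    actual = sha1_of_file(path)
    if actual != expected:
        raise ValueError(
            f"shasum check failed for {path}\n"
            f"Expected: {expected}\n"
            f"Actual:   {actual}"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tarball = Path(tmp) / "package.tgz"
        tarball.write_bytes(b"abc")
        # The SHA-1 of b"abc" is a well-known test vector.
        check_shasum(tarball, "a9993e364706816aba3e25717850c26c9cd0d89d")
        print("shasum ok")
```

Because the digest covers the whole file, a truncated or half-written cache entry would fail this check just as a tampered one would, regardless of which package it belongs to.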

Event Timeline

Krinkle triaged this task as High priority. Sep 4 2018, 8:20 PM
Krinkle updated the task description.

And again

00:01:23.145 npm ERR! argv "/usr/bin/nodejs" "/usr/local/bin/npm" "install" "--no-progress"
00:01:23.146 npm ERR! node v6.11.0
00:01:23.146 npm ERR! npm  v3.8.3
00:01:23.155 npm ERR! shasum check failed for /tmp/npm-550-af7312e6/
00:01:23.155 npm ERR! Expected: 32d1d653e1d90408854bfb296f076ec7e186a300
00:01:23.156 npm ERR! Actual:   55f68fcc5d3ac0193aee91325c4d5c1f444710c6
00:01:23.157 npm ERR! From:
00:01:23.157 npm ERR! If you need help, you may report this error at:
00:01:23.158 npm ERR!     <>


@Legoktm Yeah, I think this is probably due to cache corruption. Clearing castor for the affected key (repo+branch?) could help, although it's likely to come back eventually. It might be caused by aborted builds saving bad data to castor, or by some other scenario.

Newer npm versions are supposed to handle this better: as I understand it, they only add entries to the cache atomically, after verifying them.
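To illustrate the general "verify, then publish atomically" pattern, here is a minimal sketch. It is not npm's actual cache implementation; `put_verified` and the flat cache layout are hypothetical. The data is hashed first, written to a temporary file in the same directory, and only then renamed over the final name.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def put_verified(cache_dir: Path, name: str, data: bytes, expected_sha1: str) -> Path:
    """Hypothetical sketch: store `data` as cache_dir/name only if its SHA-1 matches.

    The bytes go to a temporary file in the same directory first. os.replace()
    is atomic on POSIX when source and target are on the same filesystem, so a
    reader sees either the old complete file or the new one, never a partial one.
    """
    actual = hashlib.sha1(data).hexdigest()
    if actual != expected_sha1:
        raise ValueError(f"refusing to cache {name}: expected {expected_sha1}, got {actual}")
    cache_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, prefix=f".{name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes are on disk before publishing
        final = cache_dir / name
        os.replace(tmp_name, final)  # atomic publish under the final name
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final
```

The key design choice is that the final name never points at unverified or half-written bytes: a failed download leaves the cache untouched instead of poisoning it for the next job.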

So clearing castor once and upgrading our npm version might resolve this (v5.x or later, which came with official Node 6).

There were some major changes when we moved from npm v1/v2 to v3. I'm not aware of any major changes in v5 that would complicate an upgrade, but it's worth looking into.

Krinkle changed the task status from Open to Stalled. Sep 25 2018, 5:49 PM

This is blocked on CI upgrading to npm 5 or later. The cache instability was a known issue in older versions of npm and has (allegedly) been resolved in 5.x and 6.0.
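Until the upgrade lands, a job could fail fast on an old npm instead of hitting intermittent shasum errors. Here is a minimal sketch of such a guard. The helper names are hypothetical, and the threshold of 5 comes from the comment above.

```python
import re


def npm_major_version(version: str) -> int:
    """Extract the major version from output such as '3.8.3' or 'v6.4.1'."""
    match = re.match(r"v?(\d+)\.", version.strip())
    if match is None:
        raise ValueError(f"unrecognised npm version: {version!r}")
    return int(match.group(1))


def npm_is_new_enough(version: str, minimum_major: int = 5) -> bool:
    """True if `version` is at least `minimum_major` (5, per the comment above)."""
    return npm_major_version(version) >= minimum_major


if __name__ == "__main__":
    # The version string reported in the failing CI logs.
    print(npm_is_new_enough("3.8.3"))  # False
```

In a job, the input would come from `npm --version`.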

(Still seen at least a dozen times over the past 2 days, probably the most common kind of false negative at the moment.)

hashar changed the task status from Stalled to Open. Dec 11 2018, 4:58 PM
hashar added a subscriber: thcipriani.

shower thoughts:

We have jobs writing to the central cache constantly, so a job could potentially download a file while it is being written. The copy it gets is then corrupted, because the file is in the middle of being overwritten. I would then assume that npm 3.8 fails instead of downloading a fresh copy from
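The scenario above can be reproduced deterministically in a single process. The sketch below is not castor code, and `read_during_overwrite` is a hypothetical name. A "writer" truncates and rewrites a cached file in place, and a "reader" copies it when only half of the new bytes have been flushed. Even though the new content is identical to the old, the reader ends up with a truncated file whose SHA-1 no longer matches.

```python
import hashlib
import tempfile
from pathlib import Path


def read_during_overwrite(path: Path, new_data: bytes) -> bytes:
    """Simulate a reader copying `path` while a writer overwrites it in place.

    The writer opens the file with "wb", which truncates it, and has only
    flushed the first half of `new_data` when the reader takes its copy.
    """
    half = len(new_data) // 2
    with path.open("wb") as writer:
        writer.write(new_data[:half])
        writer.flush()                # make the partial write visible to other readers
        snapshot = path.read_bytes()  # the "reader" copies the file at this moment
        writer.write(new_data[half:])
    return snapshot


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cached = Path(tmp) / "package.tgz"
        payload = b"x" * 1000
        cached.write_bytes(payload)
        seen = read_during_overwrite(cached, payload)
        print(len(seen))  # 500: the reader got a truncated copy
        print(hashlib.sha1(seen).hexdigest() == hashlib.sha1(payload).hexdigest())  # False
```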

@thcipriani mentioned we might want to pass --delay-updates to rsync.
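The flag the castor patch adds is rsync's `--delay-updates`. According to the rsync manual, it puts each updated file into a holding directory (by default `.~tmp~`) until the end of the transfer and then renames all the files into place in rapid succession. The sketch below mimics that two-phase idea in Python. It is an illustration only, not how rsync is implemented, and `sync_with_delayed_updates` is a hypothetical name.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping


def sync_with_delayed_updates(files: Mapping[str, bytes], dest: Path) -> None:
    """Hypothetical sketch of the idea behind rsync --delay-updates.

    Phase 1 writes every incoming file into a holding directory inside `dest`,
    so nothing under its final name changes while data is still arriving.
    Phase 2 renames each staged file into place. Each os.replace() is atomic on
    POSIX (same filesystem), so a concurrent reader sees either the old or the
    new version of a file, never a partially written one.
    """
    holding = dest / ".~tmp~"  # rsync's default holding directory name
    holding.mkdir(parents=True, exist_ok=True)
    try:
        staged = []
        for rel, data in files.items():
            tmp = holding / rel
            tmp.parent.mkdir(parents=True, exist_ok=True)
            tmp.write_bytes(data)  # phase 1: stage the full contents
            staged.append((tmp, dest / rel))
        for tmp, final in staged:
            final.parent.mkdir(parents=True, exist_ok=True)
            os.replace(tmp, final)  # phase 2: atomic rename into place
    finally:
        shutil.rmtree(holding, ignore_errors=True)
```

The renames still happen one after another. A reader can therefore briefly see a mix of old and new files across the set, although every individual file it opens is complete. This is one reason to treat the flag as a mitigation rather than a guaranteed fix.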

Change 479558 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[integration/config@master] castor: add --delay-updates to rsync commands

(Tagging the team since a patch is being worked on and needs review.)

Change 479558 merged by jenkins-bot:
[integration/config@master] castor: add --delay-updates to rsync commands

Tentatively moving out of active issues until we see it again.

Still there.

npm ERR! Linux 4.9.0-0.bpo.8-amd64
npm ERR! argv "/usr/bin/nodejs" "/usr/local/bin/npm" "install"
npm ERR! node v6.11.0
npm ERR! npm  v3.8.3

npm ERR! shasum check failed for /tmp/npm-382-4886de3f/
npm ERR! Expected: 73b5eeca3fab653e3d3f9422b341ad42205dc965
npm ERR! Actual:   562dbccf0db4461d2bdb4581ab00bbbebaad61eb
npm ERR! From:

We have not switched the jobs to use which includes the fix.

Change 511672 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Make castor use rsync --delay-updates

Change 511672 merged by jenkins-bot:
[integration/config@master] Make castor use rsync --delay-updates

Not sure whether the delay update will definitely fix the issue, but it should help.

It's been a while, are we good here?

I don't think I've seen this since the patch landed.

Jdforrester-WMF assigned this task to hashar.

Yes, boldly declaring this Resolved. (Not assigning into 201907 board as it was done last FY.)

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:09 PM