
npm 6 consistently fails with "Z_DATA_ERROR: invalid distance too far back" on some repos
Closed, Resolved, Public

Description

This is not currently causing build failures: everything works on npm 3 (Node 6), and the only projects already switched to npm 6 (Node 10) are those not affected by this bug.

However, when trying out Node 10 on a number of projects, some of them consistently fail in this way.

Repos on which Node 10 / npm 6 fails (but Node 6 / npm 3 passes)

  • node-rdkafka-statsd (change; build)
  • labs/tools/heritage (change; build)
  • pywikibot/i18n (change)
  • wikibase/javascript-api (change)
  • performance/fresnel (when a node_modules dir exists)
  • wikipeg (change; build)
  • mediawiki/extensions/Flow
  • mediawiki/extensions/TemplateData
  • mediawiki/extensions/Wikibase

+ rm -rf node_modules
+ npm install --no-progress

...

npm ERR! code Z_DATA_ERROR
npm ERR! errno -3
npm ERR! invalid distance too far back
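When trying Node 10 across many repos, it can help to classify saved CI logs automatically. A minimal sketch, assuming the npm output has been written to a file; the function name `has_z_data_error` is hypothetical and not part of any CI tooling used here. It only looks for the `npm ERR! code Z_DATA_ERROR` line shown above.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper (not part of the Wikimedia CI tooling).
# Succeeds (exit 0) if the given npm log contains the Z_DATA_ERROR failure
# from this task, and fails (non-zero) otherwise.
has_z_data_error() {
  local log_file="$1"
  # -q: only the exit status matters; -F: match a fixed string, not a regex.
  grep -qF -e 'npm ERR! code Z_DATA_ERROR' "$log_file"
}
```

For example, `npm install --no-progress 2>&1 | tee npm.log` followed by `has_z_data_error npm.log && echo affected`.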

Event Timeline

Krinkle triaged this task as Normal priority. Feb 7 2019, 10:22 PM
Krinkle created this task.
Krinkle renamed this task from Fix "Z_DATA_ERROR: invalid distance too far back" to npm 6 consistently fails with "Z_DATA_ERROR: invalid distance too far back" on some repos. Feb 7 2019, 10:26 PM
Krinkle updated the task description.
Krinkle updated the task description. Feb 23 2019, 2:01 AM
Krinkle updated the task description. Feb 23 2019, 2:33 AM
Krinkle updated the task description. Feb 27 2019, 2:19 PM
Krinkle updated the task description. Mar 11 2019, 2:18 AM

Change 498267 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/extensions/VisualEditor@master] build: Un-commit package-lock

https://gerrit.wikimedia.org/r/498267

Change 498276 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] Follow-up 8c5d25631: Revert Timo's inclusion of changes to VE

https://gerrit.wikimedia.org/r/498276

Change 498276 merged by jenkins-bot:
[integration/config@master] Follow-up 8c5d25631: Revert Timo's inclusion of changes to VE

https://gerrit.wikimedia.org/r/498276

Change 498267 abandoned by Jforrester:
build: Un-commit package-lock

Reason:
Testing artefact, no longer needed.

https://gerrit.wikimedia.org/r/498267

Jdforrester-WMF edited projects, added Upstream; removed Patch-For-Review. Mar 22 2019, 12:56 AM

Reported upstream by Timo.

I guess this is the same as T218978, but in that case it is causing builds to fail, so I filed a separate task. Feel free to merge them, though, if you think that is best.

Change 498524 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/npm@master] wmf: Rewrite update.sh based on npmjs.org tarball

https://gerrit.wikimedia.org/r/498524

Change 498524 merged by Legoktm:
[integration/npm@master] wmf: Rewrite update.sh based on npmjs.org tarball

https://gerrit.wikimedia.org/r/498524

Change 498664 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] dockerfiles: Update node10 images with npm 6.5.0 tarball

https://gerrit.wikimedia.org/r/498664

Change 498664 merged by jenkins-bot:
[integration/config@master] dockerfiles: Update node10 images with npm 6.5.0 tarball

https://gerrit.wikimedia.org/r/498664

Mentioned in SAL (#wikimedia-releng) [2019-03-23T20:52:40Z] <Krinkle> Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/498664 / T215562

Change 498667 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] jjb: update node10 jobs to docker image 0.4.0

https://gerrit.wikimedia.org/r/498667

Change 498667 merged by jenkins-bot:
[integration/config@master] jjb: update node10 jobs to docker image 0.4.0

https://gerrit.wikimedia.org/r/498667

Status: Downloaded newer image for docker-registry.wikimedia.org/releng/node10-test:0.4.0
..
+ node --version
v10.4.0
+ npm --version
6.5.0
..
+ rm -rf node_modules
+ npm install --no-progress
..
npm ERR! code Z_DATA_ERROR
npm ERR! errno -3
npm ERR! invalid distance too far back
..
npm ERR! Callback called more than once.

Same as before. At least we've confirmed that the issue isn't related to how we install npm.

Change 499262 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Revert Flow js documentation back from node10 to jsduck

https://gerrit.wikimedia.org/r/499262

Change 499262 merged by jenkins-bot:
[integration/config@master] Revert Flow js documentation back from node10 to jsduck

https://gerrit.wikimedia.org/r/499262

Krinkle updated the task description. Mar 26 2019, 7:54 PM
Krinkle updated the task description.

Change 499312 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] Revert use of npm-run-doc on node10 for TemplateData and Wikibase

https://gerrit.wikimedia.org/r/499312

Change 499312 merged by jenkins-bot:
[integration/config@master] Revert use of npm-run-doc on node10 for TemplateData and Wikibase

https://gerrit.wikimedia.org/r/499312

Krinkle added a project: Operations.
Krinkle removed a project: Upstream. Mar 28 2019, 5:15 PM

It seems we've found the culprit. The problem is indeed the zlib1g library. This was apparent from the error, and it is also what the support forum pointed to.

However, what wasn't clear until today is why it is a problem.

  • Has Node 10 regressed since Node 6 in its support for zlib1g 1.2.8 and (by extension) Debian Linux?

This was my main question, given that Debian 8 and 9 have the same version of zlib1g, and Node 6 on Debian 8 worked fine for us, whereas Node 10 on Debian 9 (with the same zlib1g version) is broken for us.

It is likely true that Node.js's C++ code has changed between Node 6 and Node 10 in a way that no longer works with zlib1g 1.2.8.

@MoritzMuehlenhoff recalled that the Node.js C++ source code contains a local copy of zlib that it is always compatible with.

  • In Node 4 and Node 6, this was zlib 1.2.8.
  • As of Node 7 (and still in Node 8 and Node 10) this is zlib 1.2.11.

The default Makefile for Node.js and the NodeSource distribution both compile against this local copy, which is the configuration Node.js supports.

The Debian distribution, however, patches Node.js to link against the zlib1g version shipped by the OS release. For Node 6 that happened to match the version in Debian 8 and Debian 9 (1.2.8). For Node 10, which bundles 1.2.11, this is no longer the case.
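The mismatch described above can be checked on a host. A minimal sketch, assuming bash, GNU `sort -V`, and a Debian host with `dpkg-query` and `ldd`; the helper names are hypothetical, and treating a system zlib1g that is at least as new as Node's bundled zlib as safe is an assumption (this task only establishes that a Node 10 linked against 1.2.8 fails). The expected bundled version comes from the list above (1.2.8 for Node 4/6, 1.2.11 from Node 7 on).

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not Wikimedia tooling.

# Strip a Debian epoch and packaging suffix, e.g. "1:1.2.8.dfsg-5" -> "1.2.8".
upstream_version() {
  printf '%s\n' "$1" | sed -E 's/^[0-9]+://; s/[.+~-]dfsg.*//; s/-[^-]*$//'
}

# Succeed if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# $1: zlib version bundled with the Node.js release line (e.g. 1.2.11).
# Assumption: a system zlib1g at least that new is treated as safe.
check_host() {
  local bundled="$1" node_bin system
  node_bin="$(command -v node)" || { echo "node not found"; return 2; }
  if ! ldd "$node_bin" 2>/dev/null | grep -q 'libz\.so'; then
    echo "node does not link libz dynamically (likely bundled zlib)"
    return 0
  fi
  system="$(upstream_version "$(dpkg-query -W -f='${Version}\n' zlib1g | head -n1)")"
  if version_ge "$system" "$bundled"; then
    echo "system zlib1g $system >= bundled $bundled: likely OK"
  else
    echo "system zlib1g $system < bundled $bundled: mismatch"
    return 1
  fi
}
```

On a Debian 9 host running a Node 10 build linked against the system zlib1g, `check_host 1.2.11` would be expected to report a mismatch, since Debian 9 ships zlib1g 1.2.8.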

As such, it is effectively our fault for packaging it this way. We need to either:

  1. make upstream Node.js support both (which they don't currently);
  2. or, backport a newer version of zlib1g to (our distribution of) Debian 9;
  3. or, re-compile our Node.js to use the local version.

Moritz says the easiest of these would be to re-compile our Node.js 10 package against the local copy of zlib1g from Node's source code, instead of dynamically linking the system-wide one from Debian.
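Re-compiling against the local copy comes down to how Node.js is configured. Upstream's `configure` script uses the bundled zlib by default and only links the system library when `--shared-zlib` (or one of the related `--shared-zlib-*` options) is passed. A minimal sketch, assuming a packaging script that keeps its configure options in a list and passes option values in `--opt=value` form; the function name `without_shared_zlib` is hypothetical, and this is not necessarily how the Wikimedia package was actually changed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the given configure flags, one per line,
# minus any --shared-zlib* flag, so Node.js falls back to its bundled zlib.
# Assumes values are attached with "=" (e.g. --shared-zlib-libpath=/usr/lib).
without_shared_zlib() {
  local flag
  for flag in "$@"; do
    case "$flag" in
      --shared-zlib|--shared-zlib-*) ;;  # drop: would link the system zlib1g
      *) printf '%s\n' "$flag" ;;
    esac
  done
}
```

For example: `mapfile -t flags < <(without_shared_zlib --prefix=/usr --shared-zlib --shared-openssl)` and then `./configure "${flags[@]}"`.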

Node 6 end of life is next month (2019-04-30).

> As such, it is effectively our fault for packaging it this way. We need to either:

Not at all :-) This is entirely the fault of the node ecosystem, who designed their system of deploying binary modules after what they do for JavaScript. Distributing binaries reliably is a very hard problem; people have been working on this for decades, and they should rather have investigated the lessons learned by others...

> Moritz says the easiest of these would be to re-compile our Node.js 10 package against the local copy of zlib1g from Node's source code, instead of dynamically linking the system-wide one from Debian.

Ack, I'll take care of that.

@Krinkle I've prepared a new build and uploaded it to https://people.wikimedia.org/~jmm/node/

Could you verify that it fixes the Z_DATA_ERROR issue? If so, I'll upload it to the node10 component on apt.wikimedia.org.

Krinkle claimed this task. Apr 1 2019, 2:57 PM
Krinkle added a subscriber: MoritzMuehlenhoff.

> @Krinkle I've prepared a new build and uploaded it to https://people.wikimedia.org/~jmm/node/

Thanks! I've sometimes used dpkg -i to install a .deb file, but I may need some help with this directory.

I'm looking to install it in place of the apt-get command in node10/Dockerfile. I would then build that image and the node10-test child image locally, and start a container to see whether the issue can still be reproduced.

OK. Looks like the image will already be tested as part of another service deployment. Assigning back to Moritz to notify once it's up on apt-wikimedia so that I can rebuild the relevant CI images after that.

Mentioned in SAL (#wikimedia-operations) [2019-04-04T11:35:58Z] <moritzm> uploaded nodejs 10.15.2~dfsg-1+wmf1 to the component/node10 component of apt.wikimedia.org/stretch-wikimedia (updated to latest 10.x release and a change to ensure zlib binary compat with NodeSource) (T215562)

> OK. Looks like the image will already be tested as part of another service deployment. Assigning back to Moritz to notify once it's up on apt-wikimedia so that I can rebuild the relevant CI images after that.

I've updated the component/node10 component now; reassigning back for further tests and a rebuild of the CI images.

Change 501407 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] dockerfiles: Create node10:0.5.0 images

https://gerrit.wikimedia.org/r/501407

Change 501408 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[integration/config@master] jjb: update npm jobs to use node10:0.5.0 docker image

https://gerrit.wikimedia.org/r/501408

Change 501407 merged by jenkins-bot:
[integration/config@master] dockerfiles: Create node10:0.5.0 images

https://gerrit.wikimedia.org/r/501407

Krinkle updated the task description. Apr 4 2019, 8:48 PM

Change 501408 merged by jenkins-bot:
[integration/config@master] jjb: update npm jobs to use node10:0.5.0 docker image

https://gerrit.wikimedia.org/r/501408

Krinkle closed this task as Resolved. Apr 4 2019, 9:32 PM