Page MenuHomePhabricator

Track and install additional npm packages for all service container images
Closed, ResolvedPublic

Description

There is a base set of npm packages that are used by all services. Currently, server.js installs heapdump and gc-stats (possibly among others).

In the 2018-09-27 pipeline meeting we discussed where and how these packages should be installed, so that they can be tracked for security updates

Event Timeline

There is a base set of npm packages that are used by all services. Currently, server.js installs heapdump and gc-stats (possibly among others).

Just for background - the logic behind it is that these packages are binary, and we do not want to make service-runner have non-optional binary dependencies. Currently in service-runner requiring these packages is surrounded by try-catch, and if they're not installed, the functionality they provide is not provided.

This allows avoiding compiling native code by default when installing/testing locally which speeds up things quite a bit, allows to easily run experiments in VMs/docker by just mounting different volumes etc.

Of course if the service logic requires a native package this all does not matter.

My suggestion would be to create a nodeX-wmf-servicerunner-base image (or something with a less atrocious name) on top of our basic nodejs image.

Seems like this is something we could do in a few different ways. We could probably work out a way to do this in blubber using a variant. We could create an intermediate docker-pkg image.

  1. An intermediate docker-pkg layer requires SRE to make changes, which may become onerous for SRE and services depending on how frequently these packages change
  2. Making an intermediate docker-pkg layer the norm for services that share a set of base dependencies may become onerous for SRE to maintain
  3. Blubber would be more self-serve, but may not have the package security tracking requirements needed by SRE
  4. Any Blubber-based solution would require copying-and-pasting blubber.yaml to live in various repositories

These are fairly static node packages which really need to be recompiled only when the Node version changes, so I would vote for option (1).

jijiki triaged this task as Medium priority.Oct 26 2018, 10:16 AM

Assuming we go for option (1), how would we go around and install these packages? And how would we instruct the app to load them?
It seems like using NODE_PATH is discouraged these days[1] and would anyway require changes to blubber to set NODE_PATH. We used to have that variable set and have moved away from it in https://gerrit.wikimedia.org/r/#/c/blubber/+/460997/

The alternative seems to be node_prefix, but I guess that would require to also alter blubber? (correct me on this one). It also seems to be discouraged. The docs [1] say It is strongly encouraged to place dependencies in the local node_modules folder. These will be loaded faster, and more reliably.

There is another question I have as well, regardless of the implementation chosen. running npm install seems to alter package.json. I am guessing that calling it twice from different contexts will require --no-save?

[1] https://nodejs.org/api/modules.html#modules_loading_from_the_global_folders

Assuming we go for option (1), how would we go around and install these packages? And how would we instruct the app to load them?
It seems like using NODE_PATH is discouraged these days[1] and would anyway require changes to blubber to set NODE_PATH. We used to have that variable set and have moved away from it in https://gerrit.wikimedia.org/r/#/c/blubber/+/460997/

That's a good point. Since Blubber runs npm install anyway, perhaps we could make it do what service-runner does, which is to do npm install && npm install heapdump gc-stats on every build?

There is another question I have as well, regardless of the implementation chosen. running npm install seems to alter package.json. I am guessing that calling it twice from different contexts will require --no-save?

Euh, no it doesn't. It only modifies it if you run it as npm install --save <pkg-name>.

Oops, sorry for taking so long to answer, this fell through the cracks

Assuming we go for option (1), how would we go around and install these packages? And how would we instruct the app to load them?
It seems like using NODE_PATH is discouraged these days[1] and would anyway require changes to blubber to set NODE_PATH. We used to have that variable set and have moved away from it in https://gerrit.wikimedia.org/r/#/c/blubber/+/460997/

That's a good point. Since Blubber runs npm install anyway, perhaps we could make it do what service-runner does, which is to do npm install && npm install heapdump gc-stats on every build?

Sure. Just noting that this moves us away from option(1) and into option (2), aka blubber.

There is another question I have as well, regardless of the implementation chosen. running npm install seems to alter package.json. I am guessing that calling it twice from different contexts will require --no-save?

Euh, no it doesn't. It only modifies it if you run it as npm install --save <pkg-name>.

Hmm, it's probably the fact I passed gc-stats directly then cause I see

$ npm install gc-stats 

> gc-stats@1.2.1 install /root/mathoid/node_modules/gc-stats
> node-pre-gyp install --fallback-to-build

node-pre-gyp WARN Using request for node-pre-gyp https download 
[gc-stats] Success: "/root/mathoid/node_modules/gc-stats/build/gcstats/v1.2.1/Release/node-v57-linux-x64/gcstats.node" is installed via remote
npm WARN acorn-jsx@5.0.1 requires a peer of acorn@^6.0.0 but none is installed. You must install peer dependencies yourself.
npm WARN ajv-keywords@2.1.1 requires a peer of ajv@^5.0.0 but none is installed. You must install peer dependencies yourself.

+ gc-stats@1.2.1
updated 1 package in 5.078s

$ git diff
diff --git a/package.json b/package.json
index d0d2b8026..85f5102bb 100644
--- a/package.json
+++ b/package.json
@@ -45,6 +45,7 @@
     "cassandra-uuid": "^0.1.0",
     "compression": "^1.7.1",
     "express": "^4.16.2",
+    "gc-stats": "^1.2.1",
     "http-shutdown": "^1.2.0",
     "js-yaml": "^3.12.1",
     "mathoid-mathjax-node": "^0.7.0",

Anyway, I guess it's irrelevant if the calling code is going to be npm install && npm install heapdump gc-stats

Change 492922 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[blubber@master] Install heapdump and gc-stats when env production

https://gerrit.wikimedia.org/r/492922

I 've just added a simple patch to blubber that should allow what we 've discussed.

I'm pushing back on the patchset to Blubber for a couple of reasons:

  1. We're trying to keep Blubber decoupled from package managers as much as possible (beyond delegating to underlying programs with a handful of configurable options) so that it's general enough to work with many different use cases. Hard coding the installation of these packages in Blubber couples it to specific downstream users (service-runner dependent services) and thus compromises it's general usefulness.
  2. The base image + amended NODE_PATH option is workable. Even though use of NODE_PATH is discouraged, it is still available in NPM because of existing edge cases, and this seems like an edge case to me. In other words, using a feature of NPM that is discouraged but available for "hysterical raisins" still seems like a less crufty option than modifying Blubber.

Just for background - the logic behind it is that these packages are binary, and we do not want to make service-runner have non-optional binary dependencies. Currently in service-runner requiring these packages is surrounded by try-catch, and if they're not installed, the functionality they provide is not provided.

This allows avoiding compiling native code by default when installing/testing locally which speeds up things quite a bit, allows to easily run experiments in VMs/docker by just mounting different volumes etc.

So does the issue stem from the fact that NPM doesn't provide a facility for optional dependencies? And is there perhaps another way around that? Is it feasible to define some kind of meta NPM package that constitutes the base service-runner dependencies + production-specific dependencies, like a wmf-service-runner or something? Sorry if these questions have been posed already or are naive to what you're trying to accomplish.

I'm pushing back on the patchset to Blubber for a couple of reasons:

  1. We're trying to keep Blubber decoupled from package managers as much as possible (beyond delegating to underlying programs with a handful of configurable options) so that it's general enough to work with many different use cases. Hard coding the installation of these packages in Blubber couples it to specific downstream users (service-runner dependent services) and thus compromises it's general usefulness.
  2. The base image + amended NODE_PATH option is workable. Even though use of NODE_PATH is discouraged, it is still available in NPM because of existing edge cases, and this seems like an edge case to me. In other words, using a feature of NPM that is discouraged but available for "hysterical raisins" still seems like a less crufty option than modifying Blubber.

I understand the logic, but we also need to find a practical compromise between cleanness and (our) usability. How about having a semi-config stanza in blubber.yaml? Something like monitoring_modules: true perhaps?

So does the issue stem from the fact that NPM doesn't provide a facility for optional dependencies? And is there perhaps another way around that? Is it feasible to define some kind of meta NPM package that constitutes the base service-runner dependencies + production-specific dependencies, like a wmf-service-runner or something? Sorry if these questions have been posed already or are naive to what you're trying to accomplish.

Having services depend on a meta package that itself depends on the real package is a recipe for deps hell, and I'd really like to avoid that. Not only does it create more manual work, but it also adds complexity that is delegated to humans, and, as we know, PEBKAC is the number #1 reason things go wrong.

Having services depend on a meta package that itself depends on the real package is a recipe for deps hell, and I'd really like to avoid that. Not only does it create more manual work, but it also adds complexity that is delegated to humans, and, as we know, PEBKAC is the number #1 reason things go wrong.

That doesn't seem preferable, you're right. But isn't hardcoding a list of additional dependencies within Blubber simply obfuscating yet another kind of dependency hell? i.e. The list could change from time to time and therefore establish specific versions of Blubber, a language agnostic tool, as an intermediate dependency link between two disparate Node projects.

I'd like to find a practical solution as well, but I'd like it to be one that still exists within NPM's domain of concerns/control if at all possible.

I understand the logic, but we also need to find a practical compromise between cleanness and (our) usability. How about having a semi-config stanza in blubber.yaml? Something like monitoring_modules: true perhaps?

A compromise I could see here would be to have a node: { packages: [...] } construct in Blubber that allows a project to specify additional dependencies outside of packages.json. You'd still be duplicating a small bit of Blubber configuration across projects of course—I see that as an orthogonal issue with Blubber's one-config-per-project design that might warrant a broader discussion.

Or: We could have a field like node: { globalPackages: true } that sets up the environment in a [least icky] way that puts global modules [provided by the base image] in a place that is require-able by the service. That could be either copying modules from a central directory into the local node_modules or, again, amending the NODE_PATH with the central dir.

Or: Can we leverage [[ http://npm.github.io/using-pkgs-docs/package-json/types/optionaldependencies.html | NPM's optionalDependencies ]] here? Perhaps service-runner could declare the authoritative list of additional dependencies in its optionalDependencies and then Blubber could have some weird option that would install a project's optional dependencies (or those of a project dependency)?

Or: Isn't packages-lock.json supposed to be used for this kind of thing? Is that not the official way to declare an authoritative list of dependencies that a Node project requires in a production environment? (That thought is informed by a Ruby background, however, as Gemfile.lock is such a solution in that world.)

Or: Isn't packages-lock.json supposed to be used for this kind of thing? Is that not the official way to declare an authoritative list of dependencies that a Node project requires in a production environment? (That thought is informed by a Ruby background, however, as Gemfile.lock is such a solution in that world.)

If this solution works but having a committed packages-lock.json is problematic for whatever reason, you could potentially have a packages-lock.production.json or some such file and use the [[ https://gerrit.wikimedia.org/r/plugins/gitiles/blubber/+/fd20d346ddfe03d16ec5bead085f0b387fdb83a2 | new copies syntax in Blubber ]] to copy it to packages-lock.json just before npm install runs:

variants:
  prep:
    copies:
      - from: local
      - from: local
        source: packages-lock.production.json
        destination: packages-lock.json
  production:
    copies: [prep]
mobrovac claimed this task.

optionalDependencies seems like a good compromise as it will allow us to easily run npm install in Blubber without breaking environments that do not support these modules. I have added them to [service-runner](https://github.com/wikimedia/service-runner/pull/208), tested it and works as expected. Let's keep our fingers crossed that the NPM devs don't change their minds about recursively installing optionalDeps (like they kept changing their minds about the authority of package-lock.json vs package,json). As soon as the [service-runner PR #208](https://github.com/wikimedia/service-runner/pull/208) is merged and v2.6.17 published, this will automatically become a feature in all Node.js services, so I'm resolving this task as done.

Change 492922 abandoned by Alexandros Kosiaris:
Install heapdump and gc-stats when env production

Reason:
Per https://phabricator.wikimedia.org/T205911#5070535, an alternate path was chosen

https://gerrit.wikimedia.org/r/492922

FYI, service-runner v2.6.17 (that has the additional packages as optional dependencies) has been published, so we are good to go on this front.

Change 502964 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[mediawiki/services/cxserver@master] Bump service-runner to 2.6.17

https://gerrit.wikimedia.org/r/502964

Change 502964 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Bump service-runner to 2.6.17

https://gerrit.wikimedia.org/r/502964

Change 503002 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[mediawiki/services/mathoid@master] Bump service-runner to 2.6.17

https://gerrit.wikimedia.org/r/503002

Change 503002 merged by jenkins-bot:
[mediawiki/services/mathoid@master] Bump service-runner to 2.6.17

https://gerrit.wikimedia.org/r/503002

Change 503072 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/citoid@master] Bump service-runner to v2.6.17

https://gerrit.wikimedia.org/r/503072

Change 503074 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/graphoid@master] Bump service-runner to v2.6.17

https://gerrit.wikimedia.org/r/503074

Change 503074 merged by Mobrovac:
[mediawiki/services/graphoid@master] Bump service-runner to v2.6.17

https://gerrit.wikimedia.org/r/503074

Change 503072 merged by jenkins-bot:
[mediawiki/services/citoid@master] Bump service-runner to v2.6.17

https://gerrit.wikimedia.org/r/503072

Change 503076 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/chromium-render@master] Bump service-runner to v2.6.17

https://gerrit.wikimedia.org/r/503076

Change 503076 merged by Mobrovac:
[mediawiki/services/chromium-render@master] Bump service-runner to v2.6.17

https://gerrit.wikimedia.org/r/503076

Change 503077 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/eventstreams@master] Bump service-runner to v2.6.17

https://gerrit.wikimedia.org/r/503077

Change 503077 merged by Mobrovac:
[mediawiki/services/eventstreams@master] Bump service-runner to v2.6.17

https://gerrit.wikimedia.org/r/503077