Migrate node-based services in production to node10
Open, Medium, Public

Description

The majority of our node-based services are currently running Node 6, and that branch goes EOL on 2019-04-01: https://nodejs.org/en/about/releases/

I can backport security fixes for a while, but that's not ideal, and we should plan the migration of node-based services to Node 10 (which also implies stretch, as node10 has hard requirements on libraries available only in stretch).

Services currently using nodejs:

node10 debs for stretch are available in the repository component "component/node10"; see T203239 for further details. After https://gerrit.wikimedia.org/r/477475, service::node deploys the stretch nodejs10 component when the parameter use_nodejs10 is set to true.

Event Timeline

Does this mean we have a hard deadline of 2019-04-01 for completing the migrations? Or, per the "I can backport security fixes for a while", do we have a couple more months? The current goal is that by July 2019 all scb services, restbase (and probably aqs as well), proton, and parsoid will be in kubernetes. That will leave turnilo and aphlict, I guess.

etherpad-lite is a whole story of its own, as the software is in what I would call "maintenance mode". The docs say it requires node 6.9+ and recommends node 8.9+. Hopefully that means it's compatible with node10, but that remains to be seen.

Does this mean we have a hard deadline of 2019-04-01 for completing the migrations? Or, per the "I can backport security fixes for a while", do we have a couple more months?

No, it's not a hard deadline. I feel perfectly comfortable backporting fixes for longer. It's mostly a case of "Moritz was working on nodejs security updates, saw the EOL note, realised that goal planning is in progress, so it seemed useful to start the discussion and make a task" :-)

As mentioned in https://phabricator.wikimedia.org/T209711#4788954 I am looping in @hashar to also allow Releng to test NodeJS 10 :)

Change 477475 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] service::node: add the 'use_nodejs10' parameter

https://gerrit.wikimedia.org/r/477475

Change 477500 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] turnilo: fix dependency cycle removing require_package

https://gerrit.wikimedia.org/r/477500

Change 477500 merged by Elukey:
[operations/puppet@production] turnilo: fix dependency cycle removing require_package

https://gerrit.wikimedia.org/r/477500

jijiki triaged this task as Medium priority. Dec 4 2018, 10:17 PM

Change 477475 merged by Elukey:
[operations/puppet@production] service::node: add the 'use_nodejs10' parameter

https://gerrit.wikimedia.org/r/477475

Change 508822 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: upgrade to nodejs10

https://gerrit.wikimedia.org/r/508822

Mentioned in SAL (#wikimedia-operations) [2019-05-08T14:03:26Z] <gehel> starting upgrade to nodejs 10 for maps - T210704

Change 508822 merged by Gehel:
[operations/puppet@production] maps: upgrade to nodejs10

https://gerrit.wikimedia.org/r/508822

Mentioned in SAL (#wikimedia-operations) [2019-05-08T19:26:27Z] <gehel> continue upgrade to nodejs 10 for maps - T210704

Mentioned in SAL (#wikimedia-operations) [2019-05-08T20:10:42Z] <gehel> upgrade to nodejs 10 for maps completed - T210704

We recently tried to upgrade to nodejs10 for cxserver but it seems zlib 1.2.11 is required.

Example error: MT processing error for: en > qqq. Error: invalid distance too far back at Zlib.zlibOnError [as onerror]

See: https://github.com/nodejs/node/issues/22839
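
When checking an image for this, the zlib version that a Node binary reports can be printed with `node -p process.versions.zlib`. The sketch below is only an illustration of comparing such a reported string against the 1.2.11 minimum mentioned above; the helper names `parse_version` and `zlib_is_new_enough` are made up for this example, and it assumes the reported string starts with a dotted numeric version.

```python
import re

# Minimum zlib version named in the comment above.
MIN_ZLIB = (1, 2, 11)


def parse_version(text):
    """Turn a dotted version such as '1.2.11' into the tuple (1, 2, 11).

    Only the leading numeric part is used, so Debian-style suffixes
    such as '1.2.8.dfsg-5' are tolerated.
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def zlib_is_new_enough(reported, minimum=MIN_ZLIB):
    """Return True if the reported zlib version is at least the minimum."""
    # Tuples compare numerically element by element, so 11 > 8 as intended.
    return parse_version(reported) >= minimum


if __name__ == "__main__":
    # Example input: paste the output of `node -p process.versions.zlib` here.
    for reported in ("1.2.8", "1.2.11"):
        print(reported, zlib_is_new_enough(reported))
```

The comparison is done on integer tuples rather than on the raw strings because a plain string comparison would rank "1.2.8" above "1.2.11".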

Are you using component/node10? This should be fixed already, see https://phabricator.wikimedia.org/T215562#5066711 and followups.

Given they're using our base images, I guess we have to update the nodejs base images to use it.

To correct myself: we already use that component. I'm nonetheless creating a new version of the base image. @KartikMistry what base image are you using for your project?

docker-registry.wikimedia.org/nodejs10-slim and docker-registry.wikimedia.org/nodejs10-devel

You have the problem with both? In theory the nodejs10-devel image should use the official packages from nodesource, so it shouldn't be affected, unless I'm missing something.

I don't think they've had to interface directly with the nodejs10-devel image, as the tests pass successfully (so probably no problem there).

Could you please describe how to reproduce this? It seems we know exactly what it is, but we want to make sure when we build the new images.

I was able to reproduce the error we saw in production by running the endpoint tests manually against the docker image tag 2019-06-18-094614-production.

Let me know if any further information is needed.

@KartikMistry if we trigger a rebuild of the production container, it should now use the newer nodejs10-slim image and work as expected. Can you please confirm?

This looks good. Deployed in Production.

Checking the box for phabricator/aphlict. aphlict is now running on a dedicated VM, aphlict1001, on buster and nodejs 10. The phabricator server phab1001 is also on buster and has nodejs10 (which we could maybe remove from it now).

Also checking the box for etherpad. That is now also on buster and nodejs10. Upgraded by Alex Kosiaris.