Page MenuHomePhabricator

Switch RESTBase to use Node.js 4.2
Closed, ResolvedPublic

Description

As T91855 is resolved we now have iojs available on Jessie. This means that we can start switching RESTBase hosts to a modern version of v8. Apart from heapdump (only used for debugging) RESTBase has no binary dependencies, so we can simply switch one node at a time.

The expected performance benefit is significant. In local testing with ab -n 10000 -c 20 http://localhost:7231/en.wikipedia.org/v1/page/html/Foobar, iojs improves throughput from about 840 req/s to 940 req/s.

Event Timeline

GWicke raised the priority of this task from to Medium.
GWicke updated the task description. (Show Details)
GWicke added a project: RESTBase.
GWicke subscribed.

I have switched one of our staging servers (xenon). In the process, I discovered that the iojs packages only install /usr/bin/node, but not debian's preferred name of /usr/bin/nodejs. A symlink fixed this, and the service started up as expected.

Going forward, we have a choice of using /usr/bin/nodejs or /usr/bin/node as the primary entry point referenced from init scripts etc. For now, while testing, it seems easiest to just add the symlink. Later, we can then add the symlink to /usr/bin/nodejs using puppet, or change our mind and switch to /usr/bin/node.

Load testing on xenon went well. Memory usage might be more stable than with 0.10, with long-running workers not exceeding 250mb RSS.

In light of this, I went ahead and switched restbase1001 to iojs.

In production, iojs is using more RSS (~550m on the largest workers, vs. about 350m with node 0.10), but seems to be fairly stable on memory (heap) used. Our heap limits are on memory used.

pasted_file (1×1 px, 319 KB)

Puppet has reverted those hosts back to nodejs since. RSS usage was indeed higher than with 0.10:

pasted_file (1×1 px, 350 KB)

Otherwise, there were no issue in the logs or metrics.

Now that node 4.1 is available in Debian unstable, and the next node LTS release is going to be cut based on that in the next weeks, I think it makes sense to consider moving straight to node 4.1.

The nodejs packages from unstable install as-is on jessie:

The following extra packages will be installed:                                                  
  gcc-5-base libicu55 libstdc++6 libuv1 libuv1-dev nodejs-dev                                    
The following NEW packages will be installed:                                                    
  gcc-5-base libicu55 libuv1 libuv1-dev                                                          
The following packages will be upgraded:                                                         
  libstdc++6 nodejs nodejs-dev                                                                   
3 upgraded, 4 newly installed, 0 to remove and 406 not upgraded

We can't currently test the iojs packages in production (blocked on https://gerrit.wikimedia.org/r/#/c/229304/), but we could start testing node 4.1 without requiring puppet changes.

@mobrovac, @Eevans, @Pchelolo, @MoritzMuehlenhoff, @fgiunchedi, @Yurik, @akosiaris: Any thoughts?

As a basic test, I have upgraded node to the sid 4.1 packages on deployment-restbase02. RESTBase starts without issues.

Good news about Node.js 4.1! Upgrading libstdc++ and running with binaries built with gcc 5 is a very bad idea, however (not to mention libicu). We generally rebuild against jessie and create backports -if jessie-backports doesn't already have them- which is usually trivial and would solve all of these problems. I can tackle this soon if noone else wants to handle this :)

@GWicke, should we also formulate a plan and a timeline regarding the upgrade of the rest of our Node.js services (parsoid + services cluster, I guess?)

maps cluster might also benefit - Mapnik node lib has just been merged for the node 4.0, but there might be some kinks there, will need to test.

BTW, would it make sense to skip iojs and go directly to 4.0, skipping iojs versions.

@faidon, agreed re libstdc++; that's why I listed the changes above.

We generally rebuild against jessie and create backports -if jessie-backports doesn't already have them- which is usually trivial and would solve all of these problems. I can tackle this soon if noone else wants to handle this :)

Thanks! @MoritzMuehlenhoff already imported the iojs packages, so maybe he would be willing to take this one on as well.

@GWicke, should we also formulate a plan and a timeline regarding the upgrade of the rest of our Node.js services (parsoid + services cluster, I guess?)

Yes, I think that makes sense, especially for jessie. Lets start with specific nodes / services though, and only switch over generally once we are satisfied that everything is alright. If we upload this to the backports repo, then both 0.10 and 4.1 should remain available. There are several puppet modules that currently pull in latest nodejs (contint, for example), so we'd have to prevent those from upgrading in the meantime.

Thanks! @MoritzMuehlenhoff already imported the iojs packages, so maybe he would be willing to take this one on as well.

Indeed. I can take care of it at the beginning of next week. I'll contact Jeremy Lal on his plans, ideally Debian sid/testing with stick with 4.1 until the next LTS and we can simply update nodejs in jessie-backports and import that build (as we do with openjdk-8).

@GWicke, should we also formulate a plan and a timeline regarding the upgrade of the rest of our Node.js services (parsoid + services cluster, I guess?)

Yes, I think that makes sense, especially for jessie. Lets start with specific nodes / services though, and only switch over generally once we are satisfied that everything is alright. If we upload this to the backports repo, then both 0.10 and 4.1 should remain available.

The 4.1 packages have the same binary package names as the 0.10 packages from jessie. We shouldn't change the binary package names for compatibility and updates to stretch, If we still need new setups of 0.10 we could add it to a separate repo suite.

There are several puppet modules that currently pull in latest nodejs (contint, for example), so we'd have to prevent those from upgrading in the meantime.

IMO puppet role with "package->latest" should be used only in rare exceptions (e.g. tzdata seems acceptable to me): We directly use the Debian/Ubuntu repos and if there's a regression in any of those packages, it directly affects the cluster.

>> Yes, I think that makes sense, especially for jessie. Lets start with specific nodes / services though, and only switch over generally once we are satisfied that everything is alright. If we upload this to the backports repo, then both 0.10 and 4.1 should remain available.

The 4.1 packages have the same binary package names as the 0.10 packages from jessie. We shouldn't change the binary package names for compatibility and updates to stretch, If we still need new setups of 0.10 we could add it to a separate repo suite.

I agree both 0.10 and 4.1 should remain available. There is also the path of different package names. Following the jdk naming like nodejs010 or nodejs41. The reason I am saying this is I am not sold on the separate repo suite, since I think the problem will re-emerge in the future. As in "hey we got this library dependency and nodejs version X.Y in the separate repo suite and suddenly user X wants this library dependency but not nodejs version X.Y".

What do you think ?

There are several puppet modules that currently pull in latest nodejs (contint, for example), so we'd have to prevent those from upgrading in the meantime.

IMO puppet role with "package->latest" should be used only in rare exceptions (e.g. tzdata seems acceptable to me): We directly use the Debian/Ubuntu repos and if there's a regression in any of those packages, it directly affects the cluster.

+1

I agree both 0.10 and 4.1 should remain available. There is also the path of different package names. Following the jdk naming like nodejs010 or nodejs41. The reason I am saying this is I am not sold on the separate repo suite, since I think the problem will re-emerge in the future. As in "hey we got this library dependency and nodejs version X.Y in the separate repo suite and suddenly user X wants this library dependency but not nodejs version X.Y".

I had the same thought and want to talk about that with the Debian nodejs maintainer. It might be useful to have the nodejs LTS release and more-recent nodejs versions co-installable.

As a basic test, I have upgraded node to the sid 4.1 packages on deployment-restbase02. RESTBase starts without issues.

Could we not do that in the Beta Cluster? We have other projects where we can test such things. deployment-restbase01 was converted to sid at some point as well and Puppet started failing on it, cf. T113965: deployment-restbase01 is on sid; puppet-run failing as a result.

should we also formulate a plan and a timeline regarding the upgrade of the rest of our Node.js services (parsoid + services cluster, I guess?)

I know this ticket is about switching RESTBase over to 4.1, and with that I'm ok - RESTBase is the only nodejs service on those boxes, so it'll either work or it won't. But for other services (especially SCA/SCB ones), I'm not sure how to approach it. We'd need to test them thoroughly before proceeding, that's for sure.

I know this ticket is about switching RESTBase over to 4.1, and with that I'm ok - RESTBase is the only nodejs service on those boxes, so it'll either work or it won't. But for other services (especially SCA/SCB ones), I'm not sure how to approach it. We'd need to test them thoroughly before proceeding, that's for sure.

I am also worried about those as well. So, depending to what @MoritzMuehlenhoff reports back for the nodejs Debian maintainer, we *just* might have a way to differentiate between nodejs package versions depending on a package name. Alternatively perhaps by inserting one more repo (at a higher priority). At which point the choice of package is programmable and we can amend puppet manifests to allow that choice (I am thinking with hiera overrides). Which should allow both easy testing in labs and deciding the version in production.

I am also worried about those as well. So, depending to what @MoritzMuehlenhoff reports back for the nodejs Debian maintainer, we *just* might have a way to differentiate between nodejs package versions depending on a package name. Alternatively perhaps by inserting one more repo (at a higher priority).

I'm most worried here about cases where one service works on a newer version, but another doesn't (somehow Zotero springs to mind :P).

At which point the choice of package is programmable and we can amend puppet manifests to allow that choice (I am thinking with hiera overrides).

Sounds reasonable, but that should be fixed for the node's role (as in, target role::sca, not e.g. role::citoid)

I am also worried about those as well. So, depending to what @MoritzMuehlenhoff reports back for the nodejs Debian maintainer, we *just* might have a way to differentiate between nodejs package versions depending on a package name. Alternatively perhaps by inserting one more repo (at a higher priority).

I'm most worried here about cases where one service works on a newer version, but another doesn't (somehow Zotero springs to mind :P).

Same here. Zotero of course does not use nodejs but xpcshell but I get your pain.

At which point the choice of package is programmable and we can amend puppet manifests to allow that choice (I am thinking with hiera overrides).

Sounds reasonable, but that should be fixed for the node's role (as in, target role::sca, not e.g. role::citoid)

That depends. we might want to migrate say service A to the newer version but not service B and they 're on the same host. In any case let's see what @MoritzMuehlenhoff reports back and discuss the technical details then.

As a basic test, I have upgraded node to the sid 4.1 packages on deployment-restbase02. RESTBase starts without issues.

Could we not do that in the Beta Cluster?

This is the secondary Cassandra node. As you know, only deployment-restbase01 is used for requests. In any case, deployment-restbase02 is back to 0.10.

Ricordisamoa renamed this task from Switch RESTBase to use iojs to Switch RESTBase to use Node.js 4.Sep 30 2015, 6:46 PM

Now that node 4.1 is available in Debian unstable, and the next node LTS release is going to be cut based on that in the next weeks, I think it makes sense to consider moving straight to node 4.1.

BTW, would it make sense to skip iojs and go directly to 4.0, skipping iojs versions.

I know this ticket is about switching RESTBase over to 4.1, and with that I'm ok

This is the secondary Cassandra node. As you know, only deployment-restbase01 is used for requests. In any case, deployment-restbase02 is back to 0.10.

My apologies. I thought you switched the whole VM to sid, not just installed one package from it (hence the cf in my previous post).

Also, let's not forget that the Analytics team will soon have their own RESTBase cluster, and I think we should keep these two on the same nodejs/iojs version to preserve our sanity.

I can take care of it at the beginning of next week. I'll contact Jeremy Lal on his plans, ideally Debian sid/testing with stick with 4.1 until the next LTS and we can simply update nodejs in jessie-backports and import that build (as we do with openjdk-8).

Thank you, much appreciated!

I talked to the maintainer: The new packages won't be co-installable, there will be only one release in the archive.

He also mentioned that 4.4.1 isn't the LTS release yet, the information on https://github.com/nodejs/LTS/ are examples/plans and 4.4.1 hasn't officially been made the LTS release yet. There's a new release scheduled for 5th October, maybe that will be declared the LTS release?

For the topic of having both options available: I'd like to avoid to modify the binary names in local WMF builds, during the phase where we need to support 0.10 and 4.? in parallel I'd prefer a separate nodejs4 repo which is enabled through a Hiera config option (once 0.10 is phased out, the binaries can simply be moved to jessie-wikimedia).

I talked to the maintainer: The new packages won't be co-installable, there will be only one release in the archive.

Yeah, I was afraid of that.

For the topic of having both options available: I'd like to avoid to modify the binary names in local WMF builds, during the phase where we need to support 0.10 and 4.? in parallel I'd prefer a separate nodejs4 repo which is enabled through a Hiera config option (once 0.10 is phased out, the binaries can simply be moved to jessie-wikimedia).

OK. That does have the implication that should more than one nodejs service be installed on the same host (SCA cluster basically) they all should be tested with the newer version before migrating. But I think it is doable at this point, since we do not have too many services collocated at the same hosts.

He also mentioned that 4.4.1 isn't the LTS release yet, the information on https://github.com/nodejs/LTS/ are examples/plans and 4.4.1 hasn't officially been made the LTS release yet. There's a new release scheduled for 5th October, maybe that will be declared the LTS release?

At this point waiting for the LTS is the best option IMHO.

Yeah, I was afraid of that.

Me too, but it makes perfect sense.

OK. That does have the implication that should more than one nodejs service be installed on the same host (SCA cluster basically) they all should be tested with the newer version before migrating. But I think it is doable at this point, since we do not have too many services collocated at the same hosts.

It'd be perfect to couple that with T96017: Migrate SCA cluster to SCB (Jessie and Node 4.2) (since we've kind of neglected that ticket).

In local testing I discovered that Debian's npm package is actually not updated yet, and npm install fails. So, we'll have to wait until Debian's npm package has caught up as well.

+1. Parsoid would like to use a version of node which includes generator support, which would be node >= 0.11. The version of node running in production (0.10.25) has had a number of bug- and security-fix releases since that time, for example 0.10.37 was a security release: http://dailyjs.com/2015/03/18/1399-node-roundup/

Added a similar task for Parsoid (T119228), blocked by this one. Once RESTBase switches we'll want to switch shortly afterwards, but we don't want to switch before RESTBase does. We'll let them be the pathfinders.

GWicke renamed this task from Switch RESTBase to use Node.js 4 to Switch RESTBase to use Node.js 4.2.Dec 1 2015, 3:28 AM

@yuvipanda, @MoritzMuehlenhoff, @faidon: What is the current plan for unstable Debian repositories / backports? Is there a way to access the node 4.2 packages from unstable on specific cluster machines (the RB cluster), without upgrading all jessie hosts at once?

Debian unstable with stick with the 4.2 LTS release, while Debian experimental will follow the latest releases (like 5.2 currently).
We can't install the nodejs packages from unstable in jessie, this would effectily upgrade these hosts to Debian unstable as well, since the binaries from unstable are linked against libstdc++ 5.2 and a more recent ICU.

To upgrade restbase to 4.2 we should rebuild the 4.2.x packages from unstable for jessie-backports. I can do that, but it'll take a bit, since I'm currently busy with other tasks.

I've made a backport of nodejs 4.2.4 for jessie (not copied to carbon yet, need some testing feedback from node-based apps before that can happen). It's available at people.wikimedia.org:~/jmm/

Let me know if you run into any problems

4.2 shares the same binary package name with 0.10, so we can have the two in parallel (this would require extensive changes and would spoil the opportunity to make the 4.2/jessie-backports maintenance inside the Debian archive (similar to what we do with openjdk-8)

@MoritzMuehlenhoff: Thank you for the packages! I installed those on restbase1009 and restarted restbase. I'll keep an eye on it.

Things are looking good so far. As expected, the GC behavior is a bit more incremental in 4.2's V8, but the rate of reaching our configured heap limit has not materially changed from 0.10.

I would propose to gradually roll out 4.2 to a few more RESTBase nodes today. I think @mobrovac and @akosiaris are also considering to upgrade the scb cluster, which is currently running the mobile content service.

According to logstash, the heap limit is reached less often on nodes running 4.2.

Current migration status in Eqiad:

  • 1001 - 1004: node 0.10
  • 1005 - 1009: node 4.2

GC behavior continues to look better for 4.2 than 0.10:

pasted_file (700×1 px, 191 KB)

Peak memory usage is higher than with 0.10, but collections generally seem to happen in time, without violating the threshold:

pasted_file (1×1 px, 362 KB)

Mean heap usage has barely changed.

Just to be sure, I think it's worth waiting for another day before committing RESTBase to 4.2 by switching over 1001 - 1004.

Per T123297, the remaining nodes have been switched to 4.2, but they have not been restarted. To be on the safe side, I think we should temporarily revert 1001-1004 to 0.10 until we are ready to migrate & restart the remaining nodes in an orderly manner.

It seems that somebody removed my access to those nodes, so I can't do this myself.

mobrovac claimed this task.

All of the remaining nodes (restbase100[1-4] and restbase200x) have now been switched to Node 4.2 and RESTBase has been restarted there.

@mobrovac, could you update the staging cluster as well?

@mobrovac, could you update the staging cluster as well?

{{done}} for all 6 nodes in the staging cluster.