npm-node-4.3 jobs are failing because node is version 4.4.6
Closed, Resolved; Public

Description

npm-node-4.3 jobs are failing because node is version 4.4.6

Last good run: https://integration.wikimedia.org/ci/job/npm-node-4.3/18988/console
First failing: https://integration.wikimedia.org/ci/job/npm-node-4.3/18989/console

14:22:15 [npm-node-4.3] $ /bin/bash -e -u /tmp/hudson6775255597999561055.sh
14:22:15 Assertion error: node version v4.4.6 does not match '^v4[.]3[.]'

Last entry in the job history is dated 2016-02-15_11-25-23.
The openstack image ci-jessie-wikimedia says updated_at 2016-06-03T19:08:04.000000.
The working instance was created: 2016-07-05 14:15:30,708 INFO nodepool.NodeLauncher: Creating server with hostname ci-jessie-wikimedia-169438 in wmflabs-eqiad from image ci-jessie-wikimedia for node id: 169438
The failing instance was created: 2016-07-05 14:21:12,773 INFO nodepool.NodeLauncher: Creating server with hostname ci-jessie-wikimedia-169462 in wmflabs-eqiad from image ci-jessie-wikimedia for node id: 169462
https://apt.wikimedia.org/wikimedia/dists/jessie-wikimedia/backports/ is dated 04-Jul-2016 14:04.
https://apt.wikimedia.org/wikimedia/pool/backports/n/nodejs/ is dated 24-Jun-2016 08:23 and is the date nodejs 4.4.6 was uploaded there.
Neither https://apt.wikimedia.org/wikimedia/pool/backports/n/nodejs/ nor https://apt.wikimedia.org/wikimedia/pool/main/n/nodejs/ contains a node 4.3.
Server admin log: 2016-06-24 08:29 moritzm: uploaded nodejs 4.4.6 for jessie-wikimedia to carbon

Event Timeline

I think this is an easy fix.

Change 297421 had a related patch set uploaded (by Paladox):
Fix npm 4.3 patten to do 4.4 instead

https://gerrit.wikimedia.org/r/297421

All the ci-jessie slaves may need updating first, since some may not have the new node version yet and may break if we merge the change.

But we probably want to change it so that it works with all 4.* releases.

Addshore added a subscriber: Addshore.

It would be great to get a fix for this merged :/

+1

we need https://gerrit.wikimedia.org/r/#/c/297415/ to merge before we can prepare the Wikidata deployment branch for this week

I currently have problems deploying jjb changes.

Paladox triaged this task as Unbreak Now! priority. Jul 5 2016, 4:45 PM

I don't know which npm-node-4.3 tests are affected, so I am changing the status. I think we need a pattern like 4[.] so that the check accepts all sub-releases.
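A minimal sketch of what such a relaxed check could look like, assuming the job's assertion is a bash regex match against the string that `node --version` prints. The function name `assert_node_version` and its arguments are hypothetical and not taken from the actual job definition; only the pattern `^v4[.]3[.]` and the error text come from the console log in the description.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the version string matches the given
# extended regular expression, otherwise print an error and fail.
assert_node_version() {
    local version="$1" pattern="$2"
    if [[ "$version" =~ $pattern ]]; then
        return 0
    fi
    echo "Assertion error: node version $version does not match '$pattern'" >&2
    return 1
}

# The pinned pattern from the failing job rejects v4.4.6 ...
assert_node_version "v4.4.6" '^v4[.]3[.]' || echo "pinned pattern: rejected"
# ... while a major-only pattern accepts any 4.x release.
assert_node_version "v4.4.6" '^v4[.]' && echo "major-only pattern: accepted"
```

Because the dot is still required after the `4`, a major-only pattern of this shape would keep rejecting other major versions such as v5.x.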

Are we still hoping to support node 4.3 for testing? Or is 4.4 the official live version against which we should be running tests? Is the problem simply that the version assertion is out-of-date? That latter is certainly a simpler fix than the former (@Paladox's fix should work fine for the latter).

4.4.7 is the most recent LTS version; 4.3 is an older version of that LTS branch (their notion of LTS involves such version updates within the LTS branch). So I doubt we need to support 4.3, it's simply outdated at this point.

@thcipriani there's a plan (T138561: Updates various services to nodejs 4.4.6) to migrate various services to 4.4.6 soon but this seems like the "switch" (whatever/wherever that is) somehow got flipped too early on the version change.

greg added a project: Services.
greg added a subscriber: greg.

Adding Services for their information/heads up.

thcipriani lowered the priority of this task from Unbreak Now! to High. Jul 5 2016, 6:01 PM

I just deployed https://gerrit.wikimedia.org/r/#/c/297421/ that should be a good patch for the time being.

I'll leave this task open so that we can continue the discussion as to the Right™ fix (whether to downgrade the version or upgrade the version check). Lowering priority.

Change 297421 merged by jenkins-bot:
Fix npm assert so 4.* will work instead of us manually updating it everytime nodejs is updated

https://gerrit.wikimedia.org/r/297421

The root cause is that Nodepool finally managed to refresh the Jessie image after two weeks of it being stalled ( T138106 ). It thus came up with a newer version of nodejs for Jessie, which had been uploaded without anyone notifying ....

On a Jessie permanent slave:

$ apt-cache policy nodejs
nodejs:
  Installed: 4.4.6~dfsg-1+wmf1
  Candidate: 4.4.6~dfsg-1+wmf1
  Version table:
 *** 4.4.6~dfsg-1+wmf1 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/backports amd64 Packages
        100 /var/lib/dpkg/status
     4.2.4~dfsg-1~bpo8+1 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/main amd64 Packages
     0.10.29~dfsg-2 0
        500 http://mirrors.wikimedia.org/debian/ jessie/main amd64 Packages

So as of four hours ago, the jobs have been switched from nodejs 4.3 to nodejs 4.4.6.
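To see which version a freshly built image would pick up, the candidate can be read straight from the `apt-cache policy` output. This is only a sketch, assuming the output format shown in the listing above; the function name `candidate_version` is hypothetical and the sample input is an abbreviated copy of that listing.

```shell
#!/bin/bash
# Hypothetical helper: read `apt-cache policy <pkg>` output on stdin and
# print the version found on the "Candidate:" line.
candidate_version() {
    awk '$1 == "Candidate:" { print $2; exit }'
}

# On a real host: apt-cache policy nodejs | candidate_version
# Here an abbreviated sample of the listing above is used instead.
candidate_version <<'EOF'
nodejs:
  Installed: 4.4.6~dfsg-1+wmf1
  Candidate: 4.4.6~dfsg-1+wmf1
EOF
```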

The jobs are meant to represent whatever we have in production, which is usually the package version in apt.wikimedia.org. It looks like we will need a better strategy, maybe using nvm to have CI pick proper versions instead of relying on apt.wikimedia.org.

As per T138561: Updates various services to nodejs 4.4.6, the plan is to switch to the new LTS (4.4.6) next week, so simply fixing the version assert is the way to go. As we want a uniform environment, testing repos against the new version even before the actual switch is definitely a good idea, so I'd be in favour of staying with pulling the packages from apt.wm.o instead of using nvm or the like.

One of our jenkins jobs continues to fail ... https://integration.wikimedia.org/ci/job/parsoidsvc-hhvm-parsertests-jessie/237/console ... Can this be fixed as well? It is getting in the way of merging patches.

As per T138561: Updates various services to nodejs 4.4.6, the plan is to switch to the new LTS (4.4.6) next week, so simply fixing the version assert is the way to go. As we want a uniform environment, testing repos against the new version even before the actual switch is definitely a good idea, so I'd be in favour of staying with pulling the packages from apt.wm.o instead of using nvm or the like.

I think this makes sense.

At the same time, let's reduce the number of surprises for everyone. This bug was a surprise for us and the timing wasn't great (with other things). Since people (at least whoever updated the apt.wm.o package, and whoever they were working with) knew this was coming it could have been avoided. Not blaming, just figuring out how to change this to make it better.

Maybe next time an explicit step in the timeline being "upgrade CI to new version a week or so before we upgrade (all|more of) production hosts" (whatever, wordsmith/let's figure it out together).

Can we do that?

One of our jenkins jobs continues to fail ... https://integration.wikimedia.org/ci/job/parsoidsvc-hhvm-parsertests-jessie/237/console ... Can this be fixed as well? It is getting in the way of merging patches.

Done.

Sorry for missing that one.

Thank you!

At the same time, let's reduce the number of surprises for everyone. This bug was a surprise for us and the timing wasn't great (with other things). Since people (at least whoever updated the apt.wm.o package, and whoever they were working with) knew this was coming it could have been avoided. Not blaming, just figuring out how to change this to make it better.

You are right: there is a bug in process. Uploading new important packages to our repo should be properly communicated ahead of time.

In hindsight, fixing the node version to 4.3 when we are pulling it from apt.wm.o was probably a bug too. Since it is pulled from there, should we remove the version check altogether? It seems to be superfluous if we agree not to use nvm and friends, doesn't it?

Maybe next time an explicit step in the timeline being "upgrade CI to new version a week or so before we upgrade (all|more of) production hosts" (whatever, wordsmith/let's figure it out together).

Can we do that?

Yes. Concretely, nodejs upgrades do not happen often, and when they do, we know a new LTS package is coming usually a week in advance. In the future let's coordinate on this as soon as we know/decide an upgrade is needed.

For Node, major versions are the top level numbers. There shouldn't be a need to stick to minor versions.

Should we rename the jobs to *-node-v4?
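As a rough illustration, the major version can be taken from the string that `node --version` prints by dropping the leading `v` and everything from the first dot onwards. The function name `node_major` is hypothetical; a job keyed on the major number alone would not need touching for an upgrade such as 4.3 to 4.4.

```shell
#!/bin/bash
# Hypothetical helper: print the major component of a version string
# such as "v4.4.6" (the format `node --version` prints).
node_major() {
    local version="${1#v}"   # drop the leading "v"
    echo "${version%%.*}"    # keep everything before the first dot
}

node_major "v4.3.2"   # prints 4
node_major "v4.4.6"   # prints 4
```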

This has been worked around. Resolving for now.

@Krinkle Yeah, I think we should rename them to v4 instead of v4.3.

@Paladox Yeah, let's track that separately though (or simply without a task).