Upgrade kartotherian and tilerator to nodejs 6.11
Closed, Resolved (Public)

Description

See parent task for details.

Event Timeline

This is blocked for now, as we don't currently have a test cluster to try this on (but we have documentation to look at, we think :) ).

The test servers have finished re-imaging, so we can now upgrade to 6.11 on them and test.

Mentioned in SAL (#wikimedia-operations) [2017-09-19T18:52:30Z] <gehel> upgrading nodejs to 6.11 on maps-test2004 for testing - T171707

kartotherian logs on maps-test2004 (tail-kartotherian) after the nodejs 6.11 upgrade show errors connecting to postgresql. This might be unrelated to the upgrade, but needs to be investigated.

The last error is

{"name":"kartotherian","hostname":"maps-test2004","pid":124,"level":50,"levelPath":"error","msg":"geoshapes support failed to load, skipping: error: password authentication failed for user \"kartotherian\"\nerror: password authentication failed for user \"kartotherian\"\n    at Connection.parseE (/srv/deployment/kartotherian/deploy-cache/revs/9401f380832e910bb085bf3ccde4fcdd32598149/node_modules/pg/lib/connection.js:539:11)\n    at Connection.parseMessage (/srv/deployment/kartotherian/deploy-cache/revs/9401f380832e910bb085bf3ccde4fcdd32598149/node_modules/pg/lib/connection.js:366:17)\n    at Socket.<anonymous> (/srv/deployment/kartotherian/deploy-cache/revs/9401f380832e910bb085bf3ccde4fcdd32598149/node_modules/pg/lib/connection.js:105:22)\n    at emitOne (events.js:96:13)\n    at Socket.emit (events.js:188:7)\n    at readableAddChunk (_stream_readable.js:176:18)\n    at Socket.Readable.push (_stream_readable.js:134:10)\n    at TCP.onread (net.js:548:20)","time":"2017-07-04T14:48:50.673Z","v":0}

This error is from two months ago. After the upgrade, kartotherian starts fine and logs only these messages:

{"name":"kartotherian","hostname":"maps-test2004","pid":4,"level":40,"levelPath":"warn/service-runner","msg":"Startup finished","time":"2017-09-19T19:36:51.101Z","v":0}
{"name":"kartotherian","hostname":"maps-test2004","pid":4,"level":40,"levelPath":"warn/service-runner","msg":"ServiceRunner.run() is deprecated, and will be removed in v3.x.","time":"2017-09-19T19:36:51.108Z","v":0}
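
One way to avoid mistaking stale entries (like the July error) for post-upgrade output is to filter the log on its `time` field. The following is a minimal sketch, assuming every line is a bunyan-style JSON record like the excerpts above. The names `parse_time` and `records_since` are hypothetical, and the cutoff is simply the SAL timestamp of the maps-test2004 upgrade.

```python
import json
from datetime import datetime, timezone


def parse_time(value):
    """Parse the UTC timestamps found in the `time` field of each record."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def records_since(lines, cutoff):
    """Yield parsed records whose `time` is at or after `cutoff`.

    Lines that are not JSON objects, or that lack a parseable `time`,
    are skipped rather than treated as errors.
    """
    for line in lines:
        try:
            record = json.loads(line)
            when = parse_time(record["time"])
        except (ValueError, TypeError, KeyError):
            continue
        if when >= cutoff:
            yield record


# Sample input: the two kartotherian records quoted above, trimmed to the
# fields this sketch reads.
sample = [
    '{"level":50,"msg":"geoshapes support failed to load, skipping",'
    '"time":"2017-07-04T14:48:50.673Z"}',
    '{"level":40,"msg":"Startup finished","time":"2017-09-19T19:36:51.101Z"}',
]

# Assumed cutoff: when the nodejs upgrade on maps-test2004 was logged in SAL.
upgrade_time = parse_time("2017-09-19T18:52:30.000Z")

for record in records_since(sample, upgrade_time):
    print(record["time"], record["level"], record["msg"])
```

Only records written after the upgrade are printed, so an old authentication failure would not show up as a regression.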

Everything tests okay for me. I checked serving tiles on port 6533 and regenerating tiles through tileratorui.

Even though everything tests okay, I see an error in /srv/log/tilerator/main.log

{"name":"tilerator","hostname":"maps-test2004","pid":34,"level":50,"err":{"message":"Unable to load source \"gen\"\nSource \"genall\" is disabled, possibly due to loading errors","name":"tilerator","stack":"tilerator: Unable to load source \"gen\"\nSource \"genall\" is disabled, possibly due to loading errors\n    at Sources.getSourceById (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:338:15)\n    at Sources._getSourceUri (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:357:10)\n    at Sources._resolveValue (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:310:25)\n    at _.each (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:165:37)\n    at Function._.each._.forEach (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/underscore/underscore.js:158:9)\n    at Promise.try (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:164:15)\n    at tryCatcher (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/util.js:16:23)\n    at Function.Promise.attempt.Promise.try (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/method.js:39:29)\n    at Sources._loadSourceAsync (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:127:23)\n    at Promise.each.key (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:93:25)\n    at tryCatcher (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/util.js:16:23)\n    at Object.gotValue (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/reduce.js:155:18)\n    at Object.gotAccum (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/reduce.js:144:25)\n    at Object.tryCatcher (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/util.js:16:23)\n    at Promise._settlePromiseFromHandler (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/promise.js:512:31)\n    at Promise._settlePromise (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/promise.js:569:18)","levelPath":"error"},"msg":"Unable to load source \"gen\"\nSource \"genall\" is disabled, possibly due to loading errors","time":"2017-09-19T19:50:34.613Z","v":0}

Reformatted, the backtrace is

tilerator: Unable to load source "gen"
Source "genall" is disabled, possibly due to loading errors
    at Sources.getSourceById (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:338:15)
    at Sources._getSourceUri (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:357:10)
    at Sources._resolveValue (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:310:25)
    at _.each (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:165:37)
    at Function._.each._.forEach (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/underscore/underscore.js:158:9)
    at Promise.try (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:164:15)
    at tryCatcher (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/util.js:16:23)
    at Function.Promise.attempt.Promise.try (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/method.js:39:29)
    at Sources._loadSourceAsync (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:127:23)
    at Promise.each.key (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/@kartotherian/core/lib/sources.js:93:25)
    at tryCatcher (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/util.js:16:23)
    at Object.gotValue (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/reduce.js:155:18)
    at Object.gotAccum (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/reduce.js:144:25)
    at Object.tryCatcher (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/srv/deployment/tilerator/deploy-cache/revs/001811e1a3eb21cf9246c5425f27f001b91efd27/node_modules/bluebird/js/release/promise.js:569:18)
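
A reformatted backtrace like the one above can be produced mechanically rather than by hand. This is a minimal sketch, assuming one JSON record per line with the trace either in `err.stack` (as in the tilerator record) or embedded in `msg` (as in the kartotherian record). The function name `extract_stack` is hypothetical and not part of tilerator, kartotherian, or bunyan.

```python
import json


def extract_stack(line):
    """Return the human-readable trace from one JSON log line.

    json.loads turns the escaped "\\n" sequences into real newlines, so
    the returned string prints as a multi-line backtrace.
    """
    record = json.loads(line)
    err = record.get("err")
    # Prefer the structured error object when the logger provided one.
    if isinstance(err, dict) and "stack" in err:
        return err["stack"]
    # Otherwise fall back to the plain message.
    return record.get("msg", "")
```

Passing each line of /srv/log/tilerator/main.log through `extract_stack` and printing the result gives the multi-line form without the trailing JSON fields.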

@Pnorman which sources config file are you using?

> @Pnorman which sources config file are you using?

I'm not using any directly - this is from whatever is running.

Looking at ps, the relevant command seems to be /usr/bin/nodejs src/server.js -c /etc/tilerator/config.yaml
I checked on maps-test2002 and it's not showing the same errors.

So it seems the sources & variables file specified in /etc/tilerator/config.yaml is incorrectly specifying the username/password, most likely for the postgres db. I remember @Gehel was doing some cleanup to get various test and prod boxes in sync for that - double check with him.

> So it seems the sources & variables file specified in /etc/tilerator/config.yaml is incorrectly specifying the username/password, most likely for the postgres db. I remember @Gehel was doing some cleanup to get various test and prod boxes in sync for that - double check with him.

I can connect to postgres and cassandra using the details in that file.

@Gehel, can you make maps-test2004 the same as one of the other boxes? That way we can see if it's nodejs 6.11 or something specific to that box.

@Pnorman: not sure what you mean by

> @Gehel, can you make maps-test2004 the same as one of the other boxes? That way we can see if it's nodejs 6.11 or something specific to that box.

As far as I know, maps-test2004 is the same as the other maps-test* nodes, except for the nodejs 6.11 upgrade.

The sources file used is:

gehel@maps-test2004:~$ grep sources /etc/tilerator/config.yaml 
      sources: sources.prod2.yaml

Same on maps-test2003:

gehel@maps-test2003:~$ grep sources /etc/tilerator/config.yaml
      sources: sources.prod2.yaml

Side note: the tail-kartotherian / tail-tilerator commands provide some nice formatting to the logs (they support the same options as tail). Looking at the script (/usr/local/bin/tail-tilerator), they use bunyan or jq if you want to replicate the behaviour.

> So it seems the sources & variables file specified in /etc/tilerator/config.yaml is incorrectly specifying the username/password, most likely for the postgres db. I remember @Gehel was doing some cleanup to get various test and prod boxes in sync for that - double check with him.

Yep, that cleanup is done, and the configuration is aligned on prod and test clusters. We should still move that sources configuration out of the tilerator / kartotherian repo and deploy it either with scap or with puppet (T138443 / T162240). But no time at the moment...

Mentioned in SAL (#wikimedia-operations) [2017-09-20T12:28:20Z] <gehel> upgrading nodejs to 6.11 on maps-test2003 for testing - T171707

We were unable to reproduce the errors in the tilerator log on maps-test2004, and maps-test2003 worked without them, so I think 6.11 is good to go.

@Gehel found "ERROR: FATAL: remaining connection slots are reserved for non-replication superuser connections" in the postgres logs for maps-test, which makes me wonder whether the service, when restarted, came back up and tried to connect before its old connections had been released.
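
As background for that hypothesis: PostgreSQL raises this error when a non-superuser client connects while the only free slots are those set aside by `superuser_reserved_connections`. The sketch below models that arithmetic for a restart in which old connections linger. It is an illustration only: the function name `restart_outcome` is hypothetical and the numbers are made up, not the maps-test settings.

```python
def restart_outcome(max_connections, superuser_reserved, lingering, wanted):
    """Model a restart where old connections have not been released yet.

    max_connections    -- server-wide connection limit
    superuser_reserved -- slots held back for superusers
    lingering          -- connections of the old process still open
    wanted             -- connections the restarted service tries to open

    Returns (accepted, rejected) for the new, non-superuser connections.
    """
    free_for_ordinary = max(0, max_connections - superuser_reserved - lingering)
    accepted = min(wanted, free_for_ordinary)
    return accepted, wanted - accepted


# Made-up numbers: limit of 100, 3 reserved, 60 stale connections, and the
# restarted service asking for 60 again.
accepted, rejected = restart_outcome(100, 3, 60, 60)
print(f"accepted={accepted} rejected={rejected}")
```

Under these assumed numbers some of the new connections would be refused until the stale ones are closed, which would match an error that appears briefly after a restart and cannot be reproduced later.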

Mentioned in SAL (#wikimedia-operations) [2017-09-21T13:45:40Z] <gehel> upgrade to nodejs 6.11 on the full maps-test cluster - T171707

Mentioned in SAL (#wikimedia-operations) [2017-09-21T19:16:27Z] <gehel> upgrade to nodejs 6.11 on maps servers (including restart of tilerator / kartotherian) - T171707

Nodejs 6.11 is deployed on all maps servers! Services have been restarted.

The minor cleanup of updating the package.json file is in progress, but it is tracked on GitHub, so we can already close this ticket.