
Tune thread for osm2pgsql / postgres max connections for Maps
Open, Low, Public

Description

Discussion with @Pnorman: a good starting point for the number of osm2pgsql threads seems to be half the number of CPUs. The number of connections to Postgres is the number of threads multiplied by the number of tables, so our current limit of 120 max connections will need to be adjusted, taking the Tilerator traffic into account.
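To make the arithmetic concrete, here is a minimal sketch of the estimate described above. The function names, the example CPU and table counts, and the figure used for other traffic are hypothetical placeholders, not measured values; only the CPU/2 starting point, the threads x tables rule and the 120 limit come from the description.

```python
def osm2pgsql_connections(cpu_count: int, table_count: int) -> int:
    """Estimate Postgres connections opened by osm2pgsql.

    Uses the starting point from the discussion: threads = CPUs / 2,
    and connections = threads * tables.
    """
    threads = max(1, cpu_count // 2)  # at least one thread
    return threads * table_count


def fits_budget(cpu_count: int, table_count: int,
                other_connections: int, max_connections: int = 120) -> bool:
    """Check the estimate plus other traffic against max_connections.

    other_connections stands for everything else (e.g. Tilerator);
    its value is an assumption the caller must supply.
    """
    needed = osm2pgsql_connections(cpu_count, table_count) + other_connections
    return needed <= max_connections


if __name__ == "__main__":
    # Hypothetical host: 32 CPUs, 4 tables, 40 connections used elsewhere.
    print(osm2pgsql_connections(32, 4))
    print(fits_budget(32, 4, other_connections=40))
```

The real table count depends on the osm2pgsql output configuration, so it should be checked against the actual schema before changing max_connections.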

Event Timeline

I suspect that Tilerator will have one connection per worker. Eventually, I would also like Kartotherian to use Postgres directly to get some data, so that number will triple (Tilerator's cpucount/2 + Kartotherian's cpucount).
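A rough sketch of that estimate, assuming one connection per worker, Tilerator running cpucount/2 workers and Kartotherian running cpucount workers as suggested in this comment. The function name and the example CPU count are hypothetical.

```python
def service_connections(cpu_count: int) -> dict:
    """Estimate Postgres connections per service, one per worker.

    Assumes Tilerator runs cpu_count / 2 workers and Kartotherian
    runs cpu_count workers, per the comment above.
    """
    tilerator = cpu_count // 2
    kartotherian = cpu_count
    return {
        "tilerator": tilerator,
        "kartotherian": kartotherian,
        "total": tilerator + kartotherian,
    }


if __name__ == "__main__":
    # Hypothetical 32-CPU host: the total is three times Tilerator alone.
    print(service_connections(32))
```

If the workers use a connection pool rather than a single connection each, the per-worker pool size would have to be multiplied in as well.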

I would expect the number of threads and the number of workers to have no direct relation to each other, especially in Node, where IO should be async...

I had a quick look at the code and it seems that we are using pg.js, which seems to have an embedded connection pool (node.js is really not my cup of tea yet). I'm not entirely sure whether or not it makes sense to pool DB connections.

We need measurements... as always...

For imports I generally recommend that osm2pgsql use as many threads as there are CPU threads on machines with up to 8 threads, unless something else is running at the same time. Past 8 threads there's little data available. If you have enough RAM and are doing a --slim import without --drop, most of the time is spent building a large index, which can't be parallelized.

For updates, it's a tradeoff between update speed and load on the system. Lots of people run single-threaded to keep the load down, or with just 2 threads.
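The recommendations above could be sketched as a small helper. This is only an illustration: the function name is hypothetical, capping imports at 8 threads is an assumption (the comment only says little data exists past 8), and halving the count on a shared host is borrowed from the CPU/2 starting point in the task description rather than stated here.

```python
def suggest_osm2pgsql_threads(mode: str, cpu_threads: int,
                              shared_host: bool = False) -> int:
    """Suggest an osm2pgsql thread count for an import or an update."""
    if cpu_threads < 1:
        raise ValueError("cpu_threads must be at least 1")
    if mode == "update":
        # Updates: keep system load down with 1 or 2 threads.
        return min(2, cpu_threads)
    if mode == "import":
        # Imports: one thread per CPU thread, capped at 8 (assumption).
        capped = min(cpu_threads, 8)
        if shared_host:
            # Assumption: leave half for whatever else is running.
            return max(1, capped // 2)
        return capped
    raise ValueError("mode must be 'import' or 'update'")


if __name__ == "__main__":
    print(suggest_osm2pgsql_threads("import", 32))
    print(suggest_osm2pgsql_threads("update", 32))
```

Whatever value is chosen still has to fit the threads x tables connection budget, and the index build at the end of a --slim import will not speed up with more threads.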

Mholloway lowered the priority of this task from Medium to Low. Jul 31 2018, 4:35 PM

Change 293320 abandoned by Gehel:
WIP - Tune thread for osm2pgsql / postgres max connections for Maps

https://gerrit.wikimedia.org/r/293320