[OSM] Backport imposm3 to the debian channel
Closed, Resolved | Public | 2 Estimated Story Points

Description

We are moving to imposm3 to support OSM replication. Debian packages are stuck at imposm2.

Open questions

Should we install imposm3 from the released binaries available on GitHub, or via the Go dependency manager?

See https://github.com/omniscale/imposm3#installation

Event Timeline

We probably want to rebuild and package as a debian package. @MoritzMuehlenhoff probably has an opinion and can help.

I don't see imposm2 packaged in Debian, which package would that be?

The package named imposm seems to be imposm version 2.

Ah, that explains it: it was removed from Debian in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932607

I haven't looked at version 3, but if it's written in Go, it's best to look into https://github.com/Debian/dh-make-golang

LGoto triaged this task as Medium priority.
MSantos renamed this task from Install imposm3 in Maps master to [OSM] Install imposm3 in Maps master. Sep 24 2020, 10:46 AM
MSantos moved this task from All map-related tasks to Production Infrastructure on the Maps board.

@MoritzMuehlenhoff I gave it a try but wasn't able to build the Debian package. I'm using macOS and tried through a Docker build, but I don't have the full knowledge to do so. Can you help me move forward with this task? I'm planning to move this to the beta cluster in the next few weeks to leverage the Puppet code and the scripts needed to use this tool in production.

MSantos set the point value for this task to 2. Oct 6 2020, 11:07 AM
MSantos renamed this task from [OSM] Install imposm3 in Maps master to [OSM] Backport imposm3 to the debian channel. Nov 16 2020, 3:40 PM

@hnowlan this can be a good resource for this task: https://github.com/omniscale/imposm3#binary

This won't work reliably, if at all, btw. If you download the binary releases, you will find that, aside from the Go binary, they also ship the following shared object files:

  • libgeos.so
  • libgeos_c.so
  • libleveldb.so.1.22.0

Shared library objects aren't suitable for shipping. They are linked against libc (at the very least), and that is not guaranteed to work with the libc version of the platform they are installed on (never mind all the other libraries, e.g. libsnappy or libstdc++, that I see in ldd for these). Plus, there is no tracking of versions or of their dependencies, and that can lead down a pretty nice dependency hell path.
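As a rough illustration of the kind of check described above, here is a minimal Python sketch that scans `ldd` output for dependencies the target platform cannot resolve. The sample output is made up for illustration and was not captured from the imposm3 release binaries; the function name is hypothetical too.

```python
import re
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the shared libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved dependencies as "<name> => not found".
        match = re.match(r"\s*(\S+)\s+=>\s+not found", line)
        if match:
            missing.append(match.group(1))
    return missing


# Illustrative ldd output (an assumption, not taken from a real host).
SAMPLE = (
    "\tlinux-vdso.so.1 (0x00007ffd1c5f2000)\n"
    "\tlibsnappy.so.1 => not found\n"
    "\tlibstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0a1c000000)\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0a1b000000)\n"
)

if __name__ == "__main__":
    print(missing_libraries(SAMPLE))
```

Note that a library which does resolve can still be ABI-incompatible with the shipped objects, so an empty result here would not by itself make the prebuilt binaries safe to deploy.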

There's also an additional option: Postgres 9.6 is also available on Buster (we already use it for cescout, which has a strict dependency on 9.6 since OONI upstream publishes their datasets that way).

From a quick glance, Buster has all the deps (src:leveldb, src:geos, Golang 1.11) required to build imposm3, so we can create an imposm3 deb on Buster, set up a Ganeti instance on Buster (maps-import1001) with Postgres 9.6 and add it to the Maps Postgres setup. Then the OSM import can simply happen from that separate instance (until we eventually also migrate maps at large).

I think there is also a dated Cassandra requirement somewhere in there, so it might not be that easy. But that's almost hearsay, so @MSantos could you confirm?

That being said, I worked a bit on packaging imposm3 yesterday. I am happy to report success: https://people.wikimedia.org/~akosiaris/

I started with stretch in mind (since maps is on stretch), but it turns out that it can't be built on stretch, so it requires buster after all. That being said, if we can backport leveldb and libgeos we might be able to run it on stretch.

For those not familiar with the state of maps and stumbling upon this task, getting a feeling of exasperation is normal. Trying to adopt infrastructure that was not well maintained in the past means that some tech debt needs to be paid. It's

Actually, this hypothetical maps-import machine doesn't need to have a Cassandra node. Cassandra is only required as storage for the vector tiles generated after the OSM data is synced.

I've attempted to document the current data flow in these diagrams. It should give a nice understanding of where the data lies in the infrastructure.

YAY! Well, I guess it's a matter of choosing the best option for you, weighing the tradeoffs between backporting the dependencies and starting a new machine for the OSM DB master and sync scripts.

My 2 cents:
Backports strategy:
Pros:

  • It unblocks our work and we can test imposm migration and prepare it for deployment beginning of next quarter.
  • Less work to leverage the needed infrastructure

Cons:

  • More backported binaries to keep track of
  • Keeps the status quo, and we know this isn't great

maps-import strategy:
Pros:

  • Iteratively start to move towards debian buster
  • Isolate OSM sync scripts from the production infrastructure
  • Doesn't change the way we store OSM data because we already have the main instance doing the OSM sync and replicating the data through the cluster

Cons:

  • Needs more planning and changes the scope of current work
  • More work to leverage infrastructure

Also, the 2 strategies can be iterative steps of the same plan

Thanks @akosiaris
I also had some success, first with FPM and afterwards by properly packaging imposm3 and the 2-3 missing deps according to dh-make-golang (levigo, fsnotify). I can share them if it's of any help.

I have vendored those as a shortcut for now in the package above (see the debian/patches/govendor file), but if you feel like packaging them nicely and maintaining them, that's definitely a better approach long term.

It's an interesting choice, I must say. Let's wait for @hnowlan to weigh in as well as he is heavily involved and figure out the best path forward for this.

I wrapped up (partially out of curiosity) the packaging of imposm3 with deb build dependencies, in case we prefer to avoid vendoring:

Build dependencies:

I've only tested it on debian buster.

It looks like the Debian package can also be built on stretch if backports are enabled. I uploaded a Debian package built for stretch here:
https://github.com/johngian/imposm3/releases/tag/debian-0.11.0

Relying on stretch-backports isn't much of an option: it can disappear from the Debian archive at any moment, and we're removing it from production anyway, see https://phabricator.wikimedia.org/T256877

That said, it would still be an option to locally import the packages formerly present in stretch-backports in our local apt mirror.

Coming back to this now that I've been given the space for it:

My vote is for the approach of using a buster host for the import. I really like the idea of isolating the sync scripts from the rest of the infrastructure, and making motions towards Buster is always a good thing assuming it doesn't hugely complicate things further.

Despite what I said earlier, to complicate things further - does moving the script to buster itself necessitate moving the master at the same time? It seems like imposm will use a standard Postgres connect string and doesn't necessarily have to live on the same host.

If we must move the master (or if moving the master is the prudent thing to do), it's worth noting that we are already required to move the masters away from the old maps?004 hosts, so some effort will be required on that either way.
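To make the point about the connect string concrete, here is a small Python sketch that splits a connection URL into its parts and reports whether the database lives on another host. It assumes the `postgis://user:password@host:port/database` form that imposm3 accepts for its connection option, as far as I understand it; the hostname, user, and database name below are hypothetical.

```python
from urllib.parse import urlsplit

LOCAL_HOSTS = ("localhost", "127.0.0.1", "::1")


def describe_connection(url: str) -> dict:
    """Split a postgis:// style connection URL into its components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # Fall back to the default PostgreSQL port when none is given.
        "port": parts.port or 5432,
        "database": parts.path.lstrip("/"),
        "is_remote": parts.hostname not in LOCAL_HOSTS,
    }


if __name__ == "__main__":
    # Hypothetical master host; the importer would run elsewhere.
    print(describe_connection("postgis://osm:secret@maps-master.example.org:5432/gis"))
```

If the importer only needs such a URL, the sync host and the Postgres master can be moved independently, subject to the firewall allowing the connection.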

I'm not opposed to doing it.

The OS upgrade is only an issue for tile regeneration: if there is a Cassandra version mismatch between stretch and buster, we don't have a known way to copy the current data, so we would need to do a full planet regeneration. Again, I'm not opposed, since we know how to do it better this time, but it will increase the processing time until the data is ready for production traffic.

We'll need to adapt the Ferm firewall rules, but other than that I think it should just work.

Change 656404 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] maps: reimage maps1009 with buster.

https://gerrit.wikimedia.org/r/656404

Change 656460 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] maps: make maps1009 a new, independent buster master.

https://gerrit.wikimedia.org/r/656460

Change 656404 merged by Hnowlan:
[operations/puppet@production] maps: reimage maps1009 with buster.

https://gerrit.wikimedia.org/r/656404

Change 656460 merged by Hnowlan:
[operations/puppet@production] maps: make maps1009 a new, independent buster master.

https://gerrit.wikimedia.org/r/656460

Script wmf-auto-reimage was launched by hnowlan on cumin1001.eqiad.wmnet for hosts:

['maps1009.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202101261443_hnowlan_19435.log.

Completed auto-reimage of hosts:

['maps1009.eqiad.wmnet']

Of which those FAILED:

['maps1009.eqiad.wmnet']

Change 658627 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] maps::apps: only use nodejs10 repo on stretch

https://gerrit.wikimedia.org/r/658627

Change 658627 merged by Hnowlan:
[operations/puppet@production] maps::apps: only use nodejs10 repo on stretch

https://gerrit.wikimedia.org/r/658627

Script wmf-auto-reimage was launched by hnowlan on cumin1001.eqiad.wmnet for hosts:

['maps1009.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202101261627_hnowlan_28505.log.

Completed auto-reimage of hosts:

['maps1009.eqiad.wmnet']

Of which those FAILED:

['maps1009.eqiad.wmnet']

Change 659950 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] profile::maps::tlsproxy: add_ecdhe_curve toggle

https://gerrit.wikimedia.org/r/659950

Change 659950 merged by Hnowlan:
[operations/puppet@production] profile::maps::tlsproxy: add_ecdhe_curve toggle

https://gerrit.wikimedia.org/r/659950

Change 666325 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] osm: correct file paths for imposm components

https://gerrit.wikimedia.org/r/666325

Change 666325 merged by Hnowlan:
[operations/puppet@production] osm: correct file paths for imposm components

https://gerrit.wikimedia.org/r/666325

Change 667153 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] osm:imposm: Make imposm updater proxy-aware

https://gerrit.wikimedia.org/r/667153

Change 667153 merged by Hnowlan:
[operations/puppet@production] osm:imposm: Make imposm updater proxy-aware

https://gerrit.wikimedia.org/r/667153