[Spike] Consider using imposm3 as the OSM replication system
Closed, Resolved (Public)

Description

Background information

OSM replication has been failing repeatedly for a while, introducing buggy behavior in polygon rendering.

The OSM replication system relies on osm2pgsql, a robust tool widely used by a large community for loading OSM data. Despite that, osm2pgsql doesn't provide enough error logging support, so tracking down the current issues is hard and sometimes inconclusive.

Another option is to use imposm3, another powerful tool that is also widely used by the OSM community. This spike task is an opportunity to identify if imposm3 features can meet production standards for our Maps infrastructure.

Hypothesis

Imposm3 can offer better support for all maps infrastructure and can replace osm2pgsql as the OSM replication engine.

Questions we want to answer

  1. Does imposm3 have better logging output to help with maintenance work?
  2. Can imposm3 handle OSM replication as fast as osm2pgsql?
  3. Can imposm3 be deployed in our infrastructure?
  4. Can we move to imposm3 with minimal changes in Postgres?
  5. Will it fix the issues with OSM replication?

How will we go about answering the questions

  • Investigate schema changes and make sure there are no style changes during tile rendering.
  • Deploy to the beta cluster and test full-planet OSM replication.
  • Depool one machine in codfw (less traffic) and test the new changes in the production environment.

Results

The investigation concluded that it is possible to migrate to imposm3 with few, if any, changes to the DB schemas and styles. This required a proper imposm3 mapping and some changes to the vector tile queries. The work can be found at the following links:

To proceed with this migration, though, we need to follow up on the following tasks:

Event Timeline

Could imposm3 also be a solution to T156433? Here I read that all ... relations can be imported.

Possibly, I will consider it during the implementation.

Note that the OpenMapTiles project is rapidly improving, with the goal of generating tiles "on the fly" -- without the tile pregeneration step, and without Mapnik. In other words, a vector tile (MVT) is generated by a single giant PostgreSQL query and sent to the user on request (with some caching to speed up frequently viewed regions). Adopting this approach would greatly simplify the current Wikipedia setup: no more Mapnik, no more Cassandra, and an easily scalable architecture (the more Postgres replicas, the bigger the capacity).
P.S. And yes, OpenMapTiles is using Imposm3, together with a number of other good data sources like Natural Earth for low zooms.
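
To make the "single query per tile" idea concrete, here is a minimal sketch in Python that only builds the SQL and its parameters for one tile; it does not connect to a database. It is an illustration under stated assumptions, not the actual OpenMapTiles query: the table `osm_roads`, its `osm_id` and `geometry` columns (assumed to be stored in EPSG:3857), and the layer name `roads` are all hypothetical. The PostGIS functions used (`ST_AsMVT`, `ST_AsMVTGeom`, `ST_MakeEnvelope`) are real.

```python
import math

# Half the width of the Web Mercator (EPSG:3857) world square, in metres.
ORIGIN_SHIFT = math.pi * 6378137.0


def tile_bounds(z, x, y):
    """Return (xmin, ymin, xmax, ymax) of an XYZ tile in EPSG:3857."""
    size = 2 * ORIGIN_SHIFT / (2 ** z)  # tile edge length at zoom z
    xmin = -ORIGIN_SHIFT + x * size
    ymax = ORIGIN_SHIFT - y * size      # y grows southwards in XYZ tiles
    return (xmin, ymax - size, xmin + size, ymax)


# Hypothetical table and column names; a real setup would take them
# from its own imposm3 mapping.
TILE_SQL = """
SELECT ST_AsMVT(tile, 'roads')
FROM (
    SELECT osm_id,
           ST_AsMVTGeom(
               geometry,
               ST_MakeEnvelope(%(xmin)s, %(ymin)s, %(xmax)s, %(ymax)s, 3857)::box2d
           ) AS geom
    FROM osm_roads
    WHERE geometry && ST_MakeEnvelope(%(xmin)s, %(ymin)s, %(xmax)s, %(ymax)s, 3857)
) AS tile
"""


def tile_query(z, x, y):
    """Return (sql, params) for one tile, ready for a DB-API cursor.execute()."""
    xmin, ymin, xmax, ymax = tile_bounds(z, x, y)
    params = {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax}
    return TILE_SQL, params
```

The `%(name)s` placeholders follow the named-parameter style that psycopg2 accepts. The result of such a query is the encoded tile as a single `bytea` value, which is what would be cached and returned to the client.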

What do you think about having an RFC for that? I haven't had an opportunity to experiment with openmaptiles yet, but I am watching its steps.

@MSantos I am all for WMF starting to use the OMT project rather than our first implementation, but I am not sure how valuable it will be to write an RFC -- so far WMF has not been too eager to support a proper map-serving effort, relying mostly on the semi-volunteer efforts of different enthusiasts to keep it around. Do you think writing an RFC will help change that? Or will it be just another dusty page on Phabricator?

@Yurik I think that OMT might be a good candidate for a proper Maps server that could reduce maintenance burden, which reflects the current resources of the project.

Considering this, it would be nice to have a centralized place to discuss the pros and cons and work out some directions for maps. It could even be a wiki page where we discuss and draft a proposal for this change, which I think would also be constructive. What do you think?

Sure, sounds good. How about this: if you create a page/ticket/... with some basic info and goals, I will add implementation details to it. Would that work?

@MSantos there will be an OpenMapTiles community sync-up this Thursday (10:30 a.m. ET), let me know if you would like to join in - we will be discussing how to move OMT forward, and possibly accommodate Wikipedia's needs. Email me at YuriAstrakhan@gmail.com with your email address.