Since this task was created, we've been experiencing an OSM database degradation happening over time.
At first, we had some issues with osm2pgsql failing to complete a daily OSM sync due to corrupted input data but we couldn't correlate the problems due to the lack of information on the logs.
We now have imposm3 as the ETL engine behind OSM DB and the problem continues to happen and we still don't know the cause.
What we know:
- it's unrelated to specific OSM edits, some missing polygons hasn't received an edit for months and still disappear
- it's unrelated to the tool that imports the database
- it's unrelated to initial imports and we have written some unit tests to verify that
How to mitigate that
While we don't have a solution for the root cause we have two maintenance procedures to restore the DB state:
These procedures helped to fix some issues raised by the community but it has become a burden to keep up with the degradation rate, that increased since we introduced Tegola
The current hypothesis to explore
There have been some PostgreSQL errors caused by connection slots starvation recently. The hypothesis is that, if we can correlate the OSM sync failure with the PG errors, we can find the root cause and act promptly to fix it.
Therefore we will be monitoring the database after every OSM change file is applied to it and look for the causes of the DB degradation. (upcoming patch)
What to do meanwhile?
For now, we will keep restoring the DB with the maintenance scripts, and to make it easier for us to keep up with the bugs, we will suggest a Phabricator template task to raise these types of issues with the needed information to apply the fix for specific geometries.
We have a scheduled maintenance script to re-import the planet in order to fix the current open tasks in the next week
Old task description kept for history context
Recently the community have spotted some weird behavior with geoshapes service. Some Wikidata items are not available through the service and some are, see example below:
- Works fine: https://maps.wikimedia.org/geoshape?getgeojson=1&ids=Q3145754
- Returns empty array: https://maps.wikimedia.org/geoshape?getgeojson=1&ids=Q6414
It seemed that something was wrong with the OSM replication script or the initial import script failed at some point.
Initial import script and Steps to reproduce
The log doesn't show any errors or warnings and they are available at /home/mbsantos/osm-initial-import.log
OSM replication script
The logs doesn't show any errors or warnings and they are available at /var/log/osmosis
Further investigation and steps to reproduce the actual results
With no related track from the logs I started to investigate the data stored in the PostgreSQL DB. I started with the relation Lake Garda (8569) from the example above. The problem is not with the relation, because it is stored in the table planet_osm_rels:gis=> SELECT id FROM planet_osm_rels WHERE id = 8569; id ------ 8569 (1 row)
But performing a https://github.com/kartotherian/geoshapes/blob/master/geoshapes.js#L64 geoshapes similar query, it returns nothing because the OSM relation didn't become a polygon at osm_planet_polygon:gis=> SELECT tags->'wikidata' AS id, osm_id gis-> FROM planet_osm_polygon gis-> WHERE tags ? 'wikidata' AND tags->'wikidata' IN ('Q6414'); id | osm_id ----+-------- (0 rows)
Request existent Wikidata-OSM link and got the GeoJSON through geoshapes service. See the same SQL queries for the Hôtel de Blossac (311766):gis=> SELECT id FROM planet_osm_rels WHERE id = 3117766; id --------- 3117766 (1 row)gis=> SELECT tags->'wikidata' AS id, osm_id FROM planet_osm_polygon WHERE tags ? 'wikidata' AND tags->'wikidata' IN ('Q3145754'); id | osm_id ----------+--------- Q3145754 | -311766 (1 row)
- All master machines:
Testing Environment for QA
- Beta Cluster environment will be set to explore the issue
Some references report a similar issue but the outcome doesn't apply to our infrastructure