
Maps2004 ran into disk space issues again after reimaging with new partitioning scheme
Closed, Resolved, Public

Description

Investigate why this happened and propose a viable solution. This problem is not seen in maps eqiad.

Event Timeline

Mathew.onipe triaged this task as High priority. Jun 3 2019, 11:49 AM

A closer look at the postgres database on maps2004 traced the problem to the gis.planet_osm_line table, which is larger than normal compared with its eqiad (maps1004) counterpart.
We are still investigating why this is so.
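
A comparison like the one described above could be sketched as follows. This is only an illustration, not the procedure that was actually used: the query relies on the standard PostgreSQL function pg_total_relation_size, while the helper name oversized_tables, the 1.5x threshold and the way the sizes are collected from each host are assumptions.

```python
# Query to run on each host (for example through psql) to list table sizes.
# pg_total_relation_size counts the table plus its indexes and TOAST data.
SIZE_QUERY = """
SELECT c.relname, pg_total_relation_size(c.oid) AS bytes
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY bytes DESC;
"""


def oversized_tables(reference, suspect, ratio=1.5):
    """Flag tables that are much larger on the suspect host.

    reference and suspect map table name -> size in bytes, as returned by
    SIZE_QUERY on the healthy host and on the problematic host.
    The ratio threshold is an arbitrary, hypothetical default.
    """
    flagged = []
    for table, suspect_bytes in suspect.items():
        ref_bytes = reference.get(table)
        # Skip tables missing (or empty) on the reference host.
        if ref_bytes and suspect_bytes > ratio * ref_bytes:
            flagged.append((table, ref_bytes, suspect_bytes))
    # Largest relative growth first.
    return sorted(flagged, key=lambda row: row[2] / row[1], reverse=True)
```

Feeding it the sizes from maps1004 as reference and from maps2004 as suspect would list the tables that grew disproportionately, with the worst offender first.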

maps2004 kept track of the osm2pgsql script for the last 5 days, and all of the runs ended with failures during replication due to lack of disk space. It seems that at some point osm2pgsql failed to roll back and didn't clean up some records, causing some tables to grow indefinitely. The logs, however, are inconclusive.

Next steps:

  • fix DB with data available in eqiad, see https://wikitech.wikimedia.org/wiki/Maps#Postgres or recreate OSM database with the initial import script
  • regenerate tiles from last osm2pgsql attempt, the list is available at /srv/osm_expire/expire.list.201905282106
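
Before regenerating tiles, it may help to gauge how much work the expire list represents. The sketch below assumes the list uses the one-tile-per-line zoom/x/y format that osm2pgsql writes for expired tiles; the function name tiles_per_zoom is hypothetical, and this is not the regeneration step itself.

```python
from collections import Counter


def tiles_per_zoom(lines):
    """Count expired tiles per zoom level.

    Assumes each line looks like "zoom/x/y"; anything else is skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.strip().split("/")
        if len(parts) != 3:
            continue  # blank or malformed line
        try:
            zoom, _x, _y = (int(part) for part in parts)
        except ValueError:
            continue  # non-numeric field
        counts[zoom] += 1
    return dict(counts)
```

It can be applied to the file named above by passing an open file handle, for example `tiles_per_zoom(open("/srv/osm_expire/expire.list.201905282106"))`.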

Change 514090 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] maps: disable replication cron

https://gerrit.wikimedia.org/r/514090

Change 514090 merged by Gehel:
[operations/puppet@production] maps: disable replication cron

https://gerrit.wikimedia.org/r/514090

Mathew.onipe closed this task as Resolved. Aug 14 2019, 2:48 PM

This was traced to problems that occurred during osm-initial-script. It was resolved by reinitializing OSM.