
Reimport OSM data on eqiad
Closed, ResolvedPublic

Description

As a fallout of T243609, we need to reset our Postgres DB to a sane state.

The process:

  • 1) disable tilerator
  • 2) depool maps1004 (a hedged depool/repool sketch follows this list)
  • 3) isolate maps1004 from the rest of the cluster [MrG]
  • 4) delete postgres data, recreate the empty DB [MrG]
  • 5) run the script to reimport data from OSM; needs to run as root [MrG]

# -d: date of the latest planet dump; -s: state file for hourly replication
osm-initial-import \
   -d 201019 \
   -s https://planet.openstreetmap.org/replication/hour/000/071/112.state.txt \
   -x webproxy.eqiad.wmnet:8080

  • 6) enable replication and check at least one cycle of OSM replication
  • 7) repool maps1004
  • 8) re-init all replicas (cookbook)
  • 9) enable tilerator
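A minimal sketch of steps 2, 6 and 7, assuming conftool (confctl) manages the pooled state of maps1004 and that PostgreSQL 9.6 streaming replication is what needs verifying; the selector and the exact checks are assumptions, not the documented procedure:

# depool maps1004 before touching the data (confctl selector is an assumption)
sudo confctl select 'name=maps1004.eqiad.wmnet' set/pooled=no

# after the reimport, confirm the replicas are streaming (run on the primary;
# sent/replay locations should be close together on a healthy 9.6 cluster)
sudo -u postgres psql -c "SELECT client_addr, state, sent_location, replay_location FROM pg_stat_replication;"

# repool only after at least one OSM replication cycle has applied cleanly
sudo confctl select 'name=maps1004.eqiad.wmnet' set/pooled=yes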

Event Timeline

Gehel triaged this task as High priority. Jun 26 2020, 8:02 AM

Change 608459 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] Temporarily disable tilerator

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608459

Change 608726 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] Enable replication in eqiad

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608726

Change 608729 had a related patch set uploaded (by Ryan Kemper; owner: Ryan Kemper):
[operations/puppet@production] Isolate eqiad master maps1004 from cluster

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608729

Just a few queries to get me clued in:

  • Is advance notice required for these changes, or is there a safe period within which to do them?
  • For deleting the postgres data, is it a matter of just doing a dropdb?
  • I assume the cookbook required is sre.postgresql.postgres-init?

While I'm here: are there any plans to move the maps servers to Buster? I know it would further slow this rollout, but since the whole DC is out of commission at the moment, now could be a good time to attempt the move and save further disruption down the road.

I know it would further slow this rollout, but since the whole DC is out of commission at the moment, now could be a good time to attempt the move and save further disruption down the road.

Regarding the upgrade to Buster, here are a few things to take into consideration:

  • Tile regeneration is the pain point of this work, see last report: https://www.mediawiki.org/wiki/Wikimedia_Maps/Tile_generation_report
  • We need a spike to make sure mapnik (and the whole stack) will work okay with Buster; last time we had to backport mapnik and pin old versions of some node packages to make it possible
  • I would prefer to fix OSM in eqiad straight away and put our efforts into the k8s migration, now that we have capacity for it in the Product Infrastructure team

Just a few queries to get me clued in:

  • Is advance notice required for these changes, or is there a safe period within which to do them?

This should have no impact on users (re-importing on the primary first, then re-syncing the replicas one after the other), so no advance notice and no specific communication should be required.

  • For deleting the postgres data, is it a matter of just doing a dropdb?

Or even rm -rf /srv/postgresql/9.6/main/; Puppet should recreate everything.
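For reference, a hedged sketch of that wipe-and-rebuild approach; the systemd unit name follows the standard Debian postgresql-common layout and is an assumption:

# stop the cluster, wipe the data directory, let Puppet recreate the empty DB
sudo systemctl stop postgresql@9.6-main    # unit name is an assumption
sudo rm -rf /srv/postgresql/9.6/main/
sudo puppet agent --test                   # Puppet recreates the empty cluster and its config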

  • I assume the cookbook required is sre.postgresql.postgres-init?

This is actually badly named: the cookbook only re-initializes the replicas, not the primary. Also, it has only been used a few times, a long time ago, so it *probably* works. The re-import procedure is documented on-wiki.
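For illustration only, a hedged sketch of invoking the cookbook from a cumin host; the argument form and the target host are hypothetical, and the on-wiki procedure should be treated as authoritative:

sudo cookbook sre.postgresql.postgres-init --help                  # confirm the real arguments first
sudo cookbook sre.postgresql.postgres-init maps1005.eqiad.wmnet    # hypothetical target replica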

While I'm here: are there any plans to move the maps servers to Buster? I know it would further slow this rollout, but since the whole DC is out of commission at the moment, now could be a good time to attempt the move and save further disruption down the road.

There are no plans, other than knowing this is something that must be done at some point. Migration to Buster is likely to be non-trivial, as some dependencies (mapnik) are not available on Buster.

Change 636909 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] maps: reenable eqiad OSM replication

https://gerrit.wikimedia.org/r/636909

Change 636909 abandoned by Hnowlan:
[operations/puppet@production] maps: reenable eqiad OSM replication

Reason:

https://gerrit.wikimedia.org/r/636909

Change 608459 merged by Hnowlan:
[operations/puppet@production] Temporarily disable tilerator in eqiad

https://gerrit.wikimedia.org/r/608459

Change 608729 merged by Hnowlan:
[operations/puppet@production] Isolate eqiad master maps1004 from cluster

https://gerrit.wikimedia.org/r/608729

Change 608726 merged by Hnowlan:
[operations/puppet@production] Enable replication in eqiad

https://gerrit.wikimedia.org/r/608726

Change 639608 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] tilerator: enable in eqiad

https://gerrit.wikimedia.org/r/639608

OSM import complete, all replicas up to date. Tilerator is ready to be re-enabled.

Change 639608 merged by Hnowlan:
[operations/puppet@production] tilerator: enable in eqiad

https://gerrit.wikimedia.org/r/639608

This is theoretically done. I'll keep an eye on this and resolve tomorrow if all is okay.

@hnowlan Sorry if this is the wrong place to ask, but do you know why lower zoom levels haven't been updated yet? On January 1st, 2020, a lot of counties and municipalities in Norway merged. At zoom levels 10 and higher, they appear correctly:

Skjermdump fra 2020-11-23 16-04-15.png (594×490 px, 288 KB)

But at levels 9 and lower, the old borders are still visible:
Skjermdump fra 2020-11-23 16-02-39.png (573×543 px, 369 KB)

Any idea as to why?