
Increase frequency of OSM replication
Open, High, Public

Description

Currently, the importer causes one long load spike per day. I propose that we change it to several shorter spikes. Hourly seems like a good compromise, or maybe every 15 minutes?
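The trade-off behind "several shorter spikes" can be sketched with simple arithmetic. This is a rough model built on a hypothetical assumption that is not stated in the task: the work done by one run scales linearly with the time window it covers, plus a fixed per-run overhead. The names `runs_per_day`, `estimated_run_minutes` and `fits_in_interval` are illustrative only, and the 120-minute daily import is a made-up figure.

```python
# Back-of-the-envelope model for splitting one daily import into shorter runs.
# ASSUMPTION (hypothetical, not measured in this task): the import work of a run
# scales linearly with the time window it covers, plus a fixed per-run overhead.

MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes: int) -> int:
    """Number of replication runs per day for an interval that evenly divides a day."""
    if interval_minutes <= 0 or MINUTES_PER_DAY % interval_minutes != 0:
        raise ValueError("interval must be positive and evenly divide 1440 minutes")
    return MINUTES_PER_DAY // interval_minutes


def estimated_run_minutes(
    daily_import_minutes: float, interval_minutes: int, overhead_minutes: float = 0.0
) -> float:
    """Expected length of one run: its share of the daily work plus fixed overhead."""
    return daily_import_minutes / runs_per_day(interval_minutes) + overhead_minutes


def fits_in_interval(
    daily_import_minutes: float, interval_minutes: int, overhead_minutes: float = 0.0
) -> bool:
    """True if each run is expected to finish before the next one is scheduled."""
    run = estimated_run_minutes(daily_import_minutes, interval_minutes, overhead_minutes)
    return run < interval_minutes


if __name__ == "__main__":
    # Hypothetical figure: a daily import that takes 120 minutes end to end.
    for interval in (1440, 60, 15):
        print(interval, runs_per_day(interval), estimated_run_minutes(120, interval))
```

The fixed overhead term is why a shorter interval is not automatically better: once per-run overhead dominates, runs stop fitting in their window and the "spikes" merge into continuous load.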


Event Timeline

MaxSem created this task. Jun 16 2016, 1:27 AM
Restricted Application added subscribers: Zppix, Aklapper. Jun 16 2016, 1:27 AM
fgiunchedi triaged this task as Normal priority. Jul 15 2016, 11:10 AM
Yurik moved this task from Backlog to To-do on the Maps-Sprint board. Sep 14 2016, 8:54 PM
Restricted Application added a project: Discovery. Sep 22 2016, 11:55 AM

Change 312241 had a related patch set uploaded (by Gehel):
maps - increase osm replication frequency to hourly

https://gerrit.wikimedia.org/r/312241

Change 312241 merged by Gehel:
maps - increase osm replication frequency to hourly

https://gerrit.wikimedia.org/r/312241

Gehel moved this task from To-do to In progress on the Maps-Sprint board. Oct 12 2016, 6:18 PM
Gehel added a comment. Nov 8 2016, 11:27 AM

Replication frequency is set to 1 hour on the maps-test cluster. We can see that the server load average and IO peak every hour and barely have time to go back down before the next replication. We can also see that PostgreSQL replication often lags by more than 10 minutes. I have no idea what is causing that at the moment, but it looks like something that needs to be fixed before we enable hourly replication on the production servers.

Gehel moved this task from In progress to Stalled/Waiting on the Maps-Sprint board. Nov 9 2016, 7:23 PM
Gehel added a comment. Nov 9 2016, 7:25 PM

Better metrics and dashboards are required to get visibility into what is happening.

Tilerator notification is failing regularly on the maps-test cluster, which is the cluster where hourly updates are enabled. This is correlation, not causation; still, we should make sure the two problems aren't related (my suspicion: they actually are).

Yurik moved this task from All map-related tasks to Tilerator on the Maps board. Dec 9 2016, 5:30 AM
Yurik edited projects, added Maps (Tilerator); removed Maps.
ksmith moved this task from Stalled/Waiting to To-do on the Maps-Sprint board. Jan 20 2017, 6:39 PM

Based on T159631 (Tasmania is covered with water at z10+), we should switch to hourly diffs even if we don't change how often we update.
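Switching to hourly diffs mostly means reading a different replication feed. The sketch below builds URLs for the public OpenStreetMap replication feed, where each diff lives at `<granularity>/AAA/BBB/CCC.osc.gz` with `AAABBBCCC` being the sequence number zero-padded to nine digits. It assumes the maps importer reads the public planet.openstreetmap.org feed; the task does not say which source or mirror the clusters actually use, and the helper names are hypothetical.

```python
# Build URLs for the public OSM replication feed on planet.openstreetmap.org.
# ASSUMPTION: the importer reads this public feed; the task does not say which
# source or mirror the maps clusters actually use. Helper names are hypothetical.
from typing import Optional

BASE_URL = "https://planet.openstreetmap.org/replication"
GRANULARITIES = ("minute", "hour", "day")


def _check_granularity(granularity: str) -> None:
    if granularity not in GRANULARITIES:
        raise ValueError(f"granularity must be one of {GRANULARITIES}")


def sequence_path(sequence_number: int) -> str:
    """Zero-pad to nine digits and split into AAA/BBB/CCC directory components."""
    if not 0 <= sequence_number <= 999_999_999:
        raise ValueError("sequence number must fit in nine digits")
    digits = f"{sequence_number:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"


def diff_url(granularity: str, sequence_number: int) -> str:
    """URL of the gzipped OsmChange diff for one replication sequence number."""
    _check_granularity(granularity)
    return f"{BASE_URL}/{granularity}/{sequence_path(sequence_number)}.osc.gz"


def state_url(granularity: str, sequence_number: Optional[int] = None) -> str:
    """URL of a state file; without a sequence number, the feed's latest state.txt."""
    _check_granularity(granularity)
    if sequence_number is None:
        return f"{BASE_URL}/{granularity}/state.txt"
    return f"{BASE_URL}/{granularity}/{sequence_path(sequence_number)}.state.txt"
```

Each feed keeps its own sequence numbers, so moving from daily to hourly diffs also means tracking the hourly sequence number instead of the daily one; how often the import runs is a separate choice.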

Gehel moved this task from To-do to Backlog on the Maps-Sprint board. Jun 6 2017, 7:23 PM
debt moved this task from Backlog to To-do on the Maps-Sprint board. Jun 6 2017, 7:44 PM
debt added a subscriber: debt.

Moving to prioritized, as it's on our list of things that need doing.

mxn added a subscriber: mxn. May 16 2018, 4:46 AM
MaxSem removed a subscriber: MaxSem. Jul 3 2018, 8:01 PM

Still something we're interested in doing, but not sufficiently high-priority for Maps-Sprint.

Izno added a subscriber: Izno. Aug 12 2018, 5:00 AM
MusikAnimal added a subscriber: MusikAnimal.
Mholloway raised the priority of this task from Normal to High. Aug 14 2018, 11:11 PM
Mholloway added a project: Maps-Sprint.
Mholloway added a comment. (Edited) Aug 15 2018, 7:45 PM

> Replication frequency is set to 1 hour on the maps-test cluster. We can see that the server load average and IO peak every hour and barely have time to go back down before the next replication. We can also see that PostgreSQL replication often lags by more than 10 minutes. I have no idea what is causing that at the moment, but it looks like something that needs to be fixed before we enable hourly replication on the production servers.

> Tilerator notification is failing regularly on the maps-test cluster, which is the cluster where hourly updates are enabled. This is correlation, not causation; still, we should make sure the two problems aren't related (my suspicion: they actually are).

@Gehel Are both of these still the case? (I found a Grafana dashboard for the production cluster[1], but not the maps-test cluster.)

[1] https://grafana.wikimedia.org/dashboard/db/maps-performances

Edit: Of course I found the control to switch to maps-test two seconds after posting...