Currently the importer causes one long load spike per day. I propose that we change it to several shorter spikes; hourly seems like a good compromise. Or maybe every 15 minutes?
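The scheduling change itself is small. Assuming the importer is driven by a cron-style schedule (the actual mechanism in puppet is not shown here, and the job name below is a placeholder), the candidate frequencies look like:

```
# Hypothetical crontab entries; the real job is puppet-managed.
0 2 * * *     <osm-replication-job>   # current: once per day
0 * * * *     <osm-replication-job>   # proposed: hourly
*/15 * * * *  <osm-replication-job>   # alternative: every 15 minutes
```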
| Repository | Branch | Change |
|---|---|---|
| operations/puppet | production | maps - increase osm replication frequency to hourly |
| Status | Assignee | Task |
|---|---|---|
| Open | Gehel | T137939 Increase frequency of OSM replication |
| Resolved | Gehel | T147194 reimage maps-test* servers |
| Resolved | Gehel | T148031 Maps - error when doing initial tiles generation: "Error: could not create converter for SQL_ASCII" |
| Resolved | Gehel | T148114 Maps-test was created with incorrect initial encoding |
| Resolved | MaxSem | T145534 maps - tilerator notification seems stuck on sorting files |
Mentioned In:
- T205735: Investigate jobs taking too long to complete in maps1001.eqiad
- T201772: maps.wikimedia.org is showing old vandalized version of OSM
- T159631: Tasmania is covered with water at z10+
- T155601: Stabilizing Interactive Products

Mentioned Here:
- T159631: Tasmania is covered with water at z10+
Replication frequency is set to 1 hour on the maps-test cluster. We can see that the server load average and IO peak every hour and barely have time to drop back down before the next replication. We can also see that PostgreSQL replication often lags by more than 10 minutes. I have no idea what the cause of that is at the moment, but it looks like something that needs to be fixed before we enable hourly replication on the production servers.
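A minimal sketch of how that lag observation could be turned into an automated check. On the replica, PostgreSQL exposes the timestamp of the last replayed transaction via `SELECT pg_last_xact_replay_timestamp();`; everything else here (function names, the 10-minute threshold taken from the observation above) is illustrative, not the monitoring actually deployed:

```python
from datetime import datetime, timedelta, timezone

# Threshold from the observation above: lag regularly exceeds 10 minutes.
LAG_THRESHOLD = timedelta(minutes=10)

def replication_lag(last_replay: datetime, now: datetime) -> timedelta:
    """Time since the last replayed transaction on the replica.

    `last_replay` would come from running
    `SELECT pg_last_xact_replay_timestamp();` against the replica.
    """
    return now - last_replay

def lag_exceeds_threshold(last_replay: datetime, now: datetime) -> bool:
    """True if the replica is further behind than we consider acceptable."""
    return replication_lag(last_replay, now) > LAG_THRESHOLD
```

Note that on an idle primary this measure overstates lag (the last replayed transaction can be old simply because nothing was written), which should not be an issue with hourly OSM imports.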
Tilerator notification is failing regularly on the maps-test cluster, which is the cluster where hourly updates are enabled. This is correlation, not causation; still, we should make sure the problem isn't related (my suspicion: it is).
@Gehel Are both of these still the case? (I found a Grafana dashboard for the production cluster, but not the maps-test cluster.)
Edit: Of course I found the control to switch to maps-test two seconds after posting...