
Maps master servers running out of space
Open, High · Public

Description

Starting from December 24, 2019, disk space usage has been increasing on the maps servers. Most of the increase comes from PostgreSQL. There is a strong temporal correlation with the increase in OSM replication frequency.

As a stopgap measure, replication is paused, so no new writes should happen on those servers, which will give us some time to understand the issue. The impact is that no new tiles will be generated and we're going to be out of sync with OSM.

Event Timeline

Gehel created this task. · Jan 24 2020, 4:09 PM
Restricted Application added a subscriber: Aklapper. · Jan 24 2020, 4:09 PM
Mholloway triaged this task as High priority. · Feb 11 2020, 4:54 PM

I took a look at the sizes of bzipped OSM planet data files over time, and it turns out that they're growing considerably year-over-year. It looks like our increasing storage needs aren't solely an issue of DB management.

Date        Size (.bz2)
2020-01-06  84G
2019-01-07  73G
2018-01-01  63G
2017-01-02  54G
2016-01-04  46G
2015-01-05  39G

https://planet.openstreetmap.org/planet/
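To put numbers on the year-over-year growth, here is a minimal Python sketch that works only from the sizes listed above. It treats each early-January file as one annual snapshot (the dates are not exactly a year apart, so this is an approximation), and the names `PLANET_SIZES_GB`, `yearly_growth`, and `compound_annual_growth` are made up for this illustration.

```python
# Sizes of the bzipped planet files in GB, copied from the list above.
# Each early-January file is treated as the snapshot for that year (an assumption).
PLANET_SIZES_GB = {
    2015: 39,
    2016: 46,
    2017: 54,
    2018: 63,
    2019: 73,
    2020: 84,
}


def yearly_growth(sizes):
    """Return (year, absolute increase in GB, relative increase) for each year."""
    years = sorted(sizes)
    rows = []
    for prev, cur in zip(years, years[1:]):
        delta = sizes[cur] - sizes[prev]
        rows.append((cur, delta, delta / sizes[prev]))
    return rows


def compound_annual_growth(sizes):
    """Return the average compound growth rate between the first and last year."""
    years = sorted(sizes)
    first, last = years[0], years[-1]
    return (sizes[last] / sizes[first]) ** (1 / (last - first)) - 1


if __name__ == "__main__":
    for year, delta, rel in yearly_growth(PLANET_SIZES_GB):
        print(f"{year}: +{delta}G ({rel:.1%})")
    print(f"compound annual growth: {compound_annual_growth(PLANET_SIZES_GB):.1%}")
```

On these figures the absolute increase goes up by 1G each year (7G, 8G, 9G, 10G, 11G), while the relative increase stays roughly between 15% and 18% per year.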

Mholloway added a comment. · Edited · Wed, Mar 4, 5:28 PM

It looks like we reach 85% disk space utilization on the maps masters after a fresh import of the planet, even before kicking off any change replication. IIUC, a general hardware update is in the works, which will address the storage issue (among others).
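For reference, a utilization figure like this can be checked with a few lines of Python. This is only a sketch: the 85% threshold comes from the paragraph above, the helper names are hypothetical, and the path to check is up to the caller (the actual PostgreSQL data directory on the maps masters is not stated here). The figure from `shutil.disk_usage` can also differ slightly from what `df` reports, since filesystems may reserve blocks.

```python
import shutil

# Utilization level mentioned above; used here as an alert threshold (assumption).
THRESHOLD_PERCENT = 85.0


def utilization_percent(total_bytes, used_bytes):
    """Return used space as a percentage of total space."""
    return 100.0 * used_bytes / total_bytes


def check_path(path, threshold=THRESHOLD_PERCENT):
    """Return (utilization in percent, True if at or above the threshold)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    percent = utilization_percent(usage.total, usage.used)
    return percent, percent >= threshold


if __name__ == "__main__":
    # "." is a placeholder; point this at the filesystem holding the database.
    percent, over = check_path(".")
    print(f"{percent:.1f}% used, over threshold: {over}")
```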

I think this is in Needs Analysis on the Product-Infrastructure-Team-Backlog because we want to better understand the effect of increasing the replication frequency on disk space usage. That analysis should probably happen on T137939, and probably can't really happen at all until we have more storage to play with. In the meantime, this task can live in the Backlog.

T196474: Externalize tile storage for maps is something to consider once again in this context.

TheDJ added a subscriber: TheDJ. · Tue, Mar 31, 1:24 PM

Wait, so replication has been disabled for 2.5 months?

@TheDJ, yes. Unfortunately, the fix for this problem was delayed by a variety of events that reduced the availability of maps staff over the past quarter. We hope to push this forward in the upcoming weeks, now that we are able to make room for that work.

I'm sorry for the inconvenience. I'll keep you posted.

Jhernandez removed a subscriber: Jhernandez. · Thu, Apr 2, 6:46 PM