Maps master servers running out of space
Closed, Resolved (Public)


Since December 24, 2019, disk space usage has been increasing on the maps servers. Most of the increase is in PostgreSQL. There is a strong temporal correlation with the increase in OSM replication frequency.

As a stopgap measure, replication is paused, so no new writes should happen on those servers, which will give us some time to understand the issue. The impact is that no new tiles will be generated and we're going to be out of sync with OSM.

Event Timeline

I took a look at the sizes of bzipped OSM planet data files over time, and it turns out that they're growing considerably year-over-year. It looks like our increasing storage needs aren't solely an issue of DB management.

Date | Size (.bz2)

It looks like we reach 85% disk space utilization on the maps masters after a fresh import of the planet, even before kicking off any change replication. IIUC, a general hardware update is in the works, which will address the storage issue (among others).

I think this is in Needs Analysis on the Product-Infrastructure-Team-Backlog because we want to better understand the effect of increasing the replication frequency on disk space usage. That analysis should probably happen on T137939, and probably can't really happen at all until we have more storage to play with. In the meantime, this task can live in the Backlog.

Wait, so replication has been disabled for 2.5 months?

@TheDJ, yes. Unfortunately, the fix for this problem was delayed by a variety of events that reduced the availability of maps staff over the past quarter. We hope to push this forward in the upcoming weeks now that we are able to make room for that work.

I'm sorry for the inconvenience, I'll keep you posted.

Any news? The downtime is quite long for such a useful and crucial functionality of Wikipedia.

Now it's been 5 months since replication was disabled... any updates @MSantos @Mholloway @Gehel?

Data has been reimported in our codfw datacenter, and we are in the process of doing the same for the eqiad datacenter (T254014). We should be back to normal operations next week.

That's good to hear. So do normal operations imply that everything should have been reimported by now? I have a test page at, still no luck with the import, at least with this relation I've been continually monitoring.

@Em sorry for the confusion, I suggest that you track T254014: Reimport OSM data on eqiad in order to have more accurate feedback regarding this fix.

Reminder: we, the German map maniacs, pleaded by a majority for better support for geospatial information in the latest Technical Wishlist 2020 survey ;)

Ping - any news here? How long will we have to wait till this is fixed?

For what it’s worth, the relation I reported in has begun showing up in mapframes, so something has begun moving, though I don’t know if Wikimedia Maps is fully caught up.

Wikimedia Maps appears to be pretty much caught up now, based on some spot checks of recent edits I’ve made to OpenStreetMap. For example, added a covered bridge and creek that show up in