
Request increased quota for maps Cloud VPS project
Closed, Resolved · Public

Description

Project Name: maps
Type of quota increase requested: disk
Reason: Setting up a user supported OSMdb mirror

Per T285668 the current WMF maintained OSMdb that is available to Cloud VPS projects is out of date. My queries show it is at best at the state from half a year ago, with possible history gaps before that.

In that ticket a DB hosted within the maps project was suggested. This ticket takes admins up on that proposal :-). I'd be happy to give this a shot and I'd try to find some accomplices on wiki.
I suspect that I'm the #1 user of this DB anyway! I'm creating tiles for the WikiMiniAtlas, both server-side rendered and JSON tiles with client-side rendering at high resolutions. If I had responsibility for the DB server I could make sure that my tools work with it properly (and I'd be the one to feel the pain if they caused issues; if that isn't a great motivator, I don't know what is).

So, here it goes...

I request a disk quota increase of 3TB for a volume to store an OSM database replica.

Attachment: on-demand rendered WMA tiles over the last few weeks:

overview_11.gif (768×1 px, 379 KB)

Event Timeline

I just confirmed we have enough storage space in our ceph backend.

+1 to this request, if you use a cinder volume attached to the instance to store the DB data.

I'm documenting the setup here: https://github.com/Commonists/maps-osmdb

I'd like to get a head start (obviously I don't have the cinder volume attached yet, but that step should be trivial).
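
For the record, that step is roughly the following (device name and mount point are illustrative, and I'm assuming Debian's stock Postgres packaging):

# format and mount the cinder volume, then move the Postgres data directory onto it
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /volume
sudo mount /dev/sdb /volume
sudo systemctl stop postgresql
sudo mv /var/lib/postgresql /volume/postgresql
sudo ln -s /volume/postgresql /var/lib/postgresql   # or point data_directory at it in postgresql.conf
sudo systemctl start postgresql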

I have an up to date planet-latest.osm.pbf that I'm incrementally keeping current with osmupdate. I have the database set up (just have to move it to the cinder volume), and I think I got the replication using osmosis and osm2pgsql figured out. I'm itching to get an import going.
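
The osmupdate step is just something like this (file names are illustrative; osmupdate comes from the osmctools package):

osmupdate planet-latest.osm.pbf planet-new.osm.pbf   # fetch and apply the latest diffs
mv planet-new.osm.pbf planet-latest.osm.pbf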

Done. Please follow up on this task or with other tickets if/when:

  1. This doesn't go anywhere and we can reclaim the resources, or
  2. This is a rousing success and we can stop storing/maintaining the existing database

btw, is this a postgres database? If so, can/should it be managed by Trove rather than local to this project?

Uhm, yes, it is a Postgres / PostGIS database. I thought the whole point here was not to have a DB managed by admins, but to put it in "userspace" (per T285668).

Then again, if I can get a DB configured with a bunch of Postgres options (see https://github.com/Commonists/maps-osmdb) and a user with sufficient access rights, that could work, too.

Especially seeing that the import of the planet data is likely going to take in excess of 4 weeks. So if Trove would get me better performance I'm all for it. The replication process could still be running on an instance of the maps project, giving us control over replication and an easy way to hook in postprocessing steps like tile expiry.
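
To make that concrete, the replication loop I have in mind looks roughly like this (flags and paths are illustrative, not the final setup):

# fetch the latest diffs with osmosis, apply them with osm2pgsql in append mode,
# and write out a list of expired tiles for the renderer to pick up
osmosis --read-replication-interval workingDirectory=replication \
        --simplify-change --write-xml-change changes.osc.gz
osm2pgsql --append --slim -d gis \
          --expire-tiles=10-15 --expire-output=dirty_tiles.txt changes.osc.gz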

ok! so... Trove is a 'database as a service' option that provides basic provisioning and management of databases. So it would still be self-serve, but with some parts automated (specifically, creation of the database server and future upgrades).

Postgres support there is not perfect but you may find it easier to maintain than building a database from scratch.

Give the 'database' tab a try in Horizon. I predict that getting the db up and running for starters will be easy, and also that the user management features on horizon mostly won't work for postgres (you'll have to connect and do that via the CLI and muck with the default 'postgres' database). If you find that to be adequate then let's do this in Trove rather than having you manage the db all the way down to the VM -- I'll have to shift some quotas around.
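
The user management part would amount to something along these lines (names and host are placeholders):

# connect to the default 'postgres' database as the admin user and create a role and database by hand
psql -h <trove-instance> -U postgres -d postgres \
     -c "CREATE ROLE osm WITH LOGIN PASSWORD 'changeme';" \
     -c "CREATE DATABASE gis OWNER osm;"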

Hm, ok. I'm in the middle of importing data, so I'd like to see how the performance of the Trove DB compares to the server on the VM. Would it be possible to get enough quota to start an import? Also, how hard/easy is it to resize volumes or Trove DBs in case the DB size grows beyond the current allocation in the future?

I've just done some resizing tests and it doesn't seem to work very well; the UI resizes the volume but not the filesystem on the volume. So resizing is possible but will require admin intervention.

There's a default limit of 250GB for Trove volumes; if you tell me the name and size that you'd like (3TB I'm guessing?) I can suspend the rules briefly to create a giant.

Yeah, I think 3TB would be a good start. I have trouble finding hard and fast numbers for database size. My current import is at 1.3TB with nodes imported, but only a fraction of all ways, and no relations imported.

I created the maps-postgres db instance -- let me know how it goes!

image.png (142×728 px, 14 KB)

weird, it shows mysql as the datastore ...

That's probably just me having bad mouse aim. I will try again!

Ok, thanks. The import on the VM did speed up considerably and is in its final stages. I'll let it run and check if it went well, to verify my procedure. If it went fine I'll replicate the steps on the Trove DB.

Just to keep you updated: the import took about 11 days and finished yesterday. It is currently building indices and shows 2.2TB out of 2.9TB used on the DB volume. I'll let this finish to get a better idea of how big the final DB (at least with the current state of OSM) will be. It's possible that 3TB is a bit tight.

dschwen@maps-osmdb:~/osmdb$ time osm2pgsql -c -s planet-latest2.osm.pbf -d gis
2022-01-31 20:47:21  osm2pgsql version 1.6.0 (1.6.0)
2022-01-31 20:47:21  Database version: 13.5 (Debian 13.5-0+deb11u1)
2022-01-31 20:47:21  PostGIS version: 3.1
2022-01-31 20:47:21  Setting up table 'planet_osm_point'
2022-01-31 20:47:22  Setting up table 'planet_osm_line'
2022-01-31 20:47:22  Setting up table 'planet_osm_polygon'
2022-01-31 20:47:22  Setting up table 'planet_osm_roads'
2022-02-11 17:52:17  Reading input files done in 939895s (261h 4m 55s).
2022-02-11 17:52:17    Processed 7476676170 nodes in 31789s (8h 49m 49s) - 235k/s
2022-02-11 17:52:17    Processed 833672717 ways in 656592s (182h 23m 12s) - 1k/s
2022-02-11 17:52:17    Processed 9622337 relations in 251514s (69h 51m 54s) - 38/s
2022-02-11 17:52:18  Clustering table 'planet_osm_roads' by geometry...
2022-02-11 17:52:18  Clustering table 'planet_osm_line' by geometry...
2022-02-11 17:52:18  Clustering table 'planet_osm_point' by geometry...
2022-02-11 17:52:18  Clustering table 'planet_osm_polygon' by geometry...
2022-02-11 19:49:23  Creating geometry index on table 'planet_osm_roads'...
2022-02-11 20:05:13  Creating geometry index on table 'planet_osm_point'...
2022-02-11 20:06:37  Creating osm_id index on table 'planet_osm_roads'...
2022-02-11 20:11:56  Analyzing table 'planet_osm_roads'...
2022-02-11 20:13:46  Done postprocessing on table 'planet_osm_nodes' in 0s
2022-02-11 20:13:46  Building index on table 'planet_osm_ways'
2022-02-11 22:57:25  Creating osm_id index on table 'planet_osm_point'...
2022-02-12 00:04:34  Analyzing table 'planet_osm_point'...
2022-02-12 00:05:07  Building index on table 'planet_osm_rels'
2022-02-13 01:50:51  Creating geometry index on table 'planet_osm_line'...
2022-02-13 07:55:49  Creating osm_id index on table 'planet_osm_line'...
2022-02-13 09:16:08  Analyzing table 'planet_osm_line'...
2022-02-13 17:54:52  Creating geometry index on table 'planet_osm_polygon'...
2022-02-14 13:22:19  Creating osm_id index on table 'planet_osm_polygon'...
2022-02-14 15:23:16  Analyzing table 'planet_osm_polygon'...
2022-02-17 01:09:21  Done postprocessing on table 'planet_osm_ways' in 449734s (124h 55m 34s)
2022-02-17 01:09:21  Done postprocessing on table 'planet_osm_rels' in 10962s (3h 2m 42s)
2022-02-17 01:09:21  All postprocessing on table 'planet_osm_point' done in 22368s (6h 12m 48s).
2022-02-17 01:09:21  All postprocessing on table 'planet_osm_line' done in 141865s (39h 24m 25s).
2022-02-17 01:09:21  All postprocessing on table 'planet_osm_polygon' done in 250293s (69h 31m 33s).
2022-02-17 01:09:21  All postprocessing on table 'planet_osm_roads' done in 8487s (2h 21m 27s).
2022-02-17 01:09:21  osm2pgsql took 1398120s (388h 22m 0s) overall.

real    23301m59.583s
user    989m57.448s
sys     394m59.727s

Import is done! I'm a bit busy with work, but will try some queries and also try the incremental replication.

HOWEVER:

2.9T  2.8T  2.8G 100% /volume

The DB as is uses almost the entire 3TB! I'll check if I can vacuum off a bit...
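
Roughly what I plan to run (note that a plain VACUUM only frees space for reuse inside the table files; a VACUUM FULL would return space to the OS but needs room for a table rewrite, which I don't have):

# list the biggest relations, then vacuum the database
psql -d gis -c "SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) FROM pg_class ORDER BY pg_total_relation_size(oid) DESC LIMIT 10;"
psql -d gis -c "VACUUM (VERBOSE, ANALYZE);"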

Would it make this testing easier if we gave you some more quota space? I'm not sure how easy it will be to expand your already-in-progress VM, but it should at least be possible.

Yeah, for sure. I also wonder if it would be possible to just transplant the config and data directory from the VM Postgres to the Trove container?
I'm also a little suspicious about how tightly the data just happened to fit. I don't see any errors, but it would be quite a coincidence if the import worked with just a few hundred MB of free space left :-/

The maps-postgres db instance is still shown as building. Isn't that a bit curious? Also, I don't see a way to resize a volume. Is that possible via Horizon?

There is an 'extend' menu on the volumes tab.

Regarding the VM being in 'building' state forever... I noticed that too but don't have an explanation. I'll see what I can turn up when I'm back at work.

Thanks. The volume my current DB import is on has now been in the "detaching" state for a while as well. Similar issue? I tried to detach it in order to resize it, but while it is in that state I cannot do anything but "update metadata".

Update: it is still detaching ~15h later

Still detaching. I'm stuck.

I don't know why this happened but I reset the state on the CLI and tried detaching again and now it seems like it's in a useful state.

Thanks, any idea why I cannot resize the volume to 4TB?

image.png (444×772 px, 59 KB)

The UI indicates that I have ~3TB quota left.

Hm, it seems I was simply confused by the display of remaining quota. In the project overview I see there is no volume quota left.

Mentioned in SAL (#wikimedia-cloud) [2022-02-22T22:10:34Z] <andrewbogott> raising project 'maps' quota by two tb -- T300160

You should have some breathing room now -- any better?

Yes! I resized the FS to 4TB and started a replication update. https://wma.wmcloud.org/osmdb.txt
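
For the record, the resize boiled down to extending the volume via the 'extend' menu in Horizon and then growing the filesystem on the VM (device name illustrative; ext4 sits directly on the volume, so there is no partition to grow):

sudo resize2fs /dev/sdb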

Uaaahhrggh, I forgot to use hstore during import. Now I have an up-to-the-minute updated DB that does not support my queries! Ok, I suppose this was a learning experience. I should try again on Trove.
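
For reference, the import invocation should have looked like the one above, just with hstore enabled (file name illustrative):

osm2pgsql -c -s --hstore -d gis planet-latest.osm.pbf   # --hstore keeps the extra tags in an hstore column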

@Andrew The Trove instance is stuck on "Building" and I cannot seem to log into it. I need to set a bunch of postgresql.conf options and make sure that the postgis and hstore extensions are installed (does Trove manage upgrades of those as well? Otherwise the advantage of having the server managed doesn't really apply to this scenario).
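
Installing the extensions themselves would just be the following (assuming sufficient privileges on the instance), but I can't even try that while it is stuck in "Building":

psql -d gis -c "CREATE EXTENSION IF NOT EXISTS postgis;" -c "CREATE EXTENSION IF NOT EXISTS hstore;"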

Hello @dschwen. I'm on vacation this week, so I won't be very responsive.

As best I can tell there's some kind of fixed-size bug with Trove that causes these lock-ups for big databases. I can try to investigate that bug in the coming weeks but if you want to move ahead you're probably better off managing your own postgres. Sorry for the wild goose chase!

No problem, @Andrew! Enjoy your vacation! This is a slow-burn project anyway. I'm currently reimporting the planet all over again, this time with proper hstore usage. Of course I had to ignore the advice online about doing test imports with small subsets (like a US state or city)... :-/

@dschwen How are things going? Anything further we can help with?

Fine so far. I'm at my second import with optimized database settings and the right data layout (using hstore). It'll be a few more days. You can monitor the progress here: https://wma.wmcloud.org/osmdb.txt

@dschwen, can I get an update about how things are going? Do you have all the resources you need?

Hi @dschwen, I wanted to check in on whether there is anything else you might need for this, or any resources you no longer need.

Hello @rook and @Andrew, I'm in good shape. I have an up-to-date OSM mirror (see https://wma.wmcloud.org/osmdb.txt) and my tile rendering is working great (https://wma.wmcloud.org/status.php). I could free 80GB by removing my older tile cache volume (which ended up being too small). Either way, I should unmount it and detach it from the machine. Cheers and many thanks for your support!
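
For my own notes, that cleanup step is roughly the following (mount point and names are placeholders):

sudo umount /mnt/tilecache
openstack server remove volume <server> <volume>   # or detach via the volumes tab in Horizon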

Sounds great, thanks for the followup!

@dschwen, what is the story with the Maps project? I see you working on it actively in this ticket but T306080 implies that the project is dead (and the tileserver VM is in danger of being deleted in a week or two.) If you plan to keep this project alive, can you respond on T306080?

Hello! I have a few questions about this project:

  1. Is this project still active? (I'm assuming yes but it's always good to ask)
  2. Am I correct in understanding that all of the cinder volumes in this project represent caches and/or replicas of data stored elsewhere? I'm trying to cut down on our backup volume and would prefer not to back up things that can be regenerated in case of a disaster.

Thank you!

-Andrew