Blocking some releases... we don't want to put too much pressure on the main OSM tile servers.
|Duplicate||Yurik||T27139 Integrate OpenStreetMap data within Wikimedia projects|
|Declined||None||T35856 Wikipedia Android App 2.0 release (tracking)|
|Open||None||T64257 OpenHistoricalMap & Wikimaps|
|Resolved||akosiaris||T35980 Wikimedia-hosted OpenStreetMap (OSM) / mapnik tileservers wanted for mobile usage|
|Invalid||None||T62831 Prepare dedicated hardware for tileserver|
|Resolved||Yurik||T60797 [DO NOT USE] [tracking] OSM on Labs [superseded by #Maps]|
|Resolved||coren||T50896 Please install Postgresql on Tool-labs|
|Resolved||akosiaris||T62819 Set up a tileserver for OSM in Labs|
|Resolved||akosiaris||T62461 Replicate OSM to a database server accessible by Labs users|
If getting a full tile server going is too slow, we may want to check whether a caching proxy that fetches from OSM's main servers would be acceptable, as that might be easier to furnish.
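To make the caching-proxy idea concrete, here is a minimal sketch of the cache-on-miss behaviour, not a proposal for the actual setup. The class name, the in-memory cache, the User-Agent string, and the upstream URL template are all assumptions made for illustration; a real deployment would more likely use an existing HTTP cache and would have to respect the upstream tile usage policy.

```python
from typing import Callable, Dict, Tuple
from urllib.request import Request, urlopen

# Assumed upstream URL template; substitute whatever tile source is permitted.
UPSTREAM_TEMPLATE = "https://tile.openstreetmap.org/{z}/{x}/{y}.png"

TileKey = Tuple[int, int, int]


def fetch_from_upstream(z: int, x: int, y: int) -> bytes:
    """Fetch one tile over HTTP (needs network access)."""
    url = UPSTREAM_TEMPLATE.format(z=z, x=x, y=y)
    # Hypothetical User-Agent, only so the request identifies itself.
    request = Request(url, headers={"User-Agent": "tile-cache-sketch/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read()


class CachingTileProxy:
    """Serve tiles from a local cache, going upstream only on a miss."""

    def __init__(self, fetch: Callable[[int, int, int], bytes] = fetch_from_upstream):
        self._fetch = fetch
        self._cache: Dict[TileKey, bytes] = {}
        self.upstream_requests = 0  # how often we actually hit upstream

    def get_tile(self, z: int, x: int, y: int) -> bytes:
        key = (z, x, y)
        if key not in self._cache:
            # Cache miss: one upstream request, then remember the result.
            self.upstream_requests += 1
            self._cache[key] = self._fetch(z, x, y)
        return self._cache[key]
```

The point of the sketch is only that repeated requests for the same tile cost the upstream servers one fetch; expiry and disk storage are deliberately left out.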
Do you mean hardware? Or space on existing servers? If I need to bug someone in Ops, please let me know. I imagine a similar issue exists for the caching proxy Brion suggested.
I don't think space on existing servers would work well. For the toolserver, we already have database replication and tile generation / serving on one server and it's strained somewhat. It's best they be two separate and not share.
In the short term, we can try setting up configurations and testing stuff on wikimedia labs perhaps.
Brandon Black is currently working on this; https://www.mediawiki.org/wiki/Mobile_web/Team/Etherpad/WMF_OSM_Hack_Session_2013 has a bunch of OLD thoughts on this from March 2013, and Brandon will be creating a central wiki page to track the requirements and progress. Thanks, Brandon.
I've uploaded my notes to a wikitech page to cover the ops project for this: https://wikitech.wikimedia.org/wiki/OSM_Tileserver
Probably the most relevant summary bit from that is a tentative initial date around mid-August to get a test machine going with a workable single-machine software stack as a starting point. Let's call it Monday Aug 19th just to be more specific. Once that's running and functional, we should have a much clearer idea of the real challenges and be able to make a better timeline for production deployment.
As Brandon investigated how to build this, he ran into some questions about scope and use cases. He got some answers this week and detailed them on https://wikitech.wikimedia.org/wiki/OSM_Tileserver and now will be able to make better progress. We don't have a date yet, though. Sorry for the delay.
OSM sysadmin team has the tile configuration in Chef:
Tilecache config too (very basic):
Disclosure: I am part of the tiny OpenStreetMap sysadmin team. Firefishy in MediaWiki-General on Freenode or #osm-dev on OFTC.net
Not limited to mobile usage:
The Wikivoyages are now using Mapquest/Cloudmade/OSM/etc for dynamic maps.
They would benefit greatly from a Wikimedia-hosted HTTPS tileserver.
Hardware: boots, using it to stage other work below, will wipe again once we have a final config.
Packaging: we can use upstream Ubuntu packages for many parts (e.g. basic mapnik packages), it seems. Using a local fork of Kai's packages for renderd/mod_tile + "stylesheet-data" (to get them into our local repo instead of over ppa, and modify defaults/deps as long as we're there to not pull in pgsql or download coastline stuff automatically).
Render machines: Currently looking at how we'll manage puppetize/deploy of coastline data (~700MB of binary files)...
For coastlines you should check this:
It seems to me that the easier way is to use already generated coastlines.
Coastlines are really complex:
Other question: Will we have an hstore in the database? Otherwise we are not flexible enough to render the most important styles, like hikebike-style.
Generating coastline data "from source" is also a bit too complex for what we want to do here. As far as we're concerned, "source" is the upstream pre-generated stuff. Kai's openstreetmap-mapnik-stylesheet-data package (here: https://launchpad.net/~kakrueger/+archive/openstreetmap/+packages ) encapsulates (aside from basic style/symbol stuff) the coastline data conceptually, but obviously he doesn't package 688MB of binary data in the .deb. Instead, it includes a script which (by default) is run at package postinst time to download and unpack these (which is 5 compressed files from two different sites - tile.openstreetmap.org and www.naturalearthdata.com).
When I last talked this over with Faidon, we both agreed we didn't like the idea of downloading huge data from the public internet as part of an automated, puppet-driven package install to set up a render node in the general case. Ideally we'd have some other fancy local solution to store this data (and update it once in a blue moon as necessary), and we'd rsync it (or whatever) within our infrastructure. Kai's package allows for this already (via a debconf option to skip the download).
I'm still pondering this bit. Part of me really just wants to say, "Look, any other solution is even more of a pain in the ass, let's just set up an outbound HTTP proxy on these machines at install time [by default, they don't have access to the outside world, but we have proxies avail...] and let them pull it from the primary upstream sources via the package default." Then we can move on with other challenges, and if it really bothers some enterprising individual down the road when we have more renderer machines to care about, they're free to remedy the situation at that time.
hstore: I haven't reached a point where I'm fully aware of the tradeoffs on hstore. If it's relatively cheap to do hstore during osm2pgsql (and ongoing updates), we may as well put it in from the get-go. On the other hand, this first deployment isn't intended to support things like hikebike. We're just trying to get basic world map tiles out the door, and then based on production experience with that we can decide what else to support and what that will require in terms of hardware and software resources.
The coastline data is derived from two sources. 1) Natural Earth and 2) post-processed OSM data
The natural earth data is a static data set that is probably only updated every couple of years. In the osm stylesheets, it is only used for very low zoom renderings of the entire earth, where precision doesn't matter. I think using natural earth still stems from a time when OSM didn't quite trust its own source of coastlines and didn't want tiny mistakes in the coastline propagating into large issues on a global scale.
For the majority of the zoom levels the style sheets use osm coastline data that is post-processed into shapefiles for better performance during rendering. The post-processing generates closed (and partly simplified) polygons out of the individual coastline segments. I think it takes a couple of hours to do the post-processing, as e.g. recreating a closed polygon out of the full resolution European/Asian coastline is a somewhat intense task. The tools to create these coastline shapefiles are all open, and so it would be possible to build them locally. However, as far as I am aware no one has ever complained that the load from downloading the coastline files has been excessive. So I wouldn't worry for now if you pull those files from the upstream locations. Particularly not if you run it through a caching proxy (that caches such large files).
That said, the current main OSM style sheet toolchain has changed somewhat in the past month ( https://github.com/gravitystorm/openstreetmap-carto ). Instead of writing the unwieldy mapnik style sheet xml by hand, they have moved over to a CSS-based map style language (carto) that is then compiled (by a node.js-based compiler) into the actual mapnik style sheet xml. In this process the style sheet has also moved over to a slightly different set of coastline files, which uses a more efficient toolchain for creating those shapefiles. Those files now live on http://data.openstreetmapdata.com/ and the script https://github.com/gravitystorm/openstreetmap-carto/blob/master/get-shapefiles.sh downloads all the relevant files.
I haven't yet updated my PPAs to use the new style-sheet, but I am hoping to do that soon. I'll try and do it sometime this week. But as I intend to put the pre-compiled mapnik xml style sheet into the packages rather than the carto source (to avoid the node.js dependencies) nothing much should change from a tile server admin's point of view.
Regarding the PPAs, the source debian scripts live in https://github.com/apmon/OSM-rendering-stack-deplou Those also include some debian packaging scripts I specifically created for wikipedia back in March, that take out some of the "fancy" post-install scripts that try and set things up. With puppet those install scripts wouldn't be necessary, and aren't directly "debian standards compliant". However, obviously feel free to modify them in any way you need. Or if there are issues let me know and I can hopefully try and fix anything as necessary.
You probably don't really want to use my PPAs directly anyway, as I have taken a policy to freeze the packages together with the respective ubuntu distributions. So the packages for 12.04LTS are more or less still from 1 1/2 years ago and quite a bit has changed in the software since then.
For osm2pgsql a more or less up-to-date version is in debian unstable, but hasn't migrated to ubuntu yet. With mod_tile / renderd I have always intended to get packages for them into the official debian / ubuntu repositories, but haven't so far. Perhaps with Faidon's (or someone else's) help with cleaning them up, we could have another attempt at getting them in.
Regarding hstore: I don't have any hard numbers of overhead at the moment, but I don't think it is a significant or relevant overhead.
Configuration wise, activating hstore is done with a single command line switch to osm2pgsql. I believe my PPA packages already install the hstore extension into postgresql automatically.
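As an illustration of that single switch, here is a small sketch that assembles an osm2pgsql import command with and without `--hstore`. The helper function, the database name `gis`, and the planet file name are assumptions for the example; only `--create`, `--slim`, `--database` and `--hstore` are osm2pgsql options, and a real import would carry more tuning options than shown here.

```python
import shlex
from typing import List


def osm2pgsql_command(planet_file: str, database: str = "gis",
                      use_hstore: bool = True) -> List[str]:
    """Build (but do not run) an osm2pgsql import command line."""
    command = ["osm2pgsql", "--create", "--slim", "--database", database]
    if use_hstore:
        # The one switch that stores otherwise-dropped tags in an hstore column.
        command.append("--hstore")
    command.append(planet_file)
    return command


def as_shell(command: List[str]) -> str:
    """Render the argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in command)


# Hypothetical file name, used only for illustration.
print(as_shell(osm2pgsql_command("planet-latest.osm.pbf")))
print(as_shell(osm2pgsql_command("planet-latest.osm.pbf", use_hstore=False)))
```

The two printed lines differ only in the `--hstore` argument, which is why switching it on from the start is cheap in terms of configuration.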
Performance wise during import / diff processing the overhead should be minimal as that is not really where the bottleneck is.
Size-wise, I think the overhead is somewhere on the order of 100GB, if I am not mistaken. However, given that the non-hstore db is just over 256GB and the hstore one is below 512GB, chances are that the overhead is not really much of a concern.
Furthermore, there is some talk of moving the main osm style sheet to (partly) use hstore as well. It would probably keep in postgres columns all the data for which there are where clauses in the rendering sql selection filters, and move the rest into hstore. So far this hasn't happened, as the performance impact during rendering isn't well known yet. But with pressure to include data in the map rendering for which the current schema doesn't have columns, I would think it is likely to happen at some point in the not too distant future.
So I would recommend activating it just to be flexible and prepared for future changes without having to re-import everything again. From what I have seen, there are few downsides and several benefits. But as a fresh import wouldn't take all that long, it probably isn't directly critical either.
I think a reasonably solid date for a prod-accessible installation would be Nov 4th. From there we'll have to take our time adding various prod traffic sources and seeing how things scale.
4th November, which year?
Sorry, it is frustrating to hear from WMF one false prediction after another.
In March in Copenhagen WMF engineers told me that something should be ready 2 later, the same answer in May in Amsterdam, then in Hong Kong... now we have November. I feel that WMF is wasting my time and it's hard for me to see a future for maps in Wikipedia.
Are professional project management and realistic communication really so difficult?
From an email by Brandon just now:
"Things are always a little more complicated than I expect them to be. It's one thing to build a one-off tileserver and make it work, it's another to structure it to be automated, deployable, and manageable on an infrastructure, and I've been distracted by other shorter-term tasks frequently. I still think I'm very close to being done with this. We have hardware up, it's booted/installed, we have packages that will work for our deployment style (with some compromise made). I'm restarting the initial database import (osm2pgsql of the full planet data) later today, and that tends to take quite a while to run (on the order of ~24h? I've made some changes since the last attempt that may affect it)."
Since it's evidently hard to provide estimates when it'll be done, regular updates are much appreciated.
Osm2pgsql does generally take a fairly long time to import and it is very hardware dependent. On powerful hardware (database on SSDs, sufficient ram and a reasonable multi-core CPU), 24 hours does sound a little on the long side though, and indicates that there may still be some room for optimising the setup. The fastest imports I have seen are on the order of 8 - 10 hours.
Are the parameters for postgresql and the command-line options of osm2pgsql available in one of the public puppet repositories somewhere?
Why don't you involve Kai and Tim? They are very experienced and are offering their help. To me, it sounds like a very good plan to bring them in. What would have to be done to do so?
Brandon has an update here: https://wikitech.wikimedia.org/wiki/OSM_Tileserver#Extended_State_of_Things_-_2013-11-27
At this point, we're going to re-evaluate our options, and see if we can get some more eyeballs on the project overall. The scope is much larger than originally anticipated.
We're going to have a chat immediately after the holidays to chart a new way forward on the project internally, with a goal of getting a bare-bones setup that meets the existing and proposed requirements, and work up from there. This will probably involve some more scope conversations, as well as some help from Kai, Tim (and indeed anyone else!) who might be able to assist us in getting this project to 1.0.
(In reply to comment #31)
We're going to have a chat immediately after the holidays to chart a new way
forward on the project internally, with a goal of getting a bare-bones setup
that meets the existing and proposed requirements, and work up from there. This
will probably involve some more scope conversations, as well as some help from
Kai, Tim (and indeed anyone else!) who might be able to assist us in getting
this project to 1.0.
(I assume "holidays" here means US thanksgiving and today "weekend".) As some of the delays seem to stem from "internal project", how about hiring/funding someone knowledgeable about the current Toolserver (or another tileserver) setup and assisting *them* in setting it up at WMF instead of the other way around which appears to be *far* more laborious?
Quotes like (emphasis added): "The *best* way to scale would be to render the whole database to vector tiles. The software setup for this is an *unknown* to me (and I'm guessing most who aren't directly involved), [...]" make me cringe. Let's leave optimizing the code to people who actually know it. A *lot* of hardware can be bought before we need to throw away running code.
What is the best way to comment on the individual issues raised by Brandon? Inline comments in the wikipage? Discussing them here on the ticket, or as comments at the bottom of the wikipage?
The two main issues I'd like to address though are:
So it seems like we really need a better and more justified estimate of the expected load, i.e. looking through the logs of the existing systems, both for mobile (mapquest open) and desktop (toolserver), and figuring out the access patterns of the pages on which these tiles would be used.
My impression is that we are trying to build a system that is potentially orders of magnitude more potent than what is actually needed in the beginning.
One also has to remember, however, that there are two types of scaling in a typical tile server that are semi-independent, scale in different ways, and have rather different hardware demands: scaling of tile serving and scaling of rendering. Although rendering does scale with serving load to some degree, it also scales to a good degree with update frequency and editing frequency in OSM, both of which are independent of the size of the serving site.
E.g. looking at the setup on osm.org: out of the 3000 tiles/s served, typically only somewhere between 5 - 10 tiles/s are rendered on the fly. As most of those are updates, only about 1 tile/s is not actually available in the master cache (of about 1.5TB) and needs to be rendered on the fly. As nearly all of those are high zoom tiles, they typically can be rendered in a few hundred milliseconds, and so a single multi-core server is usually quite capable of rendering 5 - 10 tiles/s. The master cache can also vary smoothly between 0 and the full set of all tiles, depending on where you want to put the trade-off between disk space and rendering capacity. Empirically, 1 - 2 TB has generally proven to be a fairly good trade-off.
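The osm.org figures above can be turned into a quick back-of-the-envelope calculation. This is only a sketch: the rates are the ones quoted in the comment, while the 0.3 s render time is an assumed stand-in for the "few hundred milliseconds" estimate, and the function names are made up for the example.

```python
# Figures quoted above for osm.org.
TILES_SERVED_PER_S = 3000.0
TILES_RENDERED_PER_S = (5.0, 10.0)  # rendered on the fly, mostly updates
CACHE_MISSES_PER_S = 1.0            # not in the master cache at all

# Assumption: a render time of a few hundred milliseconds is taken as 0.3 s.
ASSUMED_RENDER_SECONDS = 0.3


def share(part_per_s: float, total_per_s: float) -> float:
    """Fraction of the served tile rate that a given rate represents."""
    return part_per_s / total_per_s


def busy_render_threads(tiles_per_s: float, seconds_per_tile: float) -> float:
    """Average number of render threads kept busy at a given render rate."""
    return tiles_per_s * seconds_per_tile


rendered_share = tuple(share(r, TILES_SERVED_PER_S) for r in TILES_RENDERED_PER_S)
miss_share = share(CACHE_MISSES_PER_S, TILES_SERVED_PER_S)
threads = tuple(busy_render_threads(r, ASSUMED_RENDER_SECONDS)
                for r in TILES_RENDERED_PER_S)

print(f"rendered on the fly: {rendered_share[0]:.2%} to {rendered_share[1]:.2%} of served tiles")
print(f"true cache misses:   {miss_share:.2%} of served tiles")
print(f"busy render threads: {threads[0]:.1f} to {threads[1]:.1f}")
```

Under these assumptions, rendering accounts for roughly 0.17% to 0.33% of served tiles and true cache misses for about 0.03%, which keeps only a handful of render threads busy on average. That is consistent with the claim that serving and rendering scale largely independently.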
Furthermore, mod_tile / renderd are designed to very gracefully degrade once they can't keep up with the rendering load anymore and simply some of the areas won't be quite fully up-to-date for the duration of the overload. But pretty much no one will notice if it shows "yesterday's" map instead of "today's" map.
Once you move over to trying to display many different map styles, things start to look differently, as in the current setups you need to more or less "duplicate" things per style. At that point you probably do want to move over to "vector tiles" which get styled and rendered either on the client side, or potentially on the fly server side for weak dumb clients. The software stack for that is indeed not yet well developed and tested. But for a single style, I would consider the current software stack as fairly mature, robust and scalable.
Brandon may be able to respond to the individual items mentioned, however, in terms of expected load, I believe Tomasz is looking to get some focus on this specific question (as well as scope of requested features) from a team within engineering, which will help tremendously with several of these questions.
On the Operations side, Alex has agreed to have an initial look at the infrastructure with an eye on how to move forward. At minimum, we need a base tileserver, and to replicate the currently utilized components of the toolserver install. He'll continue to look into this in-between other projects while we wait for clearer direction on overall scope and scale.
The result of the office hour on the Toolserver migration (cf. [[m:IRC office hours/Office hours 2014-01-23]]) was that in an effort to not have grander plans block the Toolserver migration, Alexandros will set up replication from OSM to the (currently unused) Labs PostgreSQL server in Ashburn. I have created bug #60461 to track this.
The idea is that, after this, the existing Toolserver OSM tools have access to the database and can be migrated, tile server(s) in similar scope to the existing Toolserver instance(s) can be set up and tested in Labs, and performance data/puppetization/etc. can be gathered to inform decisions on production tile servers that do not have deadlines to meet.
Coren estimated that the move of Labs to Ashburn that is a (soft) requirement for accessing the PostgreSQL server will be completed by mid-March. This will leave volunteers with about three months to migrate the tools in their spare time before Toolserver will be decommissioned in July.
Just a brief update on the situation of the tileserver on the labs infrastructure.
With the openstreetmap replicated database accessible from labs up and running, we have begun setting up and testing the tile rendering infrastructure in labs.
There is a demonstration map accessible at http://a.tiles.wmflabs.org/osm/slippymap.html
and the tiles can be accessed (for the default osm style) under the URL http://a.tiles.wmflabs.org/osm/0/0/0.png.
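For anyone wanting to try the tile URLs, here is a minimal sketch of how a longitude/latitude pair maps onto the z/x/y scheme used in the URL above. It uses the standard Web Mercator "slippy map" tile formula; the function names are made up for the example, and the URL template is simply the demonstration URL with the numbers replaced by placeholders.

```python
import math
from typing import Tuple

# Template derived from the demonstration tile URL quoted above.
TILE_URL_TEMPLATE = "http://a.tiles.wmflabs.org/osm/{z}/{x}/{y}.png"


def lonlat_to_tile(lon_deg: float, lat_deg: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) tile containing a point at the given zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    lat_rad = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the edge of the projection stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_url(lon_deg: float, lat_deg: float, zoom: int) -> str:
    """Build the tile URL for the tile containing the given point."""
    x, y = lonlat_to_tile(lon_deg, lat_deg, zoom)
    return TILE_URL_TEMPLATE.format(z=zoom, x=x, y=y)


# At zoom 0 the whole world is a single tile, matching the URL quoted above.
print(tile_url(0.0, 0.0, 0))
```

At zoom 0 any point maps to tile 0/0/0, and each further zoom level doubles the number of tiles along both axes.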
At the moment only the default openstreetmap style (and a demonstration multi-lingual style) is up and running on labs. The various different styles that were on the toolserver have not yet been migrated. But the technical infrastructure should hopefully already be able to support that and so once style maintainers update their styles to mapnik 2.2, we should be able to activate them in the labs infrastructure.
Tile expiry and updating is also not yet enabled, but that is probably next on the todo list.
Overall, not everything has been finalized and things are still subject to change.
Tile expiration has been enabled on the PostgreSQL server and the files are being served via rsync to the maps-tiles instances. Last I heard, it is being used by the maps instances, so that is done too. Lowering the sync interval is next on my TODO list.