Set up a Nominatim instance to avoid geocode lookups with Google and/or OSM from tools
Closed, ResolvedPublic

Description

Several tools on Toolforge currently use Google (reverse) geocoding or OSM Nominatim.

This is obviously a useful service to have, but for privacy reasons we want to avoid making external requests for it. Hosting our own Nominatim instance (preferably against the WMF OSM DB) would be a good way to avoid these requests.

https://github.com/openstreetmap/Nominatim/blob/master/docs/admin/Installation.md
Look what cool features we can build with it:

Screen Shot 2018-03-16 at 16.13.06.png (1×2 px, 2 MB)

Event Timeline

MaxSem renamed this task from "Set up a Nominatum instance to avoid geocode lookups with Google and/or OSM from tools" to "Set up a Nominatim instance to avoid geocode lookups with Google and/or OSM from tools". Mar 16 2018, 6:15 PM

Note that Nominatim doesn't work against a vanilla OSM database - you'll need another copy, this time with its own osm2pgsql style. Luckily, this copy is much smaller.

One big warning though: 99% of the load on OSM's Nominatim instance is third parties ignoring the ToS, so I would be extremely worried about having this in WMF, even as a cloud instance.

A big part of running any geocoder is watching the ToS and dealing with bulk geocoding.

Yeah, anyone got any ideas for that? Issue API tokens to OAuth accounts?

We could also just set up a proxy, do caching, rate limiting and require OAuth authentication or something...
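To make the caching plus rate limiting idea concrete, here is a minimal sketch in Python. It is not the code of any actual proxy: the class name `ThrottledCache`, the one-second default interval, and the injected `fetch` callable are all assumptions for illustration, and OAuth is left out entirely.

```python
import time
from typing import Callable, Dict, Optional


class ThrottledCache:
    """Hypothetical helper: cache answers and space out upstream calls."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        min_interval: float = 1.0,  # assumed spacing between upstream calls
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, str] = {}
        self._last_call: Optional[float] = None

    def get(self, query: str) -> str:
        # Serve repeated queries from memory without touching upstream.
        if query in self._cache:
            return self._cache[query]
        # Wait until at least min_interval has passed since the last call.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        result = self._fetch(query)
        self._cache[query] = result
        return result
```

A real proxy would also want cache expiry and a size bound; this sketch keeps every answer forever.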

Rate limiting is not a particular issue right now. The limit for nominatim.osm.org is 1 request per second across all users from a site/app, regardless of proxying. Right now we have about 2 requests per hour from wmflabs.org.

Given the low level of usage, I don't see any value in setting up our own Nominatim server. If we needed a server where we had a guarantee it would continue operating and we had access to it, we could pay one of the various third parties which sell Nominatim services and proxy it.

At 2 requests per hour, it seems like even a proxy is premature and features should first be built.

@Pnorman the proxy would mostly be for privacy reasons of course.

Mholloway subscribed.

Unlikely WMF will have resources to work on this in the near future.

Now that the Content Security Policy has gone into effect, this is becoming higher priority, as tools relying on nominatim.openstreetmap.org are no longer functional. For example:

https://tools.wmflabs.org/locator/coordinates.php
https://tools.wmflabs.org/locator-tool/

I've got a pretty basic proxy running for this now. Will turn it into its own tools project.

Double compression I believe.

zhuyifei1999@zhuyifei1999-ThinkPad-X260:~$ curl -s 'https://tools.wmflabs.org/nominatim/search' --compressed | file -
/dev/stdin: gzip compressed data, from Unix
zhuyifei1999@zhuyifei1999-ThinkPad-X260:~$ curl -s 'https://tools.wmflabs.org/nominatim/search' --compressed | zcat | head
<!DOCTYPE html>
<html lang="en">
<head>
    <title>OpenStreetMap Nominatim: Search</title>
    <meta content="IE=edge" http-equiv="x-ua-compatible" />
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <base href="https://nominatim.openstreetmap.org/" />
    <link href="nominatim.xml" rel="search" title="Nominatim Search" type="application/opensearchdescription+xml" />
    <link href="css/leaflet.css" rel="stylesheet" />
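The same symptom can be reproduced offline. The sketch below assumes the proxy gzipped a body that was already gzipped upstream; the helper name `decode_layers` is made up for this illustration. It strips gzip layers until the payload no longer starts with the gzip magic bytes.

```python
import gzip
from typing import Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_layers(body: bytes, max_layers: int = 5) -> Tuple[bytes, int]:
    """Strip gzip layers; return the payload and how many layers were removed."""
    layers = 0
    while body[:2] == GZIP_MAGIC and layers < max_layers:
        body = gzip.decompress(body)
        layers += 1
    return body, layers


html = b"<!DOCTYPE html>"
twice = gzip.compress(gzip.compress(html))  # simulate double compression

# A client that decodes Content-Encoding once is still left with gzip data.
once = gzip.decompress(twice)
print(once[:2] == GZIP_MAGIC)
print(decode_layers(twice))
```

A client that honors `Content-Encoding: gzip` removes only one layer, which is why the output above still had to be piped through `zcat`.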

@Myst

"https://nominatim.openstreetmap.org/search?format=json&q={s}" by "https://tools.wmflabs.org/nominatim/search"

That should be https://tools.wmflabs.org/nominatim/search?format=json&q={s} in that case.
I fixed the double encoding issue and I also added rate limiting.
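For anyone switching a tool over, a small sketch of building the proxy URL in Python. The function name `build_search_url` is hypothetical; the base URL and the `format` and `q` parameters are the ones quoted above.

```python
from urllib.parse import urlencode

# Proxy endpoint as given in this task.
PROXY_SEARCH = "https://tools.wmflabs.org/nominatim/search"


def build_search_url(query: str) -> str:
    """Return the proxy search URL for a free-text query, asking for JSON."""
    return f"{PROXY_SEARCH}?{urlencode({'format': 'json', 'q': query})}"


print(build_search_url("Berlin Hbf"))
```

`urlencode` takes care of escaping spaces and non-ASCII characters in the query, so the tool only has to swap the host and path.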

Proxy added to my tool, no issues, works as intended.

Thanks,

TheDJ claimed this task.

Let's mark this as resolved for now. If we need any higher level support than tools, a new ticket can be opened.

DB111 subscribed.

Did Nominatim lose its CORS header? I can no longer make requests from another Toolforge tool.

I did update it last week, I'll take a look later tonight.

Which Toolforge tool, btw?

Solved by restarting the server. Next time, please open a NEW ticket for NEW problems. You can always link back to a ticket like this simply by mentioning it in the new ticket.