
Set up a Nominatim instance to avoid geocode lookups with Google and/or OSM from tools
Closed, Resolved · Public

Description

There are several tools on Toolforge that currently use Google (reverse) geocoding or OSM Nominatim.

This is obviously a useful service to have, but for privacy reasons we want to avoid making external requests for it. Hosting our own Nominatim instance (preferably against the WMF OSM DB) would seem very useful for avoiding these requests.

https://github.com/openstreetmap/Nominatim/blob/master/docs/admin/Installation.md
Look what cool features we can build with it:

Event Timeline

TheDJ created this task. Mar 16 2018, 3:04 PM
Restricted Application added a subscriber: Aklapper. Mar 16 2018, 3:04 PM
TheDJ updated the task description. Mar 16 2018, 3:14 PM
debt added a subscriber: debt.
MaxSem renamed this task from Set up a Nominatum instance to avoid geocode lookups with Google and/or OSM from tools to Set up a Nominatim instance to avoid geocode lookups with Google and/or OSM from tools. Mar 16 2018, 6:15 PM
MaxSem added a subscriber: MaxSem. Mar 16 2018, 6:25 PM

Note that Nominatim doesn't work against a vanilla OSM database - you'll need another copy, this time with its own osm2pgsql style. Luckily, this copy is much smaller.

One big warning though: 99% of the load on OSM's Nominatim instance is third parties ignoring their ToS, so I would be extremely worried about having this in WMF, even as a cloud instance.

A big part of running any geocoder is watching the ToS and dealing with bulk geocoding.

TheDJ added a comment. Mar 19 2018, 9:53 AM

Yeah, anyone got any ideas for that? Issue API tokens to OAuth accounts?

We could also just set up a proxy, do caching, rate limiting and require OAuth authentication or something...
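To make the caching and rate limiting idea concrete, here is a minimal sketch in Python. It is not the code of any actual proxy: the class names, the `cached_lookup` helper, the one-second interval and the one-hour cache lifetime are all assumptions chosen for illustration, and authentication is left out entirely.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class MinIntervalLimiter:
    """Hypothetical limiter: at most one upstream call per `interval` seconds."""

    def __init__(self, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock
        self._last: Optional[float] = None

    def try_acquire(self) -> bool:
        now = self.clock()
        if self._last is not None and now - self._last < self.interval:
            return False  # too soon since the last upstream call
        self._last = now
        return True


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires, value = entry
        if self.clock() >= expires:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def cached_lookup(query: str, fetch: Callable[[str], str],
                  cache: TTLCache, limiter: MinIntervalLimiter) -> Tuple[int, str]:
    """Serve from the cache if possible; otherwise call `fetch` if the limiter allows.

    Returns an (HTTP-like status, body) pair; 429 means the request was rate limited.
    """
    hit = cache.get(query)
    if hit is not None:
        return 200, hit
    if not limiter.try_acquire():
        return 429, ""
    body = fetch(query)
    cache.put(query, body)
    return 200, body
```

Cache hits never touch the limiter, so repeated lookups of the same place cost nothing upstream. `fetch` stands in for whatever function actually performs the request to the geocoder.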

zhuyifei1999 updated the task description.
zhuyifei1999 added a subscriber: zhuyifei1999.

Rate limiting is not a particular issue right now. The limit for nominatim.osm.org is 1 request per second across all users from a site/app, regardless of proxying. Right now we have about 2 requests per hour from wmflabs.org.

Given the low level of usage, I don't see any value in setting up our own Nominatim server. If we needed a server where we had a guarantee it would continue operating and we had access to it, we could pay one of the various third parties which sell Nominatim services and proxy it.

At 2 requests per hour, it seems like even a proxy is premature and features should first be built.

@Pnorman the proxy would mostly be for privacy reasons of course.

Mholloway triaged this task as Lowest priority. Jul 24 2018, 4:43 PM
Mholloway added a subscriber: Mholloway.

Unlikely WMF will have resources to work on this in the near future.

TheDJ added a comment. Edited Oct 9 2018, 1:37 PM

Now that the Content Security Policy has gone into effect, this is becoming higher prio, as tools relying on nominatim.osm.org are no longer functional. For example:

https://tools.wmflabs.org/locator/coordinates.php
https://tools.wmflabs.org/locator-tool/

TheDJ added a comment. Oct 9 2018, 3:16 PM

I've got a pretty basic proxy running for this now. Will turn it into its own tools project.

jmatazzoni added a subscriber: jmatazzoni.
jmatazzoni removed a subscriber: jmatazzoni.
Myst added a subscriber: Myst. Oct 10 2018, 6:45 PM

Doesn't work for me.
I tried to replace "https://nominatim.openstreetmap.org/search?format=json&q={s}" with "https://tools.wmflabs.org/nominatim/search", but I received some weird binary data as a result.

Double compression I believe.

zhuyifei1999@zhuyifei1999-ThinkPad-X260:~$ curl -s 'https://tools.wmflabs.org/nominatim/search' --compressed | file -
/dev/stdin: gzip compressed data, from Unix
zhuyifei1999@zhuyifei1999-ThinkPad-X260:~$ curl -s 'https://tools.wmflabs.org/nominatim/search' --compressed | zcat | head
<!DOCTYPE html>
<html lang="en">
<head>
    <title>OpenStreetMap Nominatim: Search</title>
    <meta content="IE=edge" http-equiv="x-ua-compatible" />
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <base href="https://nominatim.openstreetmap.org/" />
    <link href="nominatim.xml" rel="search" title="Nominatim Search" type="application/opensearchdescription+xml" />
    <link href="css/leaflet.css" rel="stylesheet" />
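The same check can be reproduced offline. The sketch below, in Python, assumes only that each layer is ordinary gzip; the `unwrap` helper and its `max_layers` guard are hypothetical names introduced for illustration. It peels gzip layers off a payload and counts them, which is what chaining `--compressed` and `zcat` does by hand above.

```python
import gzip
from typing import Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def unwrap(payload: bytes, max_layers: int = 5) -> Tuple[int, bytes]:
    """Strip nested gzip layers; return (number of layers removed, inner bytes)."""
    layers = 0
    while layers < max_layers and payload[:2] == GZIP_MAGIC:
        payload = gzip.decompress(payload)
        layers += 1
    return layers, payload


if __name__ == "__main__":
    html = b"<!DOCTYPE html>"
    doubly_compressed = gzip.compress(gzip.compress(html))
    layers, body = unwrap(doubly_compressed)
    print(layers, body)
```

A correctly configured response should report one layer when the raw body is inspected without transparent decoding. Two layers means something in the chain compressed an already compressed body.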
TheDJ added a comment. Oct 10 2018, 9:57 PM

@Myst

"https://nominatim.openstreetmap.org/search?format=json&q={s}" by "https://tools.wmflabs.org/nominatim/search"

That should be https://tools.wmflabs.org/nominatim/search?format=json&q={s} in that case.
I fixed the double encoding issue and I also added rate limiting.
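For tool authors making the same switch, the point is that only the host and path prefix change while the query string is kept. A minimal Python sketch of that rewrite follows. The `to_proxy_url` helper is hypothetical, and the thread only confirms the `/search` endpoint, so treating other paths the same way is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

UPSTREAM_HOST = "nominatim.openstreetmap.org"
PROXY_BASE = "https://tools.wmflabs.org/nominatim"  # proxy URL given in this thread


def to_proxy_url(url: str) -> str:
    """Point an upstream Nominatim URL at the proxy, preserving path and query."""
    parts = urlsplit(url)
    if parts.netloc != UPSTREAM_HOST:
        return url  # not an upstream URL; leave it untouched
    proxy = urlsplit(PROXY_BASE)
    return urlunsplit((
        proxy.scheme,
        proxy.netloc,
        proxy.path + parts.path,  # e.g. /nominatim + /search
        parts.query,              # format=json&q=... is carried over unchanged
        parts.fragment,
    ))


if __name__ == "__main__":
    print(to_proxy_url("https://nominatim.openstreetmap.org/search?format=json&q={s}"))
```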

Myst added a comment. Oct 11 2018, 9:42 AM

Proxy added to my tool, no issues, works as intended.

Thanks,

TheDJ closed this task as Resolved. Apr 17 2019, 3:00 PM
TheDJ claimed this task.

Let's mark this as resolved for now. If we need any higher level support than tools, a new ticket can be opened.