
Add a geo lookup service to WDQS based on the .map pages on Commons
Open, Low, Public

Description

I think it would be a very valuable addition to have a generic polygon lookup function for the WDQS.

Use cases

  • Given a geo point, tell me which country/region/zip code/tectonic plate/voting district it belongs to
  • Historical geo lookups: use older map data
  • Non-Earth lookups: regions of the Moon, Mars, ...

Usage as a service

SELECT * WHERE {
  ?wd wdt:P625 ?location .
  SERVICE wikibase:geolookup {

    #    --- INPUT ---
    # The .map page in the Data namespace on Commons
    bd:serviceParam wikibase:data 'World Countries Outline.map' .

    # The globe being searched. Optional; defaults to Earth (wd:Q2)
    bd:serviceParam wikibase:globe wd:Q2 .

    # ?location specifies the point to look up
    bd:serviceParam wikibase:location ?location .

    #   --- OUTPUT ---
    # Binds the GeoJSON feature's wikidataId property to ?countryWd
    ?countryWd wikibase:property 'wikidataId' .

    # More than one property can be extracted from the same GeoJSON feature
    ?countryIso wikibase:property 'isoCode' .
  }
}
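Before it can search a map, the service would need to turn each ?location binding into a globe and a pair of coordinates. The following is a minimal Python sketch. It assumes the literal shapes WDQS uses for P625 values: `Point(lon lat)` for Earth, and for other bodies the globe's entity IRI in angle brackets before `Point(...)`. The names `GeoPoint` and `parse_wkt_point` are hypothetical and are not part of any existing service.

```python
import re
from typing import NamedTuple, Optional

# Assumption: Earth coordinates carry no globe prefix in WDQS WKT literals.
EARTH = "http://www.wikidata.org/entity/Q2"

_NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# Optional "<globe IRI>" prefix, then "Point(longitude latitude)".
_WKT_POINT = re.compile(
    r"^\s*(?:<(?P<globe>[^>]+)>\s*)?Point\(\s*(?P<lon>" + _NUMBER
    + r")\s+(?P<lat>" + _NUMBER + r")\s*\)\s*$"
)


class GeoPoint(NamedTuple):
    globe: str   # entity IRI of the celestial body, e.g. Earth (Q2)
    lon: float   # WKT order is longitude first
    lat: float


def parse_wkt_point(literal: str) -> Optional[GeoPoint]:
    """Return the globe and coordinates of a WKT point literal, or None if it is not one."""
    m = _WKT_POINT.match(literal)
    if m is None:
        return None
    return GeoPoint(m.group("globe") or EARTH, float(m.group("lon")), float(m.group("lat")))
```

One option would be to leave the output variables unbound for any row whose parsed globe differs from the `wikibase:globe` parameter. That way, a Moon map is never searched with Earth coordinates.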

Algorithm

  • Download the commons:data:World Countries Outline.map page.
  • Create (and cache) an R-tree from all closed polygons. It should also handle multipolygons with holes.
  • For each ?location point, find the first polygon that contains it.
  • Extract all requested properties from the matching feature.
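The containment and property-extraction steps can be sketched in Python. The sketch assumes the .map page's GeoJSON FeatureCollection has already been downloaded and parsed with `json`. GeoJSON positions are `[longitude, latitude]`. The first ring of a Polygon is its outer boundary, and any further rings are holes. For brevity, a linear scan over the features stands in for the R-tree. Points lying exactly on an edge may go either way, and antimeridian-crossing polygons are not handled. All function names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence

Position = Sequence[float]   # GeoJSON order: [longitude, latitude]
Ring = Sequence[Position]    # closed ring: first position == last position


def point_in_ring(lon: float, lat: float, ring: Ring) -> bool:
    """Even-odd ray casting: toggle on every edge crossed by a ray cast east from the point."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_polygon(lon: float, lat: float, rings: Sequence[Ring]) -> bool:
    """GeoJSON Polygon: rings[0] is the outer boundary, rings[1:] are holes."""
    if not rings or not point_in_ring(lon, lat, rings[0]):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in rings[1:])


def geometry_contains(geometry: Dict[str, Any], lon: float, lat: float) -> bool:
    kind = geometry.get("type")
    if kind == "Polygon":
        return point_in_polygon(lon, lat, geometry["coordinates"])
    if kind == "MultiPolygon":
        return any(point_in_polygon(lon, lat, poly) for poly in geometry["coordinates"])
    return False  # other geometry types are ignored in this sketch


def lookup(features: List[Dict[str, Any]], lon: float, lat: float,
           wanted: Sequence[str]) -> Optional[Dict[str, Any]]:
    """Return the requested properties of the first feature containing the point."""
    for feature in features:
        if geometry_contains(feature.get("geometry") or {}, lon, lat):
            props = feature.get("properties") or {}
            return {name: props.get(name) for name in wanted}
    return None
```

Keeping the first match in feature order mirrors the "first polygon that contains it" step. An R-tree would only narrow the list of candidate features before this exact test.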

Event Timeline

Restricted Application added a subscriber: Aklapper.
Yurik updated the task description.

This can probably be done as a service:

SERVICE wikibase:geolookup {
  bd:serviceParam wikibase:location ?location .
  bd:serviceParam wikibase:data 'World Countries Outline.map' . 
  bd:serviceParam wikibase:globe wd:Q2 .  
  ?wd wikibase:property 'wikidataId' .    
  ?iso wikibase:property 'isoCode' .      
}

It can look up the map at the call-creation stage (be careful with the timeout, though!) and then, at the call stage, just look up the bindings against the already loaded map. Check out LabelService.java as an example of a service that does something similar.
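The two-stage split suggested here can be sketched in Python: load the map once when the call is created, then resolve each binding against the loaded map. The sketch assumes a hypothetical `loader(page)` that downloads and indexes a .map page, a hypothetical per-row `lookup(index, location)`, and a cache keyed only by page title. The time budget relies on `concurrent.futures`, whose `Future.result(timeout=...)` raises `TimeoutError` if loading takes too long. The sketch is not modeled on the actual LabelService.java code.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as LoadTimeout
from typing import Any, Callable, Dict, List, Tuple


class MapCache:
    """Hypothetical helper: caches one indexed map per page title for ttl_seconds."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 3600.0):
        self._loader = loader
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, Any]] = {}
        self._pool = ThreadPoolExecutor(max_workers=2)

    def get(self, page: str, timeout: float) -> Any:
        hit = self._entries.get(page)
        if hit is not None and time.monotonic() - hit[0] < self._ttl:
            return hit[1]  # cached: no download, no re-indexing
        future = self._pool.submit(self._loader, page)
        try:
            index = future.result(timeout=timeout)
        except LoadTimeout:
            # The worker keeps running in the background, but nothing is cached.
            raise LoadTimeout(f"loading {page!r} exceeded {timeout}s") from None
        self._entries[page] = (time.monotonic(), index)
        return index


def run_service(cache: MapCache, page: str, bindings: List[Dict[str, Any]],
                lookup: Callable[[Any, Any], Any], timeout: float = 5.0) -> List[Dict[str, Any]]:
    """Creation stage: fetch the map once. Call stage: cheap in-memory lookup per row."""
    index = cache.get(page, timeout)
    results = []
    for row in bindings:
        extracted = lookup(index, row["location"])
        results.append({**row, **(extracted or {})})
    return results
```

Rows with no matching polygon are passed through without the output variables, which roughly corresponds to leaving them unbound in SPARQL.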

Also see https://wiki.blazegraph.com/wiki/index.php/QueryHints (especially runFirst and runLast) for how to control when the service runs (you probably want runLast).

The label service is also very slow: about twice as slow as querying labels the normal way. Since map data processing is probably more complex than fetching a label, I am afraid it won't work for any real queries with the current timeout…


The label service performs disk I/O: for every object, it must load some or all of the available labels and pick the best one. This service is fully in-memory and CPU-bound. You load the map once and cache it, and afterwards each point lookup is just an R-tree search in an in-memory data structure.
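The following Python sketch illustrates why per-point lookups stay cheap once the map is cached. It is a minimal static R-tree, bulk-loaded with the Sort-Tile-Recursive (STR) method over each polygon's bounding box. It is a sketch under stated assumptions, not a proposed implementation. The class and function names are hypothetical, and the tree only returns candidates whose boxes contain the point; each candidate still needs an exact point-in-polygon test. A Java service would more likely reuse an existing index such as JTS's `STRtree`.

```python
import math
from typing import Any, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


class _Node:
    __slots__ = ("box", "children", "item")

    def __init__(self, box: Box, children: Sequence["_Node"] = (), item: Any = None):
        self.box = box
        self.children = list(children)  # empty for leaves
        self.item = item                # payload (e.g. feature index) for leaves


def _union(boxes: Sequence[Box]) -> Box:
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def _str_pass(nodes: List[_Node], capacity: int) -> List[_Node]:
    """One STR pass: sort by x into vertical strips, then by y within each strip."""
    n_parents = math.ceil(len(nodes) / capacity)
    per_strip = math.ceil(math.sqrt(n_parents)) * capacity
    by_x = sorted(nodes, key=lambda nd: nd.box[0] + nd.box[2])
    parents = []
    for s in range(0, len(by_x), per_strip):
        strip = sorted(by_x[s:s + per_strip], key=lambda nd: nd.box[1] + nd.box[3])
        for c in range(0, len(strip), capacity):
            group = strip[c:c + capacity]
            parents.append(_Node(_union([g.box for g in group]), group))
    return parents


class StaticRTree:
    """Immutable R-tree, built once per cached map; lookups never touch disk."""

    def __init__(self, entries: Sequence[Tuple[Any, Box]], capacity: int = 8):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        nodes = [_Node(box, item=item) for item, box in entries]
        while len(nodes) > 1:
            nodes = _str_pass(nodes, capacity)
        self.root = nodes[0] if nodes else None

    def query_point(self, lon: float, lat: float) -> List[Any]:
        """Items whose bounding box contains the point (candidates for an exact test)."""
        if self.root is None:
            return []
        found: List[Any] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            min_lon, min_lat, max_lon, max_lat = node.box
            if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
                continue  # prune this whole subtree
            if node.children:
                stack.extend(node.children)
            else:
                found.append(node.item)
        return found
```

Bounding boxes for GeoJSON features could be taken from the min/max of their outer-ring positions. The candidates would then be filtered with the exact containment test, in feature order.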

Gehel triaged this task as High priority. Sep 15 2020, 7:52 AM
MPhamWMF lowered the priority of this task from High to Low. Oct 14 2021, 7:41 PM