Add a geo lookup service to WDQS based on the .map pages on Commons
Open, Needs Triage, Public

Description

I think it would be a very valuable addition to have a generic polygon lookup function for the WDQS.

Use cases

  • Given a geo point, tell me which country/region/zip code/tectonic plate/voting district it belongs to
  • Historic geo lookups - use older map data
  • Non-earth lookups - regions of the moon/mars/...

Usage as a service

SELECT * WHERE {
  ?wd wdt:P625 ?location .
  SERVICE wikibase:geolookup {

    # --- INPUT ---
    # The .map page on Commons, in the Data namespace
    bd:serviceParam wikibase:data 'World Countries Outline.map' .

    # The globe being searched. Optional; defaults to Earth (wd:Q2)
    bd:serviceParam wikibase:globe wd:Q2 .

    # ?location specifies the point to look up
    bd:serviceParam wikibase:location ?location .

    # --- OUTPUT ---
    # Binds the GeoJSON feature's wikidataId property to ?countryWd
    ?countryWd wikibase:property 'wikidataId' .

    # More than one property can be extracted from the same GeoJSON feature
    ?countryIso wikibase:property 'isoCode' .
  }
}

Algorithm

  • Download the commons:data:World Countries Outline.map page
  • Create (and cache) an R-tree from all closed polygons. This should also handle multipolygons with holes.
  • For each ?location point, find the first polygon that contains it.
  • Extract all requested properties from the matching polygon.
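
The lookup steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a proposed implementation: it assumes the GeoJSON FeatureCollection has already been extracted from the .map page, it uses a plain linear scan over bounding boxes where a real service would use an R-tree, and the sample feature with its property values (Q0000, XX) is made up.

```python
def ring_contains(ring, x, y):
    """Ray-casting test: is (x, y) inside the closed ring of [lon, lat] pairs?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Count edges that cross the horizontal ray going right from (x, y).
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygon_contains(rings, x, y):
    """rings[0] is the outer boundary; any further rings are holes."""
    if not ring_contains(rings[0], x, y):
        return False
    return not any(ring_contains(hole, x, y) for hole in rings[1:])


def polygons_of(geometry):
    """Normalize Polygon and MultiPolygon geometries to a list of polygons."""
    if geometry["type"] == "Polygon":
        return [geometry["coordinates"]]
    if geometry["type"] == "MultiPolygon":
        return geometry["coordinates"]
    return []  # points and lines cannot contain a location


def build_index(feature_collection):
    """Build a list of (bbox, rings, properties); a stand-in for an R-tree."""
    index = []
    for feature in feature_collection["features"]:
        for rings in polygons_of(feature["geometry"]):
            xs = [p[0] for p in rings[0]]
            ys = [p[1] for p in rings[0]]
            bbox = (min(xs), min(ys), max(xs), max(ys))
            index.append((bbox, rings, feature.get("properties", {})))
    return index


def lookup(index, x, y, wanted):
    """Return the requested properties of the first polygon containing (x, y)."""
    for (min_x, min_y, max_x, max_y), rings, props in index:
        in_bbox = min_x <= x <= max_x and min_y <= y <= max_y
        if in_bbox and polygon_contains(rings, x, y):
            return {name: props.get(name) for name in wanted}
    return None


# Hypothetical sample: a 10x10 square with a 2x2 hole in the middle.
SAMPLE = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"wikidataId": "Q0000", "isoCode": "XX"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [
                [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]],
                [[4, 4], [6, 4], [6, 6], [4, 6], [4, 4]],
            ],
        },
    }],
}

INDEX = build_index(SAMPLE)
```

With this sample, a point inside the square but outside the hole resolves to the feature's properties, while a point inside the hole or outside the bounding box yields None.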

Event Timeline

Yurik created this task. Nov 7 2017, 10:21 PM
Restricted Application added projects: Wikidata, Discovery. Nov 7 2017, 10:21 PM
Restricted Application added a subscriber: Aklapper.
Yurik updated the task description. Nov 13 2017, 7:43 AM
Yurik updated the task description. Nov 13 2017, 8:01 AM
Yurik updated the task description. Nov 13 2017, 8:27 AM
Yurik updated the task description. Nov 13 2017, 8:50 AM
Yurik updated the task description.

Probably can be done as a service:

SERVICE wikibase:geolookup {
  bd:serviceParam wikibase:location ?location .
  bd:serviceParam wikibase:data 'World Countries Outline.map' . 
  bd:serviceParam wikibase:globe wd:Q2 .  
  ?wd wikibase:property 'wikidataId' .    
  ?iso wikibase:property 'isoCode' .      
}

It can look up the map at the call-creation stage (be careful with the timeout, though!) and then, at the call stage, just look up the bindings against the already-loaded map. See LabelService.java for an example of a service that does something similar.
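
To make the two stages concrete, here is a rough sketch in Python rather than in the Java that a Blazegraph service would actually need. Everything in it is an assumption for illustration: the names create_call, GeoLookupCall and fake_loader are hypothetical, regions are simplified to bounding boxes instead of real polygons, and the time budget only detects a slow load after the fact; it does not interrupt it.

```python
import time


class GeoLookupCall:
    """Holds the already-loaded map; call() runs once per batch of bindings."""

    def __init__(self, regions, location_var, outputs):
        self.regions = regions            # list of (bbox, properties)
        self.location_var = location_var  # name of the input variable
        self.outputs = outputs            # {output variable: property name}

    def call(self, bindings):
        """Call stage: extend each binding; unmatched bindings are dropped."""
        results = []
        for binding in bindings:
            x, y = binding[self.location_var]
            for (min_x, min_y, max_x, max_y), props in self.regions:
                if min_x <= x <= max_x and min_y <= y <= max_y:
                    extra = {var: props.get(name)
                             for var, name in self.outputs.items()}
                    results.append({**binding, **extra})
                    break  # first matching region wins
        return results


def create_call(load_regions, params, budget_seconds=5.0):
    """Creation stage: load the map once and return a reusable call object."""
    started = time.monotonic()
    regions = load_regions(params["data"])
    if time.monotonic() - started > budget_seconds:
        raise TimeoutError("map load exceeded the creation-stage budget")
    return GeoLookupCall(regions, params["location"], params["outputs"])


def fake_loader(page_title):
    """Hypothetical stand-in for downloading and parsing a .map page."""
    return [((0, 0, 10, 10), {"wikidataId": "Q0000", "isoCode": "XX"})]


call = create_call(fake_loader, {
    "data": "Example.map",
    "location": "location",
    "outputs": {"countryWd": "wikidataId", "countryIso": "isoCode"},
})
rows = call.call([
    {"wd": "item1", "location": (2, 3)},
    {"wd": "item2", "location": (50, 50)},
])
```

The point of the split is that the expensive step happens once in create_call, while call only does cheap in-memory matching for each incoming binding.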

Also see https://wiki.blazegraph.com/wiki/index.php/QueryHints (esp. runFirst and runLast) for how to control when the service runs (you probably want runLast).

Base added a subscriber: Base. Nov 13 2017, 12:25 PM

The label service is also very slow, about 2 times slower than just querying labels the normal way. Since map data processing is probably more complex than fetching a label, I am afraid that it won't work for any real queries with the current timeout…

The label service is bound by disk I/O. For every object, it must load some or all of the available labels and pick the best one. This service is fully in-memory and CPU-bound: you load the map once, cache it, and afterwards each point lookup is just an R-tree search in an in-memory data structure.
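
As a rough illustration of the "load once, cache, then search in memory" idea, here is a small Python sketch. It is only a sketch under assumptions: functools.lru_cache stands in for whatever cache the service would really use, load_index is a hypothetical loader that returns hard-coded bounding boxes instead of downloading and indexing a .map page, and the ID Q0000 is made up.

```python
from functools import lru_cache


@lru_cache(maxsize=16)
def load_index(page_title):
    """Expensive step, done once per page title and then served from memory.

    A real version would download the page and build an R-tree; here the
    result is a hard-coded tuple of (bbox, id) pairs.
    """
    return (((0.0, 0.0, 10.0, 10.0), "Q0000"),)


def lookup_many(page_title, points):
    """Cheap step: every point is matched against the cached index."""
    index = load_index(page_title)
    found = []
    for x, y in points:
        match = None
        for (min_x, min_y, max_x, max_y), region_id in index:
            if min_x <= x <= max_x and min_y <= y <= max_y:
                match = region_id
                break
        found.append(match)
    return found


first = lookup_many("Example.map", [(1.0, 1.0), (20.0, 20.0)])
second = lookup_many("Example.map", [(9.0, 9.0)])
```

After both calls, load_index.cache_info() reports a single miss: the second query reused the index built by the first.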

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Mar 5 2018, 4:14 PM