
Investigate adding location information to structured content
Open, Needs Triage, Public

Description

User Story: “As a customer, I want to have the place/location of an entity if it is a place or location,
so that I can use the location information with 3rd party location services.”

Acceptance criteria
Investigate Wikidata location information and find a consistent way to extract the place name for relevant entities.

ToDo

  • Research Wikidata location statements and PID values
  • Create sample JSON for the updated place/location so the PM can get feedback on the use case
  • Demo the sample JSON to the team

This is not an implementation ticket; only sample JSON will be delivered.

Event Timeline

Finding the place name in Wikidata statements is not trivial: location names are spread across many different property IDs (PIDs).

The only consistent location property is P625 (coordinate location), which holds the latitude and longitude. One option is to take the coordinates and do a reverse lookup for the nearest city, using either SPARQL or a 3rd party lat/long place name lookup.

PIDs that have a place name:
"headquarters location" P159
"located in the administrative territorial entity" P131
"location" P276
"location of formation" P740 (bands, groups, political parties that were founded in a location)
"filming location" P915
"location of discovery" P189
"place of publication" P291
"location map" P1943
...more PIDs that I've not found yet.
Many entities have multiple locations. For example, "Microsoft" has its headquarters (P159) in "Redmond" (with P625), was formed (P740) in "Albuquerque", and has a "work location" (P937) in "Silicon Valley".

Possible SPARQL to find nearest city

SELECT DISTINCT ?place ?placeLabel ?location ?distance
WHERE
{
  # Restrict to cities (Q515), including subclasses
  ?place wdt:P31/wdt:P279* wd:Q515 .
  # Search by nearest: places within the radius (km) of the centre point,
  # which is given as "Point(longitude latitude)"
  SERVICE wikibase:around {
    ?place wdt:P625 ?location .
    bd:serviceParam wikibase:center "Point(8.4024875340491 48.9993762209831)"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "10" .
    bd:serviceParam wikibase:distance ?distance .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?distance
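
To try the nearest-city lookup for arbitrary coordinates, the query above could be wrapped in a small helper. This is only a sketch: the function names, the radius default, the User-Agent string and the LIMIT 1 are my assumptions, not an agreed design. It targets the public Wikidata Query Service endpoint and parses the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same shape as the query above; braces are doubled for str.format().
QUERY_TEMPLATE = """
SELECT ?place ?placeLabel ?distance WHERE {{
  SERVICE wikibase:around {{
    ?place wdt:P625 ?location .
    bd:serviceParam wikibase:center "Point({lon} {lat})"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "{radius_km}" .
    bd:serviceParam wikibase:distance ?distance .
  }}
  ?place wdt:P31/wdt:P279* wd:Q515 .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ASC(?distance)
LIMIT 1
"""


def build_nearest_city_query(lat, lon, radius_km=10):
    """Return the SPARQL text; WKT points are longitude first, then latitude."""
    return QUERY_TEMPLATE.format(lat=lat, lon=lon, radius_km=radius_km)


def parse_nearest(bindings):
    """Turn SPARQL JSON result bindings into (label, distance_km), or None."""
    if not bindings:
        return None
    first = bindings[0]
    return first["placeLabel"]["value"], float(first["distance"]["value"])


def nearest_city(lat, lon, radius_km=10):
    """Run the query against WDQS (network call) and return the closest city."""
    params = urllib.parse.urlencode(
        {"query": build_nearest_city_query(lat, lon, radius_km), "format": "json"}
    )
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        # Hypothetical User-Agent; replace with a real contact string.
        headers={"User-Agent": "location-investigation-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        bindings = json.load(response)["results"]["bindings"]
    return parse_nearest(bindings)
```

Calling nearest_city(48.9993762209831, 8.4024875340491) would run the same lookup as the hand-written query. It returns None when no city is found within the radius, so callers need a fallback for remote entities.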

@SDelbecque-WMF I'll close this ticket, so you can prioritise this "location" request within your OKRs for structured-contents. If you want it done this quarter, add a new ticket with your formatting and Phab tags.

Preliminary work was done on this task. We need a wider investigation ticket to define the sub-domains we want to build from Wikidata and convert into knowledge graphs.

Wikidata has property P625, which is a "coordinate location" and holds the geolocation of the entity. We could add this to our API, so clients can see it's a location and get the GPS coordinates if they need them. If Wikidata also has the place name, we could add that too.

Example:

{
    "name": "S.L. Benfica",
    "identifier": 574076,
    "description": "Portuguese association football club",
    "main_entity": {
        "identifier": "Q131499",
        "url": "https://www.wikidata.org/entity/Q131499",
        "location": "-9.184711 38.752667",
        "place": "Lisbon",
        "type": [
            {
                "property": "instance_of",
                "label": "association football club",
                "url": "http://www.wikidata.org/entity/Q476028"
            },
            {
                "property": "instance_of",
                "label": "professional sports team",
                "url": "http://www.wikidata.org/entity/Q20639856"
            }
        ]
    },...
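
As a rough illustration of how the "location" field could be filled, the sketch below reads the first P625 claim from a Wikidata entity JSON document and formats it as "longitude latitude", matching the order used in the sample. The function name is hypothetical, and taking the first claim with a value is an assumption; ranks and multiple coordinates are ignored here.

```python
def coordinate_location(entity):
    """Return 'longitude latitude' from the first P625 claim, or None.

    `entity` is assumed to be a Wikidata entity JSON document (a dict with
    a "claims" key), as returned by the Wikidata entity data endpoints.
    """
    for claim in entity.get("claims", {}).get("P625", []):
        snak = claim.get("mainsnak", {})
        # Skip "novalue" / "somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") != "value":
            continue
        value = snak["datavalue"]["value"]
        return f'{value["longitude"]} {value["latitude"]}'
    return None


# Trimmed-down entity using the coordinates from the sample JSON.
sample_entity = {
    "claims": {
        "P625": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "datavalue": {
                        "type": "globecoordinate",
                        "value": {"latitude": 38.752667, "longitude": -9.184711},
                    },
                }
            }
        ]
    }
}

location = coordinate_location(sample_entity)
```

Entities without P625 simply get None, so the "location" field could be omitted for them.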

Other entities have the place name filled in under one of these PIDs:
"headquarters location" P159
"located in the administrative territorial entity" P131
"location" P276
"location of formation" P740 (bands, groups, political parties that were founded in a location)
"filming location" P915
"location of discovery" P189
"place of publication" P291
"location map" P1943
...more PIDs that I've not found yet.

Maybe we can run a SPARQL query that gets all the entities that have P625, then look for the most frequent PIDs that also appear in those entities. We can then check which of those PIDs hold place names and use that list as our place name lookup.
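
A query over every entity with P625 may be too heavy for the public query service, so one alternative is to count co-occurring PIDs offline over a sample of entity JSON documents. The sketch below assumes the entities are already loaded as dicts with a "claims" key; the function name and the sample data are made up for illustration.

```python
from collections import Counter


def cooccurring_pids(entities, anchor_pid="P625"):
    """Count how often each other PID appears on entities that have anchor_pid.

    Returns a list of (pid, count) pairs, most frequent first.
    """
    counts = Counter()
    for entity in entities:
        claims = entity.get("claims", {})
        if anchor_pid not in claims:
            continue  # only entities with a coordinate location are relevant
        counts.update(pid for pid in claims if pid != anchor_pid)
    return counts.most_common()


# Made-up sample: only the claim keys matter for the count.
sample_entities = [
    {"claims": {"P625": [], "P131": [], "P159": []}},
    {"claims": {"P625": [], "P131": []}},
    {"claims": {"P740": []}},  # no P625, so it is skipped
]

ranking = cooccurring_pids(sample_entities)
```

The resulting ranking would still need a manual pass to keep only the PIDs whose values are places.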

Note that many entities have multiple locations. For example, "Microsoft" has its headquarters (P159) in "Redmond" (with P625), was formed (P740) in "Albuquerque", and has a "work location" (P937) in "Silicon Valley".
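
One way to handle entities with several locations is a priority-ordered PID list, where the first PID that has a value wins. The sketch below is only an assumption about how that could look: the priority order, the function names and the placeholder QIDs in the sample are all hypothetical, and resolving the returned QID to a label such as "Redmond" is left out.

```python
# Assumed priority order; the real list would come out of the PID research.
PLACE_PIDS = ["P159", "P131", "P276", "P740", "P937"]


def place_entity_ids(entity, pids=PLACE_PIDS):
    """Return {pid: [QIDs]} for every listed PID that has item values."""
    found = {}
    for pid in pids:
        qids = []
        for claim in entity.get("claims", {}).get(pid, []):
            snak = claim.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue
            datavalue = snak["datavalue"]
            # Place statements point at other items, not at plain strings.
            if datavalue["type"] == "wikibase-entityid":
                qids.append(datavalue["value"]["id"])
        if qids:
            found[pid] = qids
    return found


def primary_place(entity, pids=PLACE_PIDS):
    """Return (pid, QID) for the highest-priority place statement, or None."""
    found = place_entity_ids(entity, pids)
    for pid in pids:
        if pid in found:
            return pid, found[pid][0]
    return None


def _item_claim(qid):
    """Build a minimal item-valued claim for the sample below."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "datavalue": {"type": "wikibase-entityid", "value": {"id": qid}},
        }
    }


# Shaped like the Microsoft example; the QIDs are placeholders, not real IDs.
sample_company = {
    "claims": {
        "P740": [_item_claim("Q9002")],  # location of formation
        "P159": [_item_claim("Q9001")],  # headquarters location
        "P937": [_item_claim("Q9003")],  # work location
    }
}

all_places = place_entity_ids(sample_company)
best = primary_place(sample_company)
```

Returning all matches from place_entity_ids keeps the option open to expose every location in the API, while primary_place gives a single "place" value like the one in the sample JSON.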