Allow download of Wikidata query results in GPS-friendly format(s)
Closed, ResolvedPublic

Description

When a Wikidata Query Service query results in data that can be mapped, the user should be able to download that data in one or more formats suitable for import into mapping tools, such as GPS Exchange Format (GPX), Keyhole Markup Language (KML), GeoJSON, etc.

Current proposed patch supporting GPX, GeoJSON and KML:

https://gerrit.wikimedia.org/r/#/c/wikidata/query/gui/+/516662/

Test on this live platform:

https://pebbie.org/wdqs/#%23Map%20of%20hospitals%0A%23added%202017-08%0A%23defaultView%3AMap%0ASELECT%20DISTINCT%20%2a%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%2Fwdt%3AP279%2a%20wd%3AQ16917%3B%0A%20%20%20%20%20%20%20%20wdt%3AP625%20%3Fgeo%20.%0A%7D%0ALIMIT%2010

How it looks (see the last menu entries):

Screenshot KML GeoJSON GPX download on WDQS.png (556×429 px, 73 KB)

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
feature: add exporting as geoJSON | repos/wmde/wikidata-query-gui!25 | atomoil | feature/geojson | main

Event Timeline

I have implemented this for GeoJSON, GPX, and KML; here's the snippet. Four small npm libraries are used: wicket (for parsing WKT), geojson, togpx, and tokml.

For those interested: @Peb’s implementation is live at https://pebbie.org/wdqs/ (example query: http://tinyurl.com/yyyo3gn5 )
The unmerged patchset lives at https://gerrit.wikimedia.org/r/#/c/wikidata/query/gui/+/516662/

What needs to be done for this to be deployed to the live Query Service?

I have some feedback from the Wikidata team:

The person who submitted a code snippet needs to turn it into a proper patch in Gerrit so it can be reviewed, merged, and then deployed. (I don't know how much additional work is left on top of the code snippet they pasted.)

@Pigsonthewing Please see my comment just above:

The unmerged patchset lives at https://gerrit.wikimedia.org/r/#/c/wikidata/query/gui/+/516662/

This is beyond my skill set (I'm just the messenger!), but I note that that has a red label: "Cannot Merge".

@Lydia_Pintscher Please can you kindly delegate to someone who can move this forward, or at least tell us what is needed to do so?

Gehel subscribed.

This was discussed by the Search Platform team. Brief summary:

Some of the download formats are handled by Blazegraph directly, via content negotiation. This is definitely not something we would like to extend, as it would increase our reliance on Blazegraph. Having this implemented as part of WDQS-UI seems reasonably fine. Even better would be a dedicated service which could be reused on other SPARQL endpoints.

Task Breakdown Notes:

  • We are not sure if this functionality should be recreated in case the patch cannot be applied, or if this should go back to the backlog? @Lydia_Pintscher @Arian_Bozorg
  • We can try to apply the patch by rebasing it over the latest HEAD, and test it both through CI and locally
  • In case there are any comments on the patch, we assume that one of us will need to apply changes requested.

Hey everyone,

We looked into the existing patch more closely and it currently pulls in 4 new libraries that would all have to go through security review at the WMF. That's gonna take a lot of time and effort to make it happen unfortunately. One step I could see is that we reduce the patch to one format and do the dance with 1 or max 2 libraries, which would be more manageable and warranted by the importance of the task. If I understand it correctly the patch builds GeoJSON and then derives the other formats from it. Would providing only GeoJSON be a reasonable thing or is that useless for you?

Would providing only GeoJSON be a reasonable thing[...] ?

One format would certainly be better than none; but we really need to be providing alternative formats. My original request, when I opened this ticket[/*], was for "format(s) suitable for import into mapping tools; such as GPS Exchange Format (GPX), Keyhole Markup Language (KML), etc"

GeoJSON might be good for coders, but it's of little use to people like me who just want to see the output of a query in their preferred mapping tool, or on a hand-held GPS device.

/* Four years ago this coming Monday - do we get cake?

Is one of those other formats significantly more useful/better/...?

Is one of those other formats significantly more useful/better/...?

Personally, KML, but I have done no user-research with the wider user community.

[By way of illustration, the question is like saying "which is better: JPEG, TIFF or raw?" - it all depends on the use case.]

We looked into the existing patch more closely and it currently pulls in 4 new libraries that would all have to go through security review at the WMF.

And then a lot more dependants...

Just what the new packages bring in:

Screenshot 2023-02-15 at 22.47.55.png (1×1 px, 255 KB)

Before:

Screenshot 2023-02-15 at 22.51.32.png (1×1 px, 186 KB)

After:

Screenshot 2023-02-15 at 22.51.41.png (1×836 px, 241 KB)

@Reedy what tool did you use to create these diagrams, I'd be interested in it for dependency analysis in other projects as well.

@Reedy what tool did you use to create these diagrams, I'd be interested in it for dependency analysis in other projects as well.

First result on google ;)

https://npmgraph.js.org/

Ah it's that online tool... Thought you might have something offline or at an editor or repository level

@Lydia_Pintscher Is there a method for exporting in the various formats which does not require 4 new libraries that would all have to go through security review at the WMF taking a lot of time and effort to make it happen?

WDQS can already export in JSON. It is counterintuitive that exporting in geoJSON, KML &c - seemingly straightforward formats - is so hugely difficult that it should not be attempted.

As noted in that discussion, QLever can export GeoJSON, with the caveat that its replag is measured in days or weeks and the GeoJSON it outputs is slightly oddly formatted.

@Lydia_Pintscher Is there a method for exporting in the various formats which does not require 4 new libraries that would all have to go through security review at the WMF taking a lot of time and effort to make it happen?

WDQS can already export in JSON. It is counterintuitive that exporting in geoJSON, KML &c - seemingly straightforward formats - is so hugely difficult that it should not be attempted.

My understanding is that we would have to implement what the libraries are doing ourselves, which doesn't sound too great either because of the maintenance associated with it. What I can not say is how much of the stuff in the libraries is unneeded for our particular case and if there are ways to slim this down. I think this requires a developer with a bit more understanding of the formats to look into it and a sense of which format to focus on.

As noted in that discussion, QLever can export GeoJSON, with the caveat that its replag is measured in days or weeks and the GeoJSON it outputs is slightly oddly formatted.

That is great to hear. The major blocker for QLever to have real-time updates was the lack of us publishing a suitable changes stream. This is happening now and the QLever team can integrate it.

By the way, I posted a couple workarounds on the OSM forum where this task was crossposted. The tools I mentioned can output GeoJSON; if you need KML or GPX, there are a number of command line conversion utilities, or you can use a Web frontend like geojson.io or mapshaper.

workarounds

Thank you. I'm sure most commenting here would be familiar with those, or something similar. The need is to provide the facility for end users, who are not.

My understanding is that we would have to implement what the libraries are doing ourselves, which doesn't sound too great either because of the maintenance associated with it. What I can not say is how much of the stuff in the libraries is unneeded for our particular case and if there are ways to slim this down. I think this requires a developer with a bit more understanding of the formats to look into it and a sense of which format to focus on.

Of the formats mentioned so far, GeoJSON is the most interoperable with GIS tools in general, GPX is mainly used for GPS tracking (especially useful for recording timestamps, which I’m not certain WDQS would be outputting anyways), and KML is pretty much redundant to GeoJSON at this point.

To my knowledge, Wikidata currently only stores point geometries. If this task is specific to Wikidata’s needs, then one way to simplify the task is to limit it to point features and exclude the other geometry types that can be encoded as well-known text. Then it’s purely a matter of concatenation, which requires little in the way of code to review. After all, it’s already possible to generate such a GeoJSON purely in SPARQL.
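To illustrate the "purely a matter of concatenation" point, here is a minimal TypeScript sketch. It assumes the coordinate arrives as a plain WKT point literal such as Point(-2.3386 57.4433), with longitude first. The name wktPointToGeometry is hypothetical, and this is not the code from the Gerrit patch or the GitLab merge request.

```typescript
// Minimal sketch: turn a WKT point literal into a GeoJSON Point geometry.
// Assumption: only plain points are handled; anything else yields null.
interface PointGeometry {
  type: "Point";
  coordinates: [number, number]; // [longitude, latitude], as in GeoJSON
}

// Matches e.g. "Point(-2.3386 57.4433)"; WKT lists longitude before latitude.
const WKT_POINT = /^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$/i;

function wktPointToGeometry(wkt: string): PointGeometry | null {
  const match = WKT_POINT.exec(wkt);
  if (match === null) {
    return null; // not a plain point (for example a line or polygon)
  }
  return {
    type: "Point",
    coordinates: [Number(match[1]), Number(match[2])],
  };
}
```

Returning null for anything that is not a point keeps the reviewable surface small; other geometry types would need a real WKT parser.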

GPX is mainly used for GPS tracking

Isn't it useful for someone wanting to, say, put a list of statues in a town into their GPS device, in order to visit them all?

especially useful for recording timestamps, which I’m not certain WDQS would be outputting anyways

Not in the GPS tracking sense, but possibly for a series of dated events.

To my knowledge, Wikidata currently only stores point geometries.

So far, yes.

GPX is mainly used for GPS tracking

Isn't it useful for someone wanting to, say, put a list of statues in a town into their GPS device, in order to visit them all?

Yes, that’s a form of GPS tracking. This particular use case would make very basic use of the GPX format, just a sparse list of waypoints. Personally, I think GeoJSON would provide better value for the effort required to enable additional output formats, because so much more software supports it, especially on the Web. For example, you could stick the output in a map data file on Commons. But if both formats can be supported, so much the better.

Let's see if this helps; please feel free to share it widely:

https://pigsonthewing.org.uk/developer-needed-wikidata-geographical-data/

Hey! I saw @Pigsonthewing 's blog post and here I am. I'm happy to implement an export in GeoJSON and any other formats, and can do it without using any external libraries.

I have no experience with the Wikipedia dev ecosystem, so I would appreciate some guidance.

So far, I've created an account on Wikipedia to sign in here, and I've seen the code is now here: https://gitlab.wikimedia.org/repos/wmde/wikidata-query-gui/ and I've cloned the repo using git. I tried pressing "Sign in" and it wanted me to sign in with a Wikipedia Dev account, so I created one using the same username (Atom.oil.2). I can see where to make the code changes, so my next step is to make sure the code runs and then make the change and test it locally. Once I've done that, what are the next steps?

You probably need to request for your GitLab account to be activated – see the “file an account activation request” link in the Wikimedia GitLab documentation. (Also, while it’s not required, it would be great if you could link your developer account in your Phabricator settings – I hope the process should be relatively straightforward.) Then, once the code works locally, you can create a fork of that GitLab repository, push your changes there, and then open a merge request (GitLab should show you a link for that as soon as you’ve pushed to your fork). Make sure to include “Bug: T216601” at the end of the commit message, with a blank line separating it from the rest of the commit message (example), so that the merge request will be automatically linked to this task.

@Atom.oil.2 Are you actively working on this currently? If not I am available to step in and write conversions by hand (without libraries) as well to result in the same file export functionality.

@tauraamui I am planning to work on this on/after June 8th, as some other things have my attention until then, so if you have time now and want to get involved, then please go for it. :-)

Quick update: I've got a version of the code running locally that exports as GeoJSON. My next steps, in the following days, are to tidy the code up and follow the steps that @Lucas_Werkmeister_WMDE kindly outlined to get it reviewed.

I've created the fork, pushed the code to it and opened a merge request. I hope I've followed @Lucas_Werkmeister_WMDE's guidance properly, but this is my first time working on anything Wikipedia-related, so I'm open to any and all feedback/things I can improve.

Hey all - just to clarify, I've added only GeoJSON in the first PR/MR because I wanted to keep the first version simple. I'm very happy to add KML / GPX and any other formats afterwards.

Change #1160842 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[operations/deployment-charts@master] wikidata-query-gui: Bump query-gui image version

https://gerrit.wikimedia.org/r/1160842

Change #1160842 merged by jenkins-bot:

[operations/deployment-charts@master] wikidata-query-gui: Bump query-gui image version

https://gerrit.wikimedia.org/r/1160842

This is deployed and should be working now (though it might take another hour or so for the en.json l10n file to update).

That works! Thank you.

I was able to successfully open an exported GeoJSON file in OSM's "JOSM" editor; where I found the relevant and correct fields from my query included.

One minor niggle; the download link is "wdqs-app-result-geojson"; the equivalent for JSON is "JSON file". Can we have a more user-friendly link text, in keeping with the others, please?

Looking forward to KML / GPX too!

One minor niggle; the download link is "wdqs-app-result-geojson"; the equivalent for JSON is "JSON file". Can we have a more user-friendly link text, in keeping with the others, please?

Yes, that’s the outdated en.json file that I already mentioned which should resolve itself soon.

I see the correct “GeoJSON file” label now (but also the caching issue will be resolved for T397452 anyway).

Now that GeoJSON is added, my thoughts turn to KML and GPX files.

TL;DR - does anyone have an example KML and/or GPX file which shows how the data would ideally be formatted/mapped? Or any other format you think I should add?


Adding the non-geospatial metadata to GeoJSON was easy, as it contains a "properties" node to which we can add them. I think GPX and KML are slightly harder, as they lack the same level of obvious (to me) functionality.
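As a rough illustration of the "properties" approach (a sketch only, not the merged code), the following assumes rows shaped like the bindings of the standard SPARQL JSON results format. The first column typed as a WKT literal that parses as a point becomes the geometry, and every other column is copied into properties. The names SparqlBinding, SparqlRow and rowsToFeatureCollection are hypothetical.

```typescript
// Sketch: build a GeoJSON FeatureCollection from SPARQL JSON result bindings.
interface SparqlBinding {
  type: string;
  value: string;
  datatype?: string;
}
type SparqlRow = Record<string, SparqlBinding>;

interface PointFeature {
  type: "Feature";
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: Record<string, string>;
}
interface PointFeatureCollection {
  type: "FeatureCollection";
  features: PointFeature[];
}

const WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral";
const POINT_PATTERN = /^Point\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)$/i;

function rowsToFeatureCollection(rows: SparqlRow[]): PointFeatureCollection {
  const features: PointFeature[] = [];
  for (const row of rows) {
    let coordinates: [number, number] | null = null;
    const properties: Record<string, string> = {};
    for (const [column, binding] of Object.entries(row)) {
      const match =
        binding.datatype === WKT_LITERAL ? POINT_PATTERN.exec(binding.value) : null;
      if (match !== null && coordinates === null) {
        // First point column becomes the geometry (longitude, latitude).
        coordinates = [Number(match[1]), Number(match[2])];
      } else {
        // Everything else, including any further point columns, is metadata.
        properties[column] = binding.value;
      }
    }
    // Rows without a usable point are skipped rather than exported without geometry.
    if (coordinates !== null) {
      features.push({
        type: "Feature",
        geometry: { type: "Point", coordinates },
        properties,
      });
    }
  }
  return { type: "FeatureCollection", features };
}
```

Passing the result through JSON.stringify would give the downloadable file content.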

Detail of GPX Format

A basic example of a GPX file is on the Wikipedia page here: https://en.wikipedia.org/wiki/GPS_Exchange_Format

I couldn't quickly find a good/easy-to-understand webpage with an example of GPX Point nodes, but Google's AI overview produced the following:

Key Elements for Points in GPX:
<wpt>: This tag encloses the information for a single waypoint.
lat: (Required attribute of <wpt>, not a child tag) The latitude of the point in decimal degrees.
lon: (Required attribute of <wpt>, not a child tag) The longitude of the point in decimal degrees.
<name>: (Optional) A human-readable name for the waypoint.
<cmt>: (Optional) A comment or description for the waypoint.
<desc>: (Optional) A longer description of the waypoint.
<ele>: (Optional) The elevation of the point in meters.
<time>: (Optional) The time the point was recorded, using ISO 8601 format (e.g., 2024-10-27T10:00:00Z).
<sym>: (Optional) A symbol to represent the waypoint on a map.
<type>: (Optional) A category or type for the waypoint.
<extensions>: (Optional) Allows for custom data to be included.

(arguably one of us should update the Wikipedia page to add this level of detail for the next person ;-))

I don't see an ideal place to add the metadata from the query, unless we use extensions and either find a specific XML format to use or use the column names as XML tags, with the value as a child of the tag. Ideally, we'd also have a consistent name, but I am unsure if it's possible to determine that museumLabel (in the sample query I've been using) should appear as name in the GPX file.
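One possible shape for this, as a hedged sketch rather than a proposal for the final format: guess the name from the first column ending in "Label" (an assumption that will not hold for every query) and dump the remaining columns into desc as "column: value" lines. The names GpxWaypoint, pickNameEntry and toGpx, and the creator string, are hypothetical.

```typescript
// Sketch: serialise query rows as GPX 1.1 waypoints without external libraries.
interface GpxWaypoint {
  lat: number;
  lon: number;
  columns: Record<string, string>; // non-coordinate query columns by variable name
}

function xmlEscape(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Assumption: the first column whose name ends in "Label" is the display name.
function pickNameEntry(columns: Record<string, string>): [string, string] | undefined {
  return Object.entries(columns).find(([key]) => key.endsWith("Label"));
}

function toGpx(waypoints: GpxWaypoint[]): string {
  const body = waypoints.map((wpt) => {
    const nameEntry = pickNameEntry(wpt.columns);
    // lat and lon are attributes of <wpt>, not child elements.
    const lines = [`  <wpt lat="${wpt.lat}" lon="${wpt.lon}">`];
    if (nameEntry !== undefined) {
      lines.push(`    <name>${xmlEscape(nameEntry[1])}</name>`);
    }
    const rest = Object.entries(wpt.columns)
      .filter(([key]) => nameEntry === undefined || key !== nameEntry[0])
      .map(([key, value]) => `${key}: ${value}`)
      .join("\n");
    if (rest !== "") {
      lines.push(`    <desc>${xmlEscape(rest)}</desc>`);
    }
    lines.push("  </wpt>");
    return lines.join("\n");
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<gpx version="1.1" creator="wdqs-export-sketch" xmlns="http://www.topografix.com/GPX/1/1">',
    ...body,
    "</gpx>",
  ].join("\n");
}
```

If no column ends in "Label", the waypoint is written without a name and all columns land in desc.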

Detail of KML Format

There's an example of a Placemark (Point) KML file here - https://developers.google.com/kml/documentation/kml_tut#placemarks

Similarly to GPX, there isn't an obvious place to put arbitrary properties. The only thing that springs to mind is to use the "Descriptive HTML in Placemarks" example and add the column name and values directly in the description, eg:

museum: http://www.wikidata.org/entity/Q665556
museumLabel: St Vigeans Sculptured Stones Museum
mds: Q665556
osmWay: 929249849
sitelink: https://en.wikipedia.org/wiki/St_Vigeans_Sculptured_Stones_Museum

That doesn't seem very satisfying to me, but as I am not a regular user of KML I can't be sure.

We also have the same challenge with GPX of knowing which column to use as the name.
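A possible alternative to packing everything into the description: KML 2.2 defines an ExtendedData element holding untyped name/value pairs (Data elements with a value child), which may be a closer match to GeoJSON's "properties". The sketch below assumes the caller has already chosen the name; KmlPlace and toKml are hypothetical names, and how well individual mapping tools display ExtendedData would need checking.

```typescript
// Sketch: serialise query rows as KML Placemarks, with the extra columns
// stored as <ExtendedData> name/value pairs.
interface KmlPlace {
  lat: number;
  lon: number;
  name: string;
  columns: Record<string, string>;
}

function escapeXmlText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toKml(places: KmlPlace[]): string {
  const placemarks = places.map((place) => {
    const data = Object.entries(place.columns).map(
      ([key, value]) =>
        `        <Data name="${escapeXmlText(key)}"><value>${escapeXmlText(value)}</value></Data>`
    );
    return [
      "    <Placemark>",
      `      <name>${escapeXmlText(place.name)}</name>`,
      "      <ExtendedData>",
      ...data,
      "      </ExtendedData>",
      "      <Point>",
      // KML coordinates are written as longitude,latitude.
      `        <coordinates>${place.lon},${place.lat}</coordinates>`,
      "      </Point>",
      "    </Placemark>",
    ].join("\n");
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<kml xmlns="http://www.opengis.net/kml/2.2">',
    "  <Document>",
    ...placemarks,
    "  </Document>",
    "</kml>",
  ].join("\n");
}
```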

Adding the non-geospatial metadata to GeoJSON was easy, as it contains a "properties" node to which we can add them. I think GPX and KML are slightly harder, as they lack the same level of obvious (to me) functionality.

Key Elements for Points in GPX:
<cmt>: (Optional) A comment or description for the waypoint. 
<desc>: (Optional) A longer description of the waypoint.

[Caveat: I'm no expert!]

It may be that you're going to have to shoehorn data into one or other of the above (although I would be interested to see what can be done with extensions).

One of the issues is that the columns, and their names, in query results are arbitrary.

Part of the GeoJSON for one of my queries is:

"mds":"Q2969293","osmWay":"68536540","sitelink":"https://en.wikipedia.org/wiki/Fyvie_Castle","artUK":"national-trust-for-scotland-fyvie-castle-7253"

so you may need to dump that text, or something like it, as is.

And whatever would be the equivalent for KML.

Looking at https://www.topografix.com/gpx_mailing_list.asp#000101c240b1$c4cb6bd0$0200a8c0@fogey.com

we could define a namespace and provide an XSD (XML schema) for it.

The hypothetical example given is:

xmlns:gpxtracking="http://www.topografix.com/GPX/Tracking/1/0"

<gpxtracking:info>
 <gpxtracking:time>2002-07-25T03:27:54Z</gpxtracking:time>
 <gpxtracking:lat>42.5109</gpxtracking:lat>
 <gpxtracking:lon>42.5109</gpxtracking:lon>
 <gpxtracking:speed>75</gpxtracking:speed>
</gpxtracking:info>

So maybe we would have something like:

<wdgpx:statement>P1602;national-trust-for-scotland-fyvie-castle-7253</wdgpx:statement>
<wdgpx:statement>P10689;68536540</wdgpx:statement>

for the simple statements and:

<wdgpx:concat>sitelink;https://en.wikipedia.org/wiki/Fyvie_Castle</wdgpx:concat>

for concatenated values that don't exist outside the query.
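To make the idea concrete, here is a small sketch of how such an extensions block might be generated. The wdgpx prefix comes from the example above; the namespace URI is a placeholder (no such namespace or schema exists yet), and the names WdgpxValue, WDGPX_NAMESPACE, wdgpxNamespaceAttribute and buildExtensions are hypothetical.

```typescript
// Sketch: emit a GPX <extensions> block using a hypothetical "wdgpx" namespace.
interface WdgpxValue {
  kind: "statement" | "concat"; // simple statement vs. value that only exists in the query
  key: string; // property ID for statements, column name for concatenated values
  value: string;
}

// Placeholder URI: a real namespace (and schema) would have to be defined first.
const WDGPX_NAMESPACE = "https://example.org/wdgpx/1/0";

function escapeText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Attribute to place on the root <gpx> element so the prefix is declared.
function wdgpxNamespaceAttribute(): string {
  return `xmlns:wdgpx="${WDGPX_NAMESPACE}"`;
}

function buildExtensions(values: WdgpxValue[]): string {
  const children = values.map(
    (v) => `  <wdgpx:${v.kind}>${escapeText(`${v.key};${v.value}`)}</wdgpx:${v.kind}>`
  );
  return ["<extensions>", ...children, "</extensions>"].join("\n");
}
```

One caveat with the "key;value" encoding is that values containing a semicolon become ambiguous; separate child elements or attributes for key and value might be more robust.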