
RFC: Define wiki markup for external geo data like OSM shapes
Closed, Resolved (Public)

Description

The <maplink> and <mapframe> wiki markup needs a way to specify a list of Wikidata IDs to get shapes stored in the OSM database, or other external data such as geo pages or Wikidata queries. This way, an editor can add a shaded area (e.g. highlight a city/state/country) by specifying its Wikidata ID, instead of copy/pasting a whole geojson blob with a complex geometry into wikitext.

There are two ways we can do it: via attributes (params), or by modifying the content (geojson) format, which itself has two variants:

As attribute
<mapframe ... show="wikidata:Q12345,wikidata:Q67890">
  • PROS: simple, integrates well with other group/show concepts
  • CONS: hard to specify styling parameters
As geojson
{
  "type": "WikidataShape",
  "properties": { ... },  // same as simple styling
  "ids": [ 12345, 67890 ] // Qnumbers
}
  • PROS: Allows styling, easy to enforce syntax
  • CONS: non-standard geojson, not expandable to other data sources
As geojson URLs

This approach is identical to the Graph extension's [[ Extension:Graph/Guide#External_Data | external data ]]. It will allow us to specify any external data source, such as geojson pages, the Wikidata Query Service, etc.

{
  "type": "ExternalData",
  "properties": { ... },  // same as simple styling
  "href": "osmshapes:///?ids=Q12345,Q67890"
}
  • PROS: Allows styling, easy to enforce syntax, multiple data sources
  • CONS: non-standard geojson
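
To make the href variant concrete, here is a minimal sketch of how client code might pull the Wikidata IDs out of an osmshapes href. The helper name parseOsmShapesHref and the strict Q-number check are assumptions for illustration only; the RFC does not define them. It relies on the standard URL class, which accepts an empty host for non-special schemes.

```javascript
// Hypothetical helper (not part of Kartographer): extract the Wikidata IDs
// from an href such as "osmshapes:///?ids=Q12345,Q67890".
function parseOsmShapesHref(href) {
  const url = new URL(href);
  if (url.protocol !== 'osmshapes:') {
    throw new Error('Unsupported scheme: ' + url.protocol);
  }
  // "ids" is a comma-separated list of Q numbers.
  const ids = (url.searchParams.get('ids') || '')
    .split(',')
    .map((id) => id.trim())
    .filter((id) => id !== '');
  for (const id of ids) {
    // Assumption: only plain Q numbers are accepted.
    if (!/^Q[1-9]\d*$/.test(id)) {
      throw new Error('Not a Wikidata ID: ' + id);
    }
  }
  return ids;
}

// The ExternalData element from the proposal above.
const externalElement = {
  type: 'ExternalData',
  href: 'osmshapes:///?ids=Q12345,Q67890'
};
const shapeIds = parseOsmShapesHref(externalElement.href);
```

Here shapeIds holds the two Q numbers in the order given. Treating the href as a real URL keeps the door open for other schemes, which is the main argument for this variant.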
Implementation thoughts

The service returns data as highly optimized TopoJSON, with each object identified by its Q number. Once decoded into GeoJSON, each Q ID is given as a separate feature object:

{
    "type": "FeatureCollection",
    "features": [{
            "type": "Feature",
            "id": "Q12130",
            "properties": {},
            "geometry": { "type": "MultiPolygon",  ... }
    }, { ... }, { ... }]
}
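
As a rough sketch of how the decoded result could be consumed, the snippet below indexes the features by Q ID and copies the styling from an ExternalData element onto each feature. The function names, the sample geometry, and the rule that a feature's own properties win over the shared style are all assumptions, not something the service or Kartographer defines.

```javascript
// Hypothetical helper: look features up by their Q ID.
function indexFeaturesById(featureCollection) {
  const byId = new Map();
  for (const feature of featureCollection.features) {
    byId.set(feature.id, feature);
  }
  return byId;
}

// Hypothetical helper: apply the "properties" (simple styling) of an
// ExternalData element to every fetched feature, without mutating the input.
// Assumption: properties already present on a feature take precedence.
function styleFeatures(featureCollection, style) {
  return {
    type: 'FeatureCollection',
    features: featureCollection.features.map((feature) =>
      Object.assign({}, feature, {
        properties: Object.assign({}, style, feature.properties)
      })
    )
  };
}

// Made-up sample data in the shape shown above.
const triangle = [[[[0, 0], [1, 0], [1, 1], [0, 0]]]];
const decoded = {
  type: 'FeatureCollection',
  features: [
    { type: 'Feature', id: 'Q12130', properties: {},
      geometry: { type: 'MultiPolygon', coordinates: triangle } },
    { type: 'Feature', id: 'Q67890', properties: { title: 'kept' },
      geometry: { type: 'MultiPolygon', coordinates: triangle } }
  ]
};

const featuresById = indexFeaturesById(decoded);
const styled = styleFeatures(decoded, { fill: '#ff0000' });
```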

Parsing: Ideally we should use the TopoJSON library, but if we want an "all you can eat" parser that handles many formats, we could use leaflet-omnivore.
Optimization: To get the best mileage out of TopoJSON, all maps on the page should combine their requests into one instead of each retrieving its own. It would be wasteful for multiple maps to each send a nearly identical request to the server.
Async: Each external data item should exist as its own key within wgKartographerLiveData, so that we can pre-populate it in some cases. Also, the current getMapGroupData() needs to be updated: it should check recursively for any "type": "ExternalData" elements and, if there are any, $.when(...) for all missing values. Lastly, I think getMapGroupData() should replace every item it requests with a promise, so that if more than one map uses the same data, the data is requested only once. (See T138739)
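
The async idea can be sketched roughly as follows. This is not the real getMapGroupData(): the Map named liveData merely stands in for wgKartographerLiveData, the function names and the injected fetcher are hypothetical, and native Promise.all is used in place of $.when.

```javascript
// Stand-in for wgKartographerLiveData: href -> data, or a promise of data.
const liveData = new Map();

// Recursively collect the href of every "type": "ExternalData" element.
function collectExternalHrefs(node, found = new Set()) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectExternalHrefs(child, found));
  } else if (node && typeof node === 'object') {
    if (node.type === 'ExternalData' && typeof node.href === 'string') {
      found.add(node.href);
    }
    Object.keys(node).forEach((key) => collectExternalHrefs(node[key], found));
  }
  return found;
}

// Store a promise under the key on first use, so later callers share it.
// Pre-populated entries are returned as they are.
function requestOnce(href, fetcher) {
  if (!liveData.has(href)) {
    liveData.set(href, Promise.resolve(fetcher(href)));
  }
  return liveData.get(href);
}

// Resolve once every external item referenced by the map data is available.
function loadExternalData(mapData, fetcher) {
  const hrefs = Array.from(collectExternalHrefs(mapData));
  return Promise.all(hrefs.map((href) => requestOnce(href, fetcher)));
}

// Two maps referencing the same data; the fake fetcher counts its calls.
let requestCount = 0;
const fakeFetcher = (href) => {
  requestCount += 1;
  return Promise.resolve({ type: 'FeatureCollection', features: [] });
};
const mapA = [{ type: 'ExternalData', href: 'osmshapes:///?ids=Q12345' }];
const mapB = [
  { type: 'Feature', properties: {},
    geometry: { type: 'Point', coordinates: [0, 0] } },
  { type: 'ExternalData', href: 'osmshapes:///?ids=Q12345' }
];
const pendingA = loadExternalData(mapA, fakeFetcher);
const pendingB = loadExternalData(mapB, fakeFetcher);
```

Because the promise is stored before the request completes, the second map reuses the pending request instead of issuing its own.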

Event Timeline

Yurik renamed this task from Define wiki markup syntax for getting OSM shapes to RFC: Define wiki markup syntax for getting OSM shapes. Jun 20 2016, 2:24 AM
Yurik added a project: Proposal.

To help the discussion about the data supply format, maybe we should look at how this data gets displayed, and at the relevance of mixing it all together vs. having several (toggleable) map layers. Is it more relevant to display these external features on the same layer as the inline geojson features, or on individual overlays?

Currently, all groups specified in group="group1,group2,group3" are displayed as separate data layers (= separate overlays). On Wikivoyage this allows the user to toggle each data group (including inline geojson) individually.

So unless we change the current JS code,

  • Option 1-a (As attribute) will create one overlay per group, i.e. one overlay per external geoshape and a separate overlay for the inline geojson.
  • Option 2-a (As geojson) will create one overlay for the entire geojson blob, displaying external geoshape on the same data overlay as the inline geojson.

Or, if we make the code manipulate the data:

  • Option 1-b (As attribute): the code merges external wikidata:Qid + inline features in a single geojson object, so that the JS code displays external geoshapes and inline geojson on the same layer.
  • Option 2-b (As geojson): the code pops the external wikidata:Qid geojson blocks out of the inline geojson, so that the JS code displays external geoshapes and inline geojson on separate layers.
  • Option 3 (Universal): the code pops the external wikidata:Qid geojson blocks out of the inline geojson, and merges these wikidata:Qids in a single geojson blob, so that the JS code displays all external geoshapes on the same layer, and inline geojson features on a separate layer.
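
For illustration, Option 3 could look roughly like the sketch below. It assumes the inline geojson is a top-level array (or a single object) and that external blocks are marked with "type": "ExternalData"; the function name and the two-layer result shape are hypothetical.

```javascript
// Hypothetical sketch of Option 3: pop external blocks out of the inline
// geojson and group them, so they can be drawn on their own layer.
// Only top-level items are inspected.
function splitIntoLayers(inlineGeojson) {
  const items = Array.isArray(inlineGeojson) ? inlineGeojson : [inlineGeojson];
  const layers = { external: [], inline: [] };
  for (const item of items) {
    if (item && item.type === 'ExternalData') {
      layers.external.push(item);
    } else {
      layers.inline.push(item);
    }
  }
  return layers;
}

const mixedContent = [
  { type: 'ExternalData', href: 'osmshapes:///?ids=Q12345' },
  { type: 'Feature', properties: {},
    geometry: { type: 'Point', coordinates: [0, 0] } },
  { type: 'ExternalData', href: 'osmshapes:///?ids=Q67890' }
];
const splitLayers = splitIntoLayers(mixedContent);
```

Options 1-b and 2-b differ only in which of the two resulting arrays get merged or kept apart before display.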

In my opinion,

  • the way data is specified in wikitext should match how it is displayed on the map, and vice versa (currently the paradigm is: inline geojson in one layer, plus groups in individual layers). Regardless of the option we choose, I'd like to maintain a human-friendly paradigm.
  • external geoshapes and inline geojson should be displayed on separate layers, thus I am concerned with extending the inline geojson.

@JGirault I agree that wikimarkup should match visualization, but only to some degree. The data storage mechanism (OSM DB vs inline geojson) should not affect what layers it is shown as. The "implementation detail" of how it is stored should be hidden.

So the editor should be free to mix and match different data sources. At this point I see 3 data sources:

  • OSM DB - identified by Wikidata ID
  • GeoJSON pages on Commons (onwiki data storage, similar to my tabular data implementation) - identified by the page name, e.g. Data:Don_Quixote_travels.geojson
  • inline data - GeoJSON right in the article, not identifiable, but, only in Wikivoyage, cross-referable with the "group" attribute.

But if we ignore the relation between {data source/wikimarkup/layer visualization} for now and focus only on {data source/wikimarkup}, how would we later handle requests to display data in different layers?

As an editor, I think that right after identifying my data sources I would want to define my data layers: this goes on this layer, and that goes on that layer.

If we treat the data (data source) independently from the view (map layers), then I guess we are trying to be "MVC", where M=Mapdata, V=Mapframe, C=kartographer.js. But this fails because we have data (geojson)[M] living in the <mapframe>[V].

Forgive my naivety, but a better MVC way could be something like:

<mapdata group="cali_loc" name="Locations in California">[ { ... }, { ... }, { ... } ]</mapdata>
<mapdata group="oregon_loc" name="Locations in Oregon">[ { ... } ]</mapdata>
<mapdata group="locations" name="Locations" show="cali_loc, oregon_loc"/>

<mapdata group="california" name="California" wikidata="Q12345"/>
<mapdata group="oregon" name="Oregon" wikidata="Q67890"/>
<mapdata group="states" name="States" show="california,oregon"/>

== Locations ==
<mapframe show="states,locations"/>

See only the locations in <maplink text="California" show="california,cali_loc"/> or <maplink text="Oregon" show="oregon,oregon_loc"/>.

^ in this case, we could even use the <mapframe> content block for styling (in a kind-of CSS way) instead of data source.

I have been thinking that putting the inline geojson block in a single layer is a bad assumption. How can users display data in different layers?

Yurik renamed this task from RFC: Define wiki markup syntax for getting OSM shapes to RFC: Define wiki markup for external geo data like OSM shapes. Jun 26 2016, 3:54 PM
Yurik updated the task description.

@JGirault, yes, in a way we are defining something a bit like MVC, but with a number of constraints:

  • We cannot have it in multiple tags. That is what the VE team opposed the most: they want each tag to be independent, without any cross-dependencies. We actually had <mapdata> in the first draft, but removed it.
  • The better split between view and data would be to introduce CSS-like classes. This way instead of specifying the style of the object, we simply say "class=bar" in the properties, and it gets drawn properly.
  • Let's not confuse group and layer. A group is an internal structure of the data, and a layer is how data gets shown in the layer button. Maybe we should add an extra property called "layer", which would allow any kind of grouping for the layer button. But this is purely for Wikivoyage. I am actually not even sure we need all these different layers for the user. Or at least we may want a top-level "on-off" button and a set of sublayers, so that the user can turn them on/off all at once.

My first attempt. Note that format=uri will not work because PHP does not correctly parse URLs like file:///... (missing host).

"oneOf": [
    { "$ref": "#/definitions/externalData" },
    ...
],

"externalData": {
    "title": "ExternalData",
    "description": "An external data reference",
    "required": [ "href" ],
    "properties": {
        "type": { "enum": [ "ExternalData" ] },
        "href": {
            "type": "string",
            "format": "uri"
        },
        "properties": { "$ref": "#/definitions/simplestyle" }
    }
},
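
For comparison, a hand-rolled check that mirrors this schema fragment might look like the sketch below. It is only an illustration: the function name is hypothetical, requiring "type" as a discriminator is an assumption (the schema lists only href as required), the loose scheme test replaces format=uri, and the nested simplestyle properties are not validated.

```javascript
// Hypothetical validator mirroring the externalData schema fragment.
function isExternalData(value) {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return false;
  }
  // Assumption: "type" is used to pick this branch of the oneOf.
  if (value.type !== 'ExternalData') {
    return false;
  }
  // "href" is the only required property in the schema.
  if (typeof value.href !== 'string') {
    return false;
  }
  // Loose stand-in for format=uri: only require a scheme followed by ":",
  // so host-less hrefs like "osmshapes:///?ids=Q1" are accepted.
  return /^[a-z][a-z0-9+.-]*:/i.test(value.href);
}

const acceptsHostless = isExternalData({
  type: 'ExternalData',
  href: 'osmshapes:///?ids=Q12345,Q67890'
});
const rejectsMissingHref = !isExternalData({ type: 'ExternalData' });
```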

Done, implementing...