[RFC] RDF mapping for geo-shape / URIs for commons data pages
Closed, Duplicate · Public

Description

We have implemented geo-shapes as a new property type that stores a reference to a page on commons in the data namespace.
For example:
https://commons.wikimedia.org/wiki/Data:Ecos.fws.gov/Endangered_habitat_58938/Phyllostegia_mollis.map

We now need a way to model it in RDF to implement the RDF export.
What would be a good solution and why?

See also the relevant design document.

Event Timeline

Restricted Application added a subscriber: Aklapper.

As it is now, I think the URL would be the only option for RDF. I don't see how we could plausibly put anything else in RDF.

Smalyshev triaged this task as Medium priority. Mar 6 2017, 8:53 PM
daniel subscribed.

@Smalyshev which URL? The URL of the page? But that doesn't provide machine-readable output. We'd want the raw data. Do we even have a good URL for the raw data?

This URL will return the actual shape data: https://commons.wikimedia.org/w/index.php?title=Data:Avignon_City_Wall.map&action=raw

Very ugly. Is it stable? We should set up a rewrite for this. A nice canonical URI would be https://commons.wikimedia.org/data/Data:Avignon_City_Wall.map. Content negotiation would be nice, but not a must-have.
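
As a rough illustration of the raw-data URL mentioned above, this sketch builds the `action=raw` URL from a Data page title. The helper name `raw_data_url` is hypothetical, and the sketch only assumes the `index.php` entry point quoted in this thread; it says nothing about whether that URL is stable.

```python
from urllib.parse import urlencode

# Entry point taken from the URL quoted in the discussion.
COMMONS_INDEX = "https://commons.wikimedia.org/w/index.php"


def raw_data_url(title: str) -> str:
    """Build the action=raw URL for a Commons Data page (hypothetical helper)."""
    # Page titles in URLs conventionally use underscores instead of spaces.
    query = urlencode({"title": title.replace(" ", "_"), "action": "raw"})
    return f"{COMMONS_INDEX}?{query}"


print(raw_data_url("Data:Avignon_City_Wall.map"))
```

Note that `urlencode` percent-encodes the colon as `%3A`, so the result differs textually from the URL quoted above while carrying the same query parameters.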

> The URL of the page?

Yes.

> But that doesn't provide machine readable output. We'd want the raw data.

Importing Commons data into RDF may be pretty slow, and it would make the RDF database hold huge blobs (the Avignon one is 20K+, and there are probably even bigger ones), which would be rather inefficient. These blobs would also be pretty useless there, since you can't do much with them unless you have code that specifically knows how to parse this type of blob, in which case it's probably not hard for the same code to download the blob from the URL. So I'm not sure what the advantage of copying these huge blobs into the RDF database would be.

@Smalyshev I'm not proposing to import the shape data into WDQS. But WDQS should link to the machine readable shape data, not the HTML page. If all I have is the URL of the HTML page, there is no obvious way to get the blob.

In RDF terms: we want a nice URI for the shape, which should resolve to a URL for the shape data, not the URL of an HTML page about the shape data.

We do the same for commonsMedia: we do not link to the file description page, but to Special:FilePath/whatever, because that resolves to the actual media blob.
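
To make the idea concrete, here is a minimal sketch of what such a mapping could emit: one N-Triples statement whose object is the shape URI rather than the HTML page URL. The function name, the entity and property IDs, and the `/data/` URI form (only proposed earlier in this thread) are all assumptions for illustration, not the actual export format.

```python
# Standard Wikidata namespaces for entities and direct ("truthy") claims.
WD_ENTITY = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


def shape_triple(entity_id: str, property_id: str, shape_uri: str) -> str:
    """Return one N-Triples line linking an entity to a shape URI.

    Hypothetical helper: assumes shape_uri is already a valid absolute IRI.
    """
    return f"<{WD_ENTITY}{entity_id}> <{WDT}{property_id}> <{shape_uri}> ."


# Q123 and P456 are placeholder IDs, not real assignments.
print(shape_triple(
    "Q123",
    "P456",
    "https://commons.wikimedia.org/data/Data:Avignon_City_Wall.map",
))
```

The point of the sketch is only that the object is an IRI a client can dereference to get the shape data, analogous to the Special:FilePath links used for commonsMedia.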

@daniel Sorry, I misunderstood you.

> But WDQS should link to the machine readable shape data, not the HTML page

Yes, completely agree, the link URL should point to a machine-readable representation of the data in a standard format (or at the very least a content-negotiated URL that can return one).
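
For illustration, a client using a content-negotiated URL would ask for a machine-readable format via the `Accept` header. The sketch below only constructs such a request and does not send it. The helper name `shape_request`, the `application/json` media type, and the `/data/` URI (a proposal from this thread) are assumptions; nothing here implies the server supports this.

```python
from urllib.request import Request


def shape_request(uri: str, media_type: str = "application/json") -> Request:
    """Build (but do not send) a GET request asking for a given media type."""
    # The Accept header is how HTTP content negotiation expresses the
    # representation the client would like to receive.
    return Request(uri, headers={"Accept": media_type})


req = shape_request("https://commons.wikimedia.org/data/Data:Avignon_City_Wall.map")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```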

daniel updated the task description.
daniel renamed this task from [RFC] RDF mapping for geo-shape to [RFC] RDF mapping for geo-shape / URIs for commons data pages. Mar 22 2017, 7:54 PM