
Allow wiki markup data type
Open, Needs Triage · Public

Description

Tabular data should support wiki markup as an alternative to a regular string.

Use cases:

  • A table column contains an image from Commons that needs to be shown in a graph as a thumbnail (e.g. expensive paintings). To implement it, we could:
    • Store Wikidata's image Q number (T134657). This allows proper linking and other use cases, but does not allow easy conversion, e.g. "get the URL of the 200px image thumb", because the conversion would happen on the client. Converting each value one by one via API parse calls seems excessive.
    • Allow a wiki markup column type. This is not great because the data will contain presentation-related information (the URL of the 200px image is not really data itself, it's a transformation of the original image URL), but it will allow per-row customization, e.g. "this image should be 250px because of its shape". Parsing all values may be a very costly operation, so it should probably be memcached.

Naming (bikeshedding) the data types:

  • Regular string: wiki, wikistring, wikimarkup
  • Localized string: wikilocalized, langwiki, localizedwiki

Event Timeline

Yurik created this task. May 7 2016, 5:04 PM
Restricted Application added subscribers: Zppix, Aklapper. · May 7 2016, 5:04 PM

I don't think it's necessary to define such a type in Data.

Rather, have Graph define that such a column in Data has the datatype "file", where the actual data is a plaintext string holding the filename without the File: namespace prefix, and the other presentational details are stored in other parameters of that Graph.

The same goes for defining whether a string (blob) is supposed to be plaintext (= <nowiki>) or rich text (= parsed by MW and converted to HTML).

Let's stick to known, basic, portable scalar datatypes in Data (boolean, number, string...) and keep all the presentation logic outside, in Graph, a Module, or any other presentational layer...
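
One way to picture this split is the sketch below. All names in it (`ColumnSpec`, `describe`, the column names, the width parameter) are hypothetical and only illustrate the idea of keeping scalars in Data and interpretation hints in Graph; the numeric value is made up.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnSpec:
    """Presentation-side description of one Data column (hypothetical)."""
    datatype: str = "string"   # e.g. "string", "number", "file"
    richtext: bool = False     # False = plaintext, True = parse as wiki markup
    params: dict = field(default_factory=dict)  # e.g. {"width": 190}


# Data layer: portable scalars only, no presentation details.
data_rows = [
    {"painting": "Vincent Willem van Gogh 127.jpg", "price": 1.0},  # price is made up
]

# Graph layer: declares how each column should be interpreted and shown.
graph_columns = {
    "painting": ColumnSpec(datatype="file", params={"width": 190}),
    "price": ColumnSpec(datatype="number"),
}


def describe(row, columns):
    """Pair each raw value with the presentation hints the Graph supplies."""
    out = {}
    for name, value in row.items():
        spec = columns.get(name, ColumnSpec())
        out[name] = {"value": value, "datatype": spec.datatype, **spec.params}
    return out
```

With this layout the same Data page can back several graphs that each choose their own thumbnail width.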

Yurik added a comment. Edited May 7 2016, 5:55 PM

@Danny_B I was thinking the same thing, but for the Graphs use case we need to figure out how to convert the filename into a URL. This graph uses wiki markup to get the thumbnail:

"wikirawupload:{{filepath:Vincent Willem van Gogh 127.jpg|190}}" => "wikirawupload://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Vincent_Willem_van_Gogh_127.jpg/190px-Vincent_Willem_van_Gogh_127.jpg"

This way Vega gets a pre-constructed dataset with all URLs resolved and draws the graph on the client. We could of course add some custom wiki-specific Vega transformations, like "given a string, pass it to the parse API on the server", but that would be slow, might overwhelm the servers (100 images per graph, each being a separate API call), and might require significant dev effort because the transformation is async.
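
For comparison, resolving all filenames in one server-side pass might look like the sketch below. It reproduces the URL shape from the example above for a plain JPEG. It assumes the hashed directory comes from the MD5 of the underscored filename (the default hashed upload layout, as far as I know) and ignores file types whose thumbnails get a different extension. `commons_thumb_url` and `resolve_images` are hypothetical helper names, not existing APIs.

```python
import hashlib
from urllib.parse import quote


def commons_thumb_url(filename, width):
    """Build a protocol-relative thumbnail URL for a Commons file (sketch)."""
    name = filename.strip().replace(" ", "_")
    name = name[:1].upper() + name[1:]  # titles start with an uppercase letter
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    encoded = quote(name)
    return (
        "//upload.wikimedia.org/wikipedia/commons/thumb/"
        f"{digest[0]}/{digest[:2]}/{encoded}/{width}px-{encoded}"
    )


def resolve_images(rows, column, width):
    """Replace the filename in `column` with a wikirawupload: URL for every row."""
    return [
        {**row, column: "wikirawupload:" + commons_thumb_url(row[column], width)}
        for row in rows
    ]
```

The whole dataset is converted in one pass, so the client never has to issue per-image API calls.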

We could try to come up with some magical query language to request data with transformations, e.g.:

wikitabular:///MyData.tab -- gets entire content as [ {hdr1:val1_1, hdr2:val1_2}, {hdr1:val2_1, hdr2: val2_2}, ...]
wikitabular:///MyData.tab?hdr1={{{hdr1}}}&hdr2={{{hdr2}}} -- identical to the above, except that all values are now strings.
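
As a rough illustration of the second form, the sketch below substitutes `{{{name}}}` placeholders from the URL query with each row's values. It does only the placeholder substitution; a real implementation would presumably run the result through the wiki parser afterwards. `apply_query_templates` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Matches wiki-style template parameters such as {{{hdr1}}}.
PARAM = re.compile(r"\{\{\{(\w+)\}\}\}")


def apply_query_templates(url, rows):
    """Return rows transformed by the per-column templates in the URL query."""
    templates = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    if not templates:
        return rows  # no query: hand back the content unchanged

    result = []
    for row in rows:
        def fill(match, row=row):
            # Unknown parameters become empty strings in this sketch.
            return str(row.get(match.group(1), ""))

        result.append({col: PARAM.sub(fill, tpl) for col, tpl in templates.items()})
    return result
```

Because every output value is the result of a text substitution, all values come back as strings, which matches the note on the second URL.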

In other words, use the URL's query to pass in wiki markup that could perform any transformation. Or we could even go further and allow a full SQL-like language, e.g.:

wikitabquery:///?q=SELECT WIKI("wikirawupload:{{filepath:{{{image}}}|190}}") AS imgurl, ... FROM [MyData.tab] WHERE age > 1000

This way we can eventually move towards large database storage with a proper SQL backend. I think @brion wanted something like this :)
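
To get a feel for the SQL-like variant, here is a minimal sketch that uses SQLite as a stand-in backend. Everything in it is an assumption for illustration: the `mydata` table and its rows are made up, and `WIKI` is registered as a two-argument function that only substitutes the `{{{image}}}` placeholder rather than calling a real parser.

```python
import sqlite3


def wiki(template, image):
    """Placeholder for a real wiki parser call: only fills in {{{image}}}."""
    return template.replace("{{{image}}}", image)


def run_query():
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL under the name WIKI.
    conn.create_function("WIKI", 2, wiki)
    conn.execute("CREATE TABLE mydata (image TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO mydata VALUES (?, ?)",
        [("Old.jpg", 1500), ("New.jpg", 20)],  # made-up rows
    )
    rows = conn.execute(
        "SELECT WIKI(?, image) AS imgurl FROM mydata WHERE age > 1000",
        ("wikirawupload:{{filepath:{{{image}}}|190}}",),
    ).fetchall()
    conn.close()
    return rows
```

Filtering happens in the database before the transformation runs, so only the matching rows would ever reach the parser.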

Yurik added a subscriber: brion. May 7 2016, 9:44 PM