Add a data-page-only wiki markup header to datasets
Open, HighPublic

Tokens
"Doubloon" token, awarded by Liuxinyu970226."Doubloon" token, awarded by IKhitron."Doubloon" token, awarded by Fae."Doubloon" token, awarded by Jeff_G.
Assigned To
None
Authored By
Yurik, Jan 14 2017

Description

We urgently need a way for the community to add wiki markup to the top of the dataset pages. That markup will allow for messages, categories, deletion requests, etc.

Community discussion permalink (more messages might have been added later)

Wiki markup will not be accessible via the API or via Lua calls: that field will be removed from the data results.

Proposed data structure:

{
   "info": "  any wiki markup   "
}

Will be shown at the top, right after the "description" tag (or should it be above?)
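As a rough illustration of the proposal above (not the actual extension code, which is PHP), the following Python sketch separates the markup-only field from the data that API or Lua consumers would receive. The function name `split_info` and the sample page content are assumptions made up for this example; only the field name "info" comes from the proposal.

```python
import json

# Field name taken from the proposal; everything else here is hypothetical.
INFO_FIELD = "info"


def split_info(raw_json: str):
    """Return (wiki_markup, data_without_info) for one dataset page."""
    page = json.loads(raw_json)
    # Remove the markup so that data consumers never see it.
    markup = page.pop(INFO_FIELD, "")
    return markup, page


raw = '{"info": "[[Category:Example]]", "license": "CC0-1.0", "data": [[1, 2]]}'
markup, data = split_info(raw)
print(markup)            # goes to the wikitext parser for display
print(json.dumps(data))  # goes to API / Lua callers
```

The markup would be rendered on the page only, while the remaining dictionary is what data requests would return.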

Yurik created this task.Jan 14 2017, 4:59 AM
Restricted Application added a subscriber: Aklapper.Jan 14 2017, 4:59 AM
FDMS added a subscriber: FDMS.Jan 14 2017, 8:06 PM

My concern: how is this different from "description" which is already displayed at the top of data pages?

My concern: how is this different from "description" which is already displayed at the top of data pages?

Any wikitext put into "description" is not parsed, so I think that's the use case here: to allow things such as deletion templates to be added to the pages.

There is a workaround here, which is to use the talk page. It's not ideal, but it might be enough.

FDMS added a comment.Jan 25 2017, 4:56 AM

On the English Wikipedia there appears to be some sort of pseudo-parser for JS/CSS pages, shouldn't such an approach work for datasets as well?

debt added a comment.Jan 26 2017, 5:17 PM

On the English Wikipedia there appears to be some sort of pseudo-parser for JS/CSS pages, shouldn't such an approach work for datasets as well?

If this workaround is good, I think we can move this ticket to the 'nice to have' section on T155601

I think the use case is putting various templates - like templates that may control bots, or templates that provide some functionality, or descriptions that need more capabilities than plain text in "description", like links.

Using the talk page may be an option, but the talk page is not immediately visible to the user (many users are not even aware that talk pages exist). I'm not sure what solution @FDMS is proposing, since the linked page describes how the speedy deletion template does not work for JS/CSS, but I don't see any actual solution to this problem described there.

debt added a project: Maps.Oct 12 2017, 7:19 PM

There is a new discussion going on now, regarding the substance of this ticket:
https://commons.wikimedia.org/wiki/Commons:Deletion_requests/Data_talk:Kuala_Lumpur_Districts.map

Restricted Application added a project: Discovery.Oct 12 2017, 7:19 PM

Trying to parse JSON as Mediawiki markup seems like a bad idea. Escaping rules will mean that you still need to parse the JSON as JSON.

e.g.

{
   "info": "  any wiki markup   "
}

and

{"info":"\u0020\u0020\u0061\u006e\u0079\u0020\u0077\u0069\u006b\u0069\u0020\u006d\u0061\u0072\u006b\u0075\u0070\u0020\u0020\u0020"}

are identical JSON files, and need to lead to identical results.

This is not simply a theoretical issue - quotes and newlines are common, and are required to be escaped.
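The equivalence of the two files above can be checked with any conforming JSON parser. The sketch below uses Python's standard `json` module; the sample markup string in the second half is invented for illustration.

```python
import json

# The two files from the comment above, as raw text.
pretty = '{\n   "info": "  any wiki markup   "\n}'
escaped = r'{"info":"\u0020\u0020\u0061\u006e\u0079\u0020\u0077\u0069\u006b\u0069\u0020\u006d\u0061\u0072\u006b\u0075\u0070\u0020\u0020\u0020"}'

# Different bytes on disk, same value once parsed.
same = json.loads(pretty) == json.loads(escaped)
print(same)

# Quotes and newlines in markup must be escaped in the raw JSON text,
# so the raw text is not directly usable as wiki markup.
markup = 'He said "keep"\n[[Category:Example]]'
serialized = json.dumps({"info": markup})
print(serialized)
```

Any approach that reads the markup straight out of the raw page text, without decoding the JSON first, would treat these inputs differently.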

The right way to do this is to properly separate metadata and data, so the wikimarkup is stored side by side. As Commons-Datasets is orphaned with no team responsible, this is likely too difficult.

Another solution would be to parse the JSON then pull out a specific field, then pass that field to a mediawiki markup parser. This doesn't mean the raw text representation of the json will be searchable. The field would have to be removed when something requests the file, otherwise it might interfere. I'd also suggest a name less likely to collide than "info"

A third option would be to strip out <noinclude> or similar sections when requesting the json.
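A minimal sketch of that third option, assuming the page text is stored as-is with the markup wrapped in a `<noinclude>` section ahead of the JSON. The helper name `data_only` and the sample page are hypothetical, and a regular expression is only a stand-in for whatever the real parser would do.

```python
import json
import re

# Non-greedy match so that several separate sections are each removed.
NOINCLUDE = re.compile(r"<noinclude>.*?</noinclude>", re.DOTALL)


def data_only(page_text: str):
    """Drop <noinclude> sections, then parse what is left as JSON."""
    return json.loads(NOINCLUDE.sub("", page_text))


page_text = (
    "<noinclude>{{delete|reason=Example}}\n"
    "[[Category:Example]]</noinclude>\n"
    '{"data": [[1, 2]]}'
)
print(data_only(page_text))
```

One caveat with this sketch: a JSON string value that itself contains a literal `<noinclude>` section would also be stripped, so real code would need a rule for that case.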

@Pnorman actually we already use Wiki markup in these pages - the .map pages treat title and description fields as wiki markup. I haven't heard of any problems -- the values are being sanitized by the regular MW parser, and get consumed by mapframe/maplink/lua code. Adding this field wouldn't be much of a challenge from the tech perspective.

Jeff_G added a comment.EditedOct 16 2017, 3:39 AM

@Pnorman actually we already use Wiki markup in these pages - the .map pages treat title and description fields as wiki markup. I haven't heard of any problems -- the values are being sanitized by the regular MW parser, and get consumed by mapframe/maplink/lua code. Adding this field wouldn't be much of a challenge from the tech perspective.

When I tried to add the full delete template as a title field, I got "Syntax error". When I tried to add it to the description field, I got "⧼Parameter "description" must be an object that maps valid language codes to single line strings without tabs or trailing spaces, e.g. { "en":"String in English", ... }⧽". Admittedly, the full delete template has multiple lines. When I tried to add "{{delete|reason=No room for that here, please see the subpage.|subpage=Data talk:Kuala Lumpur Districts.map|year=2017|month=October|day=12}}" as the title, I got another "Syntax error"; when I added it to the description, it was rendered as if it was nowiki'd. When I tried to add "Please see [[Commons:Deletion requests/Data talk:Kuala Lumpur Districts.map]]." as the title, I got another "Syntax error"; when I added it to the description, it was again rendered as if it was nowiki'd (unlike edit summaries, which render wikilinks). All of these attempts were with preview; I didn't try actually saving anything because the preview always failed.

Yurik added a comment.Oct 16 2017, 3:41 AM

Clarification - I think (need to check in the code), the title and description use "limited" wiki syntax, similar to what is used in the edit comments. A full wiki markup parsing would be needed to track categories, etc.

Yurik added a comment.Oct 16 2017, 3:42 AM

BTW, IIRC, this fix would actually be just a few lines of code.

debt added a comment.Oct 16 2017, 3:24 PM

@Yurik - could you expand on how this could be fixed with 'a few lines of code'?

Gehel added a subscriber: Gehel.Oct 16 2017, 3:49 PM

I don't know much about Commons Datasets, so the questions below might be naive... Feel free to ignore.

If I understand correctly, commons dataset is a way to store arbitrary JSON on commons. Not only maps / geojson. In this case, we take fields that are geojson specific (title / description) and interpret them as metadata / wiki markup. Going from "limited" markup to full markup would solve at least part of the problem here, but would only be a solution for .map / geojson? Right?

Again if I understand correctly, there is no easy solution for the generic case. And the question of discoverability / indexing is also not solved here (yes, different issue, I know).

Fae added a comment.Oct 17 2017, 6:08 PM

I had not caught on that as well as templates, it's not possible to add data files to categories (unless I'm missing a way to do it). Again an unsatisfying workaround is to use Data talk pages, with a current example being the maintenance category: https://commons.wikimedia.org/wiki/Category:Data_files_with_Open_Street_Map_coordinates.

Fae awarded a token.Oct 17 2017, 6:12 PM
Yurik added a comment.Oct 17 2017, 7:02 PM

@Gehel, not exactly. The new wiki header field would apply to all data stores, both .tab & .map, because it should be implemented in their base class (they share one parent). It would use the current main page parser - thus parsing in the context of the whole page, rather than creating a new parser instance and discarding the "side-effects" - such as categories, link tracking, etc. The reason I mentioned the .map title & description fields is that they use a very similar approach, thus showing that it is doable. They just use a new parser instance IIRC, without tracking things.

@debt not sure how I can explain, I would have to actually do it. I will try to find some time, but no ETA. Also, please check with @MaxSem - he knows this area pretty well.

@Fae, correct, you cannot add any "side-effect-causing" markup to the data pages, only to the data talk pages. And I agree, the workaround is not ideal.

Fae added a comment.Oct 19 2017, 12:23 PM

A new Wikimedia Commons proposal has been created to allow for additional licenses for Data files. This would reduce the confusion about whether data imported from elsewhere needs attribution or can be redefined as CC0.

An obvious consequence if the proposal passes, is that the license must be able to be added to the Data file by any user, and displayed with the map or table.

Link: https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals#Proposal_to_include_non-CC0_licenses_for_the_Data_namespace

IKhitron added a subscriber: IKhitron.

I really need a way to add category to json pages.

daniel added a subscriber: daniel.May 7 2018, 2:42 PM

From my perspective, putting wiki markup into JSON structures seems rather horrible.

However, with Multi-Content-Revisions (MCR), it will become possible to have a wikitext part ("slot" in MCR jargon) and a data part of the page co-exist, each with its own separate editor, but with a shared history etc. Having a "description" slot on data pages seems like a valid request.

For categories, I see three options:

  1. have a "categories" field in the JSON
  2. have a separate slot for categories (which could also be used with other kinds of content, e.g. Lua modules)
  3. have a "description" slot, and put categories there.

These options aren't exclusive; technically nothing keeps us from allowing all three. I just fear that it may be confusing to have three places where categories may be defined. On the other hand, this isn't really worse than categories being imposed by templates.
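To make the slot idea concrete, here is a toy Python model of option 3. It is not the MCR API: the `Revision` class, the slot names "main" and "description", and the category regex are all assumptions for illustration only.

```python
import json
import re
from dataclasses import dataclass, field

# Matches [[Category:Name]] and [[Category:Name|sort key]]; a simplification
# of what a real wikitext parser would recognise.
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


@dataclass
class Revision:
    """One page revision holding several independently edited slots."""
    slots: dict = field(default_factory=dict)

    def data(self):
        # The data slot stays pure JSON, with no markup mixed in.
        return json.loads(self.slots["main"])

    def categories(self):
        # Categories come only from the wikitext slot.
        return CATEGORY_LINK.findall(self.slots.get("description", ""))


rev = Revision(slots={
    "main": '{"data": [[1, 2]]}',
    "description": "Districts of an example city.\n[[Category:Example maps]]",
})
print(rev.data())
print(rev.categories())
```

In this arrangement the JSON never has to carry escaped wikitext, and both parts still belong to the same revision and therefore the same history.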

  1. have a "categories" field in the JSON

I saw that there is a plan to add /**/ comments to JSON in wiki, so it can be "4".

Yurik added a comment.May 7 2018, 3:06 PM

@daniel while I do agree with you in principle, it might take a while to implement. Adding a single field to JSON and passing it through the parser is about 5 lines of code, and should take at most an hour for a skilled dev. Also, I wouldn't separate categories from the wiki markup here, simply because most of the time you want templates with categories to auto-add stuff, rather than each page having individual category fields. Also, this method does not preclude future migration to the multi-part system - rather it will be very straightforward.

@IKhitron putting stuff into comments is horrible - not very reliable at parsing, gets easily overwritten by accident or by automatic tools, etc. JSON is just not safe with them (sadly).

@IKhitron putting stuff into comments is horrible - not very reliable at parsing, gets easily overwritten by accident or by automatic tools, etc. JSON is just not safe with them (sadly).

We do it this way in js and css.

daniel added a comment.May 7 2018, 4:52 PM

We do it this way in js and css.

Yes, it's horrible :)

Yurik added a comment.May 7 2018, 6:57 PM

We do it this way in js and css.

JSON is different - it gets parsed into data and back during the save. JS & CSS are stored "as is" - just like wiki markup. For example, when saving, JSON data will lose all whitespace formatting.
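The effect of such a parse-and-reserialize step can be sketched as follows. This uses Python's `json` module purely to show the principle; the wiki's own serializer is different code and would produce its own layout.

```python
import json

# What an editor might submit, with hand-aligned whitespace.
submitted = '{\n    "data":   [ [1, 2],\n                [3, 4] ]\n}'

# Parse into data, then write it back out: the original layout is gone.
stored = json.dumps(json.loads(submitted))
print(stored)  # {"data": [[1, 2], [3, 4]]}
```

The same round trip would also drop anything that is not part of the JSON data model, which is why text kept outside the data cannot survive a save.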

You can't have comments in JSON, and they'll error out several JSON parsers. If you want any interoperability and require comments, use a different format. There's a format similar to JSON which extends it to allow JS-style comments.

Ultimately, JSON is not a format designed for good human editability, so if comments are important, consider a different format.
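As one concrete case, Python's standard `json` module rejects a document containing a JS-style comment. The sample document below is made up for illustration.

```python
import json

with_comment = '{\n  /* [[Category:Example]] */\n  "data": [[1, 2]]\n}'

try:
    json.loads(with_comment)
    accepted = True
except json.JSONDecodeError as err:
    accepted = False
    print("rejected:", err.msg)

print(accepted)
```

Some parsers can be configured to tolerate comments, but a consumer cannot rely on that, so a file containing them is not portable JSON.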

have a "description" slot, and put categories there.

I think having a separate description slot, where you could put categories, templates, bot instructions, human instructions, etc. would be the best solution. Trying to make JSON into what it's not meant for is far inferior.

... if comments are important, consider a different format.

Not at all, for me.

Yurik added a comment.EditedMay 7 2018, 8:38 PM

I think having a separate description slot, where you could put categories, templates, bot instructions, human instructions, etc. would be the best solution. Trying to make JSON into what it's not meant for is far inferior.

Stas, I agree with you - this is the same as what Daniel proposed above. The fundamental problem is resourcing. It seems WMF has no resources to maintain many of these projects, in which case the MVP is the only path forward to solve the immediate problem. Given unlimited resources/time, a proper multi-slot system is more desirable. Unlike regular wiki pages, the good thing about JSON is that it will be trivial to implement the simple solution first and let the community actually move forward, then implement the proper long-term solution and do a simple migration to the multi-slot version.

I really need a way to add category to json pages.

Can I take a step back for a moment, seeing as the discussion is about the specifics of whether or not to implement, and ask -- @IKhitron can you describe what, exactly, you need, and why do you need to put categories on json pages? What are you trying to do? What is missing?

It might be that understanding the actual need will help us find a solution -- whether the specific one described/asked for, or, potentially, a new and/or better one. Seeing as we're talking about a specific need to categorize JSON files, I'm wondering if you can explain it further, @IKhitron ?

@IKhitron can you describe what, exactly, you need, and why do you need to put categories on json pages? What are you trying to do? What is missing?

Hi. Sure, it's very simple. We need to eliminate them from special:templates needs category.

Ltrlg added a subscriber: Ltrlg.Jun 13 2018, 12:08 PM

have a "description" slot, and put categories there.

I think having a separate description slot, where you could put categories, templates, bot instructions, human instructions, etc. would be the best solution. Trying to make JSON into what it's not meant for is far inferior.

Has anybody talked to the people working on StructuredData for Commons? Their stuff may make a lot of this obsolete sooner or later …

daniel added a comment.Aug 8 2018, 8:33 AM

Has anybody talked to the people working on StructuredData for Commons? Their stuff may make a lot of this obsolete sooner or later …

My understanding is that what is requested here is the opposite of what SDoC does. SDoC allows structured machine readable meta-data to be stored on file description pages, in addition to wikitext. This here ticket asks for a way to store wikitext along with the structured machine readable data on data pages.

The overlap I see is "storing two different kinds of content on the same page". This can be done with the new MCR infrastructure in core (which enables SDoC, but isn't really part of it). This would mean that the wikitext goes into a separate "slot", instead of being part of the JSON. I think that would be the correct approach, and very similar to other use cases targeted by MCR, such as documentation for templates and Lua modules.

Nikki added a subscriber: Nikki.Aug 13 2018, 11:34 AM