Add a data-page-only wiki markup header to datasets
Open, Medium, Public

Assigned To
None
Authored By
Yurik
Jan 14 2017, 4:59 AM
Referenced Files
None
Tokens
"Love" token, awarded by John_Cummings. "Doubloon" token, awarded by Liuxinyu970226. "Doubloon" token, awarded by IKhitron. "Doubloon" token, awarded by Fae. "Doubloon" token, awarded by Jeff_G.

Description

We urgently need a way for the community to add wiki markup to the top of the dataset pages. That markup will allow for messages, categories, deletion requests, etc.

Community discussion permalink (more messages might have been added later)

Wiki markup will not be accessible via the API or via Lua calls - that field will be removed from the data results.

Proposed data structure:

{
   "info": "  any wiki markup   "
}

Will be shown at the top, right after the "description" tag (or should it be above?)
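
A minimal sketch of how the proposed structure might behave, assuming the header lives in a top-level "info" key as proposed above. The helper name `split_header` and the sample dataset are hypothetical; this is not JsonConfig code, only an illustration of keeping the markup for page rendering while stripping it from what API and Lua callers would receive.

```python
import copy

HEADER_KEY = "info"  # field name taken from the proposal above


def split_header(dataset: dict) -> tuple[str, dict]:
    """Return (wiki markup header, copy of the dataset without that field).

    Hypothetical helper: the markup would go to the page renderer only,
    while the stripped copy is what API and Lua callers would see.
    """
    public = copy.deepcopy(dataset)
    markup = public.pop(HEADER_KEY, "")
    return markup, public


# Made-up sample page content, for illustration only.
page = {
    "info": "{{delete|reason}} [[Category:Example datasets]]",
    "description": {"en": "Sample table"},
    "data": [[1, 2], [3, 4]],
}
markup, public = split_header(page)
```

The stored page keeps its "info" field; only the copy handed to consumers lacks it.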

Event Timeline

@Gehel, not exactly. The new wiki header field would apply to all data stores, both .tab & .map, because it should be implemented in their base class (they share one parent). It would use the current main page parser - thus parsing in the context of the whole page, rather than creating a new parser instance and discarding the "side-effects" - such as categories, link tracking, etc. The reason I mentioned the .map title & description fields is that they use a very similar approach, which shows that it is doable. They just use a new parser instance IIRC, without tracking things.

@debt not sure how I can explain, I would have to actually do it. I will try to find some time, but no ETA. Also, please check with @MaxSem - he knows this area pretty well.

@Fae, correct, you cannot add any "side-effect-causing" markup to the data pages, only to the data talk pages. And I agree, the workaround is not ideal.

A new Wikimedia Commons proposal has been created to allow for additional licenses for Data files. This would reduce the confusion about whether data imported from elsewhere needs attribution or can be redefined as CC0.

An obvious consequence, if the proposal passes, is that any user must be able to add the license to the Data file, and it must be displayed with the map or table.

Link: https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals#Proposal_to_include_non-CC0_licenses_for_the_Data_namespace

IKhitron subscribed.

I really need a way to add category to json pages.

From my perspective, putting wiki markup into JSON structures seems rather horrible.

However, with Multi-Content-Revisions (MCR), it will become possible to have a wikitext part ("slot" in MCR jargon) and a data part of the page co-exist, each with its own separate editor, but with a shared history etc. Having a "description" slot on data pages seems like a valid request.

For categories, I see three options:

  1. have a "categories" field in the JSON
  2. have a separate slot for categories (which could also be used with other kinds of content, e.g. Lua modules)
  3. have a "description" slot, and put categories there.

These options aren't exclusive; technically, nothing keeps us from allowing all three. I just fear that it may be confusing to have three places where categories may be defined. On the other hand, this isn't really worse than categories being imposed by templates.
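
To illustrate the concern about several places defining categories, here is a rough sketch that gathers them from a JSON field (option 1) and from wikitext in a description slot (option 3), then merges the two. The "categories" key, the function names, and the simplified regular expression are all assumptions made for illustration; real MediaWiki category parsing handles localized namespaces, templates, and more.

```python
import re

# Simplified pattern: matches [[Category:Name]] and [[Category:Name|sortkey]].
# Real wikitext parsing is far more involved; this is a toy.
CATEGORY_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def categories_from_json(dataset: dict) -> list[str]:
    """Option 1: a hypothetical top-level "categories" list in the JSON."""
    return list(dataset.get("categories", []))


def categories_from_wikitext(wikitext: str) -> list[str]:
    """Option 3: category links found in a wikitext description slot."""
    return [name.strip() for name in CATEGORY_LINK.findall(wikitext)]


def all_categories(dataset: dict, wikitext: str) -> list[str]:
    """Merge both sources, dropping duplicates but keeping first-seen order."""
    merged = categories_from_json(dataset) + categories_from_wikitext(wikitext)
    return list(dict.fromkeys(merged))


# Made-up sample inputs.
sample_json = {"categories": ["Tabular data"]}
sample_text = "[[Category:Maps of Europe]] notes [[category:Tabular data|sort]]"
merged = all_categories(sample_json, sample_text)
```

With more than one source allowed, some merge step like `all_categories` would be needed, and an editor would have to check each source to find where a given category came from.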

  1. have a "categories" field in the JSON

I saw that there is a plan to add /**/ comments to JSON in wiki, so it can be "4".

@daniel while I do agree with you in principle, it might take a while to implement. Adding a single field to JSON and passing it through the parser is about 5 lines of code, and should take a skilled dev an hour at most. Also, I wouldn't separate categories from the wiki markup here, simply because most of the time you want templates with categories to auto-add stuff, rather than each page having individual category fields. Also, this method does not preclude future migration to the multi-part system - rather, the migration will be very straightforward.

@IKhitron putting stuff into comments is horrible - not very reliable at parsing, gets easily overwritten by accident or by automatic tools, etc. JSON is just not safe with them (sadly).

@IKhitron putting stuff into comments is horrible - not very reliable at parsing, gets easily overwritten by accident or by automatic tools, etc. JSON is just not safe with them (sadly).

We do it this way in js and css.

We do it this way in js and css.

Yes, it's horrible :)

We do it this way in js and css.

JSON is different - it gets parsed into data and back during the save. JS & CSS are stored "as is" - just like wiki markup. For example, when saving, JSON data will lose all whitespace formatting.
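
The round trip described above can be reproduced with any JSON library. The sketch below uses Python's standard `json` module as a stand-in; it does not show how the wiki itself serializes data, and the sample text is made up.

```python
import json

# Hand-formatted source text, as an editor might type it.
original_text = '{\n    "a":   1,\n\n    "b": [1,\n          2]\n}'

# Parse into data, then serialize again, as a save that normalizes JSON would.
data = json.loads(original_text)
saved_text = json.dumps(data)

# The values survive, but the author's whitespace layout does not.
print(saved_text)
```

Anything that is not part of the parsed data, such as layout or comments, has nowhere to live after such a round trip.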

You can't have comments in JSON, and they'll error out several JSON parsers. If you want any interoperability and require comments, use a different format. There's a format similar to JSON which extends it to allow JS-style comments.

Ultimately, JSON is not a format designed for good human editability, so if comments are important, consider a different format.
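
As a quick check of the point about comments, the sketch below feeds a JS-style comment to Python's standard `json` module, which follows the JSON grammar strictly. Other parsers may be more lenient; the helper name `parses_as_json` and the sample strings are illustrative only.

```python
import json

# JSON text with a JS-style block comment, as suggested earlier in the thread.
commented = '{\n  /* [[Category:Example]] */\n  "data": []\n}'
plain = '{\n  "data": []\n}'


def parses_as_json(text: str) -> bool:
    """Return True if a strict JSON parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


commented_ok = parses_as_json(commented)
plain_ok = parses_as_json(plain)
```

Here the commented text is rejected while the plain text is accepted, so any tool in the chain that uses a strict parser would fail on commented data pages.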

have a "description" slot, and put categories there.

I think having a separate description slot, where you could put categories, templates, bot instructions, human instructions, etc. would be the best solution. Trying to make JSON into what it's not meant for is far inferior.

... if comments are important, consider a different format.

Not at all, for me.

I think having a separate description slot, where you could put categories, templates, bot instructions, human instructions, etc. would be the best solution. Trying to make JSON into what it's not meant for is far inferior.

Stas, I agree with you - this is the same as what Daniel proposed above. The fundamental problem is resourcing. It seems WMF has no resources to maintain many of these projects, in which case the MVP is the only path forward to solve the immediate problem. Given unlimited resources/time, a proper multi-slot system is more desirable. Unlike regular wiki pages, the good thing about JSON is that it will be trivial to implement the simple solution first and let the community actually move forward, then implement the proper long-term solution and do a simple migration to the multi-slot version.

I really need a way to add category to json pages.

Can I take a step back for a moment, seeing as the discussion is about the specifics of whether or not to implement, and ask -- @IKhitron can you describe what, exactly, you need, and why do you need to put categories on json pages? What are you trying to do? What is missing?

It might be that understanding the actual need will help us find a solution -- whether the specific one described/asked for, or, potentially, a new and/or better one. Seeing as we're talking about a specific need to categorize JSON files, I'm wondering if you can explain it further, @IKhitron ?

@IKhitron can you describe what, exactly, you need, and why do you need to put categories on json pages? What are you trying to do? What is missing?

Hi. Sure, it's very simple. We need to eliminate them from special:templates needs category.

have a "description" slot, and put categories there.

I think having a separate description slot, where you could put categories, templates, bot instructions, human instructions, etc. would be the best solution. Trying to make JSON into what it's not meant for is far inferior.

Has anybody talked to the people working on StructuredData for Commons? Their stuff may make a lot of this obsolete sooner or later …

Has anybody talked to the people working on StructuredData for Commons? Their stuff may make a lot of this obsolete sooner or later …

My understanding is that what is requested here is the opposite of what SDoC does. SDoC allows structured machine readable meta-data to be stored on file description pages, in addition to wikitext. This here ticket asks for a way to store wikitext along with the structured machine readable data on data pages.

The overlap I see is "storing two different kinds of content on the same page". This can be done with the new MCR infrastructure in core (which enables SDoC, but isn't really part of it). This would mean that the wikitext goes into a separate "slot", instead of being part of the JSON. I think that would be the correct approach, and very similar to other use cases targeted by MCR, such as documentation for templates and Lua modules.
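
A toy model of the slot idea, to make the distinction concrete: one revision holds a data slot and a wikitext slot, and data consumers read only the former. The class names, the "description" role, and the content-model labels are assumptions for illustration; this is not the MediaWiki MCR API.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    role: str            # e.g. "main" or "description" (role names assumed)
    content_model: str   # e.g. "json" or "wikitext" (labels illustrative)
    content: str


@dataclass
class Revision:
    """One history entry holding several slots, per the MCR idea above."""
    slots: dict[str, Slot] = field(default_factory=dict)

    def set_slot(self, slot: Slot) -> None:
        self.slots[slot.role] = slot

    def data_for_consumers(self) -> str:
        """API/Lua-style consumers would read only the data slot."""
        return self.slots["main"].content


rev = Revision()
rev.set_slot(Slot("main", "json", '{"data": [[1, 2]]}'))
rev.set_slot(Slot("description", "wikitext", "[[Category:Example datasets]]"))
```

Because the wikitext never enters the JSON, nothing has to be stripped before the data is served, and each slot can have its own editor while sharing one history.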

I'm very happy to help with any documentation changes needed for this to progress. I've written some instructions for using map data in Wikidata and I'm waiting for the licenses to be fixed so I can publish it https://www.wikidata.org/wiki/User:John_Cummings/Map_data

As a reminder, this task has been open for 2 years with a more detailed Wikimedia Commons community consensus to go ahead 15 months ago. https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals/Archive/2017/10#Proposal_to_include_non-CC0_licenses_for_the_Data_namespace

If deferring changes important to the community to Phabricator is the only way we can get things done, it is broken. Volunteers do not wait forever, we come and go. Leave any change long enough, and the momentum is gone, folks will be doing other things and every time this happens the long term members of the community get a little more jaded and pessimistic about the future.

I think the issue is that it's currently unclear who owns the code in question. Pinging the Multimedia team - is this yours? Pinging Community-Tech - can you help out?

Unrelated side note: I would personally love to see datasets closely integrated with Wikibase. But I don't think that's on anyone's roadmap at the moment.

MSantos subscribed.

@daniel the team responsible for Maps maintenance is the Reading Infrastructure. I am tagging it so we can evaluate this task.

@MSantos but this does not have much to do with maps at all...

@daniel this is needed to allow any non-CC0 map data to be imported, e.g. OpenStreetMap

@Mrjohncummings Right, sorry, I wasn't clear: the code behind this is Extension:JsonConfig, which knows nothing about maps, and has nothing to do with the maps code. The data can of course be used in maps, and can represent coordinates, geo-shapes, etc.

But I see now that JsonConfig is also owned by ReadingInfrastructure, so ignore me :)

@daniel I'm glad someone understands how all this stuff works :)

Jhernandez subscribed.

@MSantos Moving to needs analysis. Please have a look when you can and update the description like the template to prioritize it better. Thank you!

@Jhernandez @MSantos is there anything I can do to help move this along? I'm stuck on my work for Wikidata till this gets addressed. Should this be assigned to someone in particular?

@Mrjohncummings, I am not sure yet how I can help you with this case, I read the long discussion at T154071: Allow non-CC0 licensed data for datasets and I am now trying to understand the technical aspects of this issue. I need to point out that we are not planning active development on JsonConfig, but will be available to support any volunteer work, as mentioned in T154071#4323571.

This seems to be an important feature with no consensus on architecture design yet, let's chat so I can understand better what are your needs. My IRC is mateusbs17.


@Jhernandez, thanks I will update the description as soon I have more info.

@MSantos thanks for your reply, I don't use IRC, I'll send you an email today

Best

@Lydia_Pintscher @johl just so you're aware this is happening, this is the blocker for adding non-CC0 datasets to Commons that can be used by the query service. I have documentation written and ready to go for people to use Commons data files in Wikidata queries (e.g. map shape files from OpenStreetMap). If you could ask in your network to find a volunteer (or staff member) to help move this along, that would be super.

@Mrjohncummings @MSantos have any clearer directions for development come out of your February discussion?

Looking at this task thus far, and seeing what is already implemented on Commons, a Multi-Content Revisions approach (with a slot for the accompanying wiki markup) sounds like the best solution IMO. I'm guessing this would only require a mechanism in JsonConfig to handle multiple slots?

Change 511088 had a related patch set uploaded (by MSantos; owner: MSantos):
[mediawiki/extensions/JsonConfig@master] Allow wikitext in description for Data namespace

https://gerrit.wikimedia.org/r/511088

Nforcer7 changed the task status from Declined to Invalid. Aug 1 2019, 5:19 PM

Are we any further forward with this?

I see a rough-fix patch above, that nobody seems to have been prepared to review; also a proposal to have a proper Commons description page, allowing detailed source, usage, maintenance, warning templates, category information etc to be added in the usual way.

To that might be added, that the data pages should also have a structured data slot (new ticket: T235332), so that they can be described and made discoverable using SDC properties and statements, and the SDC query system.

It is important to be able to describe the metadata of data fully and accessibly. Why is this ticket getting no action?

I suggest we adopt an approach similar to Lua modules - for example: https://en.wikipedia.org/wiki/Module:Message_box is a Lua page, and https://en.wikipedia.org/wiki/Module:Message_box/doc is a wikitext page that describes it and lets users nominate it for deletion.
Each dataset would (optionally) have a description page for categorization and additional information in non-machine-readable wikitext format.

As an aside, it looks like the sources section also accepts wiki-markup, but currently throws away any categories instead of including them in the page.