[RFC] Multi-Content Revisions
Open, Normal, Public

Description

Problem

Storing information within the page revision content has the following benefits:

  • Part of page history, comparable and reversible together with other related content.
  • Editable by users at the same time as other content, and allows them to make a single atomic change.

We currently do this for categories, infoboxes and template data. But embedding this in wikitext has downsides. While it is possible to extract data via the Parser (as for categories), invoking the Parser has a cost. For that reason, we actually store some of the derived data in link tables, but that is only available for the current revision. The goal of MCR is to allow accessing individual slots of content without the overhead of the parser.

Other data is currently stored outside wikitext, such as template documentation, quality assessment, and more. MCR would allow bringing these into the subject page.

Solution

The idea of this RFC is to allow multiple Content objects to be associated with a single revision. A revision will have multiple slots, and each slot can be occupied by one Content object. The "main" slot is reserved for the primary content of the page (that is, for what is currently considered the content of the page).
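The slot model described above can be illustrated with a minimal Python sketch. This is not MediaWiki code: the `Content` and `Revision` classes, the `get_content` and `with_slot` methods, and the "categories" slot name are all hypothetical and exist only to show the shape of the idea.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Content:
    """One content blob; `model` names its format (e.g. "wikitext", "json")."""
    model: str
    text: str


@dataclass
class Revision:
    """A revision with several named slots, each holding one Content object."""
    rev_id: int
    slots: Dict[str, Content] = field(default_factory=dict)

    def get_content(self, role: str = "main") -> Content:
        # Reading one slot does not require parsing any other slot.
        return self.slots[role]

    def with_slot(self, new_id: int, role: str, content: Content) -> "Revision":
        # An edit yields a new revision; untouched slots are carried over as-is.
        new_slots = dict(self.slots)
        new_slots[role] = content
        return Revision(new_id, new_slots)


# Hypothetical example: the "main" slot holds the wikitext, a second slot holds categories.
rev1 = Revision(1, {"main": Content("wikitext", "Hello world.")})
rev2 = rev1.with_slot(2, "categories", Content("json", '["Greetings"]'))
```

In this sketch, adding the "categories" slot creates revision 2 without touching the "main" content, which mirrors the "single atomic change" benefit listed in the problem statement.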

For details, see https://www.mediawiki.org/wiki/Requests_for_comment/Multi-Content_Revisions.

What I take away from @Alsee's comment is that we should provide a more comprehensive and detailed overview of the use cases.

So the answer is no, no thought of investigating whether the editing community wants this.

That's not what I said. To the contrary, I know that such investigation was done at least for the use case that is my job to take care of, namely structured media info; I also know that separating categories out of the wikitext has been requested and discussed numerous times. What I said is that the status of these investigations and discussions needs to be better documented and linked to from the technical proposal.

The idea of pulling categories, templates, and other things out of the wikitext is a pretty radical change. I understand you have use-case-proposals and the reasons you think they're good ideas. I'm not here to directly debate that. I'm here to alert you to the fact that this is a Big Deal. I am here to alert you that the Community may have a very different perspective, that this may be highly controversial. The proposed use cases may start evaporating if the community considers them unwanted or disruptive.

I agree that it would be a Big Deal to e.g. move Wikipedia infoboxes out of the wikitext. But please note that this RFC does not propose doing that. It proposes a change to the platform that would allow us to do that -- and more importantly, it would allow other sites to manage infoboxes outside the wikitext.

Of course, if none of the use cases was endorsed by the community (which community?), the proposed change to the platform would be pointless. And you are correct that we need to take care to have the community in the loop when discussing use cases and requirements.

I fear I missed an important point when listing the use cases: I did not make a clear distinction between use cases for which we have consensus for implementing them and use cases for which we see potential, or have had repeated requests, but which have not yet been fully investigated or discussed broadly. That's why I said that we need a more comprehensive and detailed overview of the use cases.

PS: One side note about discussing changes to the editing interface with the community of editors: the editors who are active on the site today are the ones who like (or at least got used to) the current interface - the ones that find the current way to edit unusable have given up after a few tries. We would like to change this, and open the editing experience to people who do not want to fiddle with complex syntax; this may mean changes that some people who have become experts at fiddling with wikitext don't like. We'll need to find the right balance, but we cannot find it if we listen only to the people who are active editors now. But that has nothing to do with the MCR proposal, it's just a general observation about discussing new features with "the" community.

Pppery added a comment.Oct 8 2016, 9:10 PM

It might be helpful to split the use cases into ones where MCR is nice to have and those which need it. As I understand it, there are roughly three groups:

I'm missing the group "currently embedded in wikitext and would benefit from separate storage, editing, diffing, etc", e.g. page assessment, media info, categories, template schema, translation tables, ...

Why? What is wrong with the page assessment being stored in wikitext? Separate editors for TemplateData and categories already exist and I see no need to split these out further. This would probably also make it harder to do the semi-common thing of removing an {{uncategorized}} template and replacing it with categories, which would require going through two editors in this proposal.

daniel added a subscriber: Pppery.EditedOct 12 2016, 4:36 AM

Why? What is wrong with the page assessment being stored in wikitext? Separate editors for TemplateData and categories already exist and I see no need to split these out further.

The people who write specialized editors like that are exactly the ones who want MCR most. Because it's really hard to get this right. For instance, how can you find all category links on a page? You need to know all local aliases for the category namespace, you have to know which tag extensions accept wikitext (<poem> does, but <source> doesn't), and if the category link is in a template parameter, you have no idea whether it is *actually* a category link, or just looks like one.
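To make the difficulty concrete, here is a small Python sketch of the naive approach: a regular expression that scans wikitext for things that look like category links. The alias list, the function name, and the sample page are assumptions for illustration only; a real wiki has its own localized aliases, and real tools are more elaborate than this.

```python
import re
from typing import List

# Hypothetical alias list; each wiki defines its own localized namespace aliases.
CATEGORY_ALIASES = ["Category", "Kategorie"]

_PATTERN = re.compile(
    r"\[\[\s*(?:" + "|".join(CATEGORY_ALIASES) + r")\s*:\s*([^\]|]+)",
    re.IGNORECASE,
)


def naive_categories(wikitext: str) -> List[str]:
    """Return every string that merely looks like a category link."""
    return [match.strip() for match in _PATTERN.findall(wikitext)]


page = (
    "Some text.\n"
    "[[Category:Poets]]\n"
    "<source>[[Category:Not a real link]]</source>\n"
    "{{Infobox|note=[[Kategorie:Maybe]]}}\n"
)
found = naive_categories(page)
```

The scan reports all three candidates, although the one inside <source> is not a category link and the one passed as a template parameter may or may not end up as one. Telling them apart requires the knowledge about tag extensions and template expansion described above, which a text scan does not have.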

And if you add a category, we save (and re-render!) a new copy of the entire page, instead of just the bit that changed.

Yes, these tools exist, but they are unreliable, hard to maintain, and inefficient. That's exactly how the idea for MCR was born.

Izno added a subscriber: Izno.Oct 12 2016, 1:20 PM

which would require going through two editors in this proposal

You seem to be visualizing a particular implementation. That's usually bad design.

When I see a potential implementation (for a wikitext solution--never mind VE for right now), I see multiple <textarea>s, each with their own storage of elements. That doesn't require a new window. Or autocomplete-enabled category selection ("here, add tags for this article!") without ever exposing the wikitext syntax to the user. (Right now, I need a gadget for that.) Or forms for page assessment (regarding which, I have no doubt T120219: PageAssessments deployment to WMF wikis would be enabled by MCR).

daniel added a comment.Nov 1 2016, 5:03 PM

I have proposed T149532: Why Multi-Content-Revisions? Use cases and requirements. as a session for the #Wikimedia-Developer-Summit_(2017). If you are interested in such a discussion at the summit, please comment on the ticket.

Question : History of old articles

If I understand correctly, this feature will potentially allow viewing an article with the versions of the templates that existed at the time the wikitext was edited. Two questions arise then :

  • will that also work for deleted templates ?
  • will we be able to restore revisions that predate the deployment of Multi-Content Revisions, say a 2005 revision of some article ?

@TomT0m No, Multi-Content-Revisions does not help with consistent display of old template revisions. Well, it does in cases where the use of templates is replaced by the use of slots - if e.g. template documentation was stored in a slot instead of a subpage, you would always see the correct version of the documentation for old versions of the template. But that would be because it would no longer use the template mechanism.

@TomT0m No, Multi-Content-Revisions does not help with consistent display of old template revisions. Well, it does in cases where the use of templates is replaced by the use of slots - if e.g. template documentation was stored in a slot instead of a subpage, you would always see the correct version of the documentation for old versions of the template. But that would be because it would no longer use the template mechanism.

Ok, I got confused. Does that mean that the documentation will not have its own wiki page address anymore ?

Would it then be possible to have a special type of "reference" slot which would hold a pointer to another page revision ? I guess the parser could be modified to maintain those reference slots when pages are saved.

For example, the parser computes a new version of the page when its content is modified, and when it expands a template, a hook triggers the slot manager to store the revision number of the template in those "reference" slots. I guess this kind of hook or something similar exists, since we get a list of the used templates when previewing a page.

Tgr added a comment.Nov 17 2016, 8:09 AM

If I understand correctly, this feature will potentially allow viewing an article with the versions of the templates that existed at the time the wikitext was edited.

You might be thinking of Memento (which is not related to this in any way).

daniel added a comment.EditedNov 19 2016, 3:23 PM

Ok, I got confused. Does that mean that the documentation will not have its own wiki page address anymore ?

Yes, the documentation would be part of the template page proper, and would not have a separate title.

Would it then be possible to have a special type of "reference" slot which would hold a pointer to another page revision ? I guess the parser could be modified to maintain those reference slots when pages are saved.

That would theoretically be possible, but there are currently no plans to do this. I'm also not sure this would be the best way to tie a page revision to template revisions. So far, slots are intended to be editable, not derived. I have been thinking about derived slots, but the use cases for that idea all seem a bit contrived, and would perhaps be better served by a more specialized solution, like a dedicated database table.

For example, the parser computes a new version of the page when its content is modified, and when it expands a template, a hook triggers the slot manager to store the revision number of the template in those "reference" slots. I guess this kind of hook or something similar exists, since we get a list of the used templates when previewing a page.

This could be done with a DB table that associates a revision ID of the "transcluder" with a revision ID of the "transcluded" in each row. Simple enough to do, and would be stable against the template being moved or renamed, etc. It's going to be a big table, though. And quite a change in how things work. As Tgr pointed out, there is the Memento extension that does this with some limitations. It's a feature that has been discussed time and time again, but never gained enough traction to be properly implemented.
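A rough sketch of such a table, using Python's built-in sqlite3 module purely for illustration. The table name, column names, and helper functions are hypothetical; nothing like this exists in MediaWiki's schema, and a production design would have to address the size concern mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE revision_transclusion (
        transcluder_rev_id INTEGER NOT NULL,  -- revision of the page using the template
        transcluded_rev_id INTEGER NOT NULL,  -- revision of the template at save time
        PRIMARY KEY (transcluder_rev_id, transcluded_rev_id)
    )
    """
)


def record_transclusions(page_rev_id, template_rev_ids):
    """Store one row per template revision used when the page revision was saved."""
    conn.executemany(
        "INSERT OR IGNORE INTO revision_transclusion VALUES (?, ?)",
        [(page_rev_id, template_rev_id) for template_rev_id in template_rev_ids],
    )


def templates_used_by(page_rev_id):
    """Return the template revision IDs recorded for a page revision."""
    rows = conn.execute(
        "SELECT transcluded_rev_id FROM revision_transclusion "
        "WHERE transcluder_rev_id = ? ORDER BY transcluded_rev_id",
        (page_rev_id,),
    )
    return [row[0] for row in rows]


# Made-up revision IDs: two page revisions sharing one template revision.
record_transclusions(1001, [501, 502])
record_transclusions(1002, [501, 510])
```

Because the rows refer to revision IDs rather than titles, renaming or moving a template does not invalidate them. The cost is one row per (page revision, template revision) pair.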

Just for clarity, as I've worked on this task but not actually commented, we in Editing see MCR as very important to our long-term plans. The use cases laid out at Multi-Content Revisions#Use Cases cover a lot, but I'll just pull out the four that we see as most vital:

  • The structured media info work, as almost goes without saying;
  • Rejigging templates to have dedicated template, styling, data, and documentation slots, with UI to match;
  • Rejigging files to have a fused history for the blob and the description, removing UI confusion; and
  • Moving to a structured-data approach for categories.

Lots of others are also important, but those are the most useful.

daniel moved this task from Inbox to Project on the User-Daniel board.Jan 5 2017, 7:03 PM
cscott added a comment.EditedJan 11 2017, 8:13 PM

If we use MCR for annotation storage, it would be useful to have a canonical URL for the contents of a specific slot. That might be an API URL, like https://en.wikipedia.org/api/rest_v1/page/html/Main_Page/749836961/<slot number> or else a user-visible URL like https://en.wikipedia.org/wiki/Main_Page/<slot name> or https://en.wikipedia.org/wiki/<Slot>:Main_Page or even a quasi-API URL like https://en.wikipedia.org/wiki/Special:redirect/slot/<revision>/<slotname>. Thoughts?

(cc @MarkTraceur)
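
For comparison, the four URL shapes floated in this comment can be generated side by side with a short Python sketch. None of these endpoints is known to exist; the function name, the dictionary keys, and the example slot name "annotations" are assumptions made for illustration.

```python
from urllib.parse import quote

BASE = "https://en.wikipedia.org"


def slot_url_candidates(title: str, revision: int, slot_name: str, slot_number: int) -> dict:
    """Build the proposed URL shapes for one slot (all hypothetical)."""
    t = quote(title, safe="")
    s = quote(slot_name, safe="")
    return {
        # API-style URL, addressed by slot number
        "rest_api": f"{BASE}/api/rest_v1/page/html/{t}/{revision}/{slot_number}",
        # User-visible URL with the slot name as a subpage-like suffix
        "subpage_style": f"{BASE}/wiki/{t}/{s}",
        # User-visible URL with the slot name as a namespace-like prefix
        "namespace_style": f"{BASE}/wiki/{s}:{t}",
        # Quasi-API URL through a special page
        "special_redirect": f"{BASE}/wiki/Special:redirect/slot/{revision}/{s}",
    }


urls = slot_url_candidates("Main_Page", 749836961, "annotations", 1)
```

Only the first and last shapes carry a revision ID, so only those two could serve as a stable address for the contents of a slot at a specific revision. The two user-visible shapes would presumably resolve to the current revision.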

Addshore removed a subscriber: Addshore.Apr 3 2017, 9:52 AM
Deskana added a subscriber: Deskana.Jun 7 2017, 2:09 PM
-jem- added a subscriber: -jem-.Jun 23 2017, 10:51 AM
Ayack added a subscriber: Ayack.Jul 6 2017, 7:05 PM
Rical added a subscriber: Rical.Jul 16 2017, 2:04 PM
Restricted Application added a subscriber: PokestarFan.Aug 1 2017, 10:28 AM
Nirmos added a subscriber: Nirmos.Oct 11 2017, 2:34 PM
daniel moved this task from Inbox to Epic on the Multi-Content-Revisions board.
Smalyshev removed a subscriber: Smalyshev.
Krinkle updated the task description.Jan 10 2018, 5:30 PM
Krinkle updated the task description.Jan 10 2018, 5:35 PM
Krinkle updated the task description.Jan 10 2018, 5:37 PM
Rical added a comment.Jan 10 2018, 8:20 PM

As assigned to T135845, I would use dedicated pages to exchange structured data between several central or local Lua modules.
These pages could contain:

  • the first and current versions of several modules
  • the history of MediaWiki versions in each wiki, to help coders better describe new bugs
  • the options to manage these exchanges: central or local, priorities of some modules as managers, used structures...
  • the places where i18n translations can be found...
Pppery removed a subscriber: Pppery.Mon, Oct 1, 6:55 PM