[RFC] Multi-Content Revisions
Open, Normal, Public

Description

Problem

Storing information within the page revision content has the following benefits:

  • Part of page history, comparable and reversible together with other related content.
  • Editable by users at the same time as other content, and allows them to make a single atomic change.

We currently do this for categories, infoboxes and template data. But embedding this in wikitext has downsides. While it is possible to extract data via the Parser (as for categories), invoking the Parser has a cost. For that reason, we actually store some of the derived data in link tables, but that is only available for the current revision. The goal of MCR is to allow accessing individual slots of content without the overhead of the parser.

Other data is currently stored outside wikitext, such as template documentation, quality assessment, and more. MCR would allow bringing these into the subject page.

Solution

The idea of this RFC is to allow multiple Content objects to be associated with a single revision. A revision will have multiple slots, and each slot can be occupied by one Content object. The "main" slot is reserved for the primary content of the page (that is, for what is currently considered the content of the page).

For details, see https://www.mediawiki.org/wiki/Requests_for_comment/Multi-Content_Revisions.
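The slot model above can be sketched as a small data structure. This is a conceptual illustration only; the class and field names are invented for this sketch and do not correspond to MediaWiki's actual PHP classes.

```python
from dataclasses import dataclass, field

@dataclass
class Content:
    model: str  # content model, e.g. "wikitext" or "json"
    data: str   # serialized content

@dataclass
class Revision:
    rev_id: int
    # Each named slot holds exactly one Content object.
    slots: dict = field(default_factory=dict)

    def main(self) -> Content:
        # The "main" slot holds what is currently considered the page content.
        return self.slots["main"]

# A revision carrying wikitext plus a hypothetical structured "categories" slot:
rev = Revision(rev_id=123)
rev.slots["main"] = Content("wikitext", "'''Example''' article text.")
rev.slots["categories"] = Content("json", '["Category:Examples"]')
```

The point of the indirection is that tools can read or write one slot (say, the categories) without ever touching or re-parsing the main wikitext.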

Related Objects


I think little of that complexity should be exposed to users. We probably don't want editors to freely mix and match slots - rather, we want an integrated experience for editing and display. Ideally editors should neither know nor care about slots.

I think I agree with you, but you say this in a way that sounds dangerous.

The risk: the more that our data formats become a complex mystery that is only understood by a handful of people, the fewer people that will trust the systems we produce.

It's true that ideally, editors should not need to understand the underlying formats. We should create systems that are easy for both humans and computers to understand and manipulate. If we do this, we'll provide the ability to create user interfaces that behave intuitively. Advanced editors will learn the underlying model, and will be able to intuitively grasp the nature of the inevitable problems we'll have with the systems we build. They will also understand how to explain those problems to less advanced editors.

However, the more we try to hide the underlying storage format, the less that the most active editors will trust the systems we produce. Let's make sure that we come up with a system that is easy to explain what a revision is at the byte level.

Tgr added a comment. Sep 27 2016, 4:05 AM

I think little of that complexity should be exposed to users. We probably don't want editors to freely mix and match slots - rather, we want an integrated experience for editing and display. Ideally editors should neither know nor care about slots.

That probably works for editors but not for patrollers. Ie. we can keep the editing interface as it is (there would have to be a non-JS fallback with a textfield for each slot, but it does not have to be the default, even for non-JS users), but history will need some changes (it has to expose edits which do not change the main content, and probably add some filtering tools to handle that) and the diff view will have to expose the slots. That might be worth a discussion.

The risk: the more that our data formats become a complex mystery that is only understood by a handful of people, the fewer people that will trust the systems we produce.

Ah, yes, I agree. The structure of our content should be clearly defined and easy to grasp for interested people. That structure will become slightly more complex with MCR, since we add a level of indirection. On the plus side, the data formats used to represent things like categories or page assessments or license information will become a lot clearer and easier to understand and re-use.

However, the more we try to hide the underlying storage format, the less that the most active editors will trust the systems we produce. Let's make sure that we come up with a system that is easy to explain what a revision is at the byte level.

Yes, right - the system needs to remain transparent, and that's how MCR is designed. My point was that it should not be necessary for editing to know about this. People add tags on flickr without having to think about the underlying storage structure, or learn arcane syntax. It should be the same with MediaWiki.

That probably works for editors but not for patrollers. Ie. we can keep the editing interface as it is (there would have to be a non-JS fallback with a textfield for each slot, but it does not have to be the default, even for non-JS users), but history will need some changes (it has to expose edits which do not change the main content, and probably add some filtering tools to handle that) and the diff view will have to expose the slots. That might be worth a discussion.

Yes, at least in diffs, slots will be a visible concept. For history, watchlist, recentchanges, etc, filtering by slot may be useful, but otherwise I don't think it's necessary to expose the concept of slots there.

You are right that this aspect could use some more thought and discussion. The best place for this is the talk page of https://www.mediawiki.org/wiki/Multi-Content_Revisions/Views I think.

The risk: the more that our data formats become a complex mystery that is only understood by a handful of people, the fewer people that will trust the systems we produce.

Ah, yes, I agree. The structure of our content should be clearly defined and easy to grasp for interested people. That structure will become slightly more complex with MCR, since we add a level of indirection. On the plus side, the data formats used to represent things like categories or page assessments or license information will become a lot clearer and easier to understand and re-use.

Well, the "lot clearer" assertion remains to be seen. I think the current proposal still seems like an enormous change. I'm starting to wrap my head around it, but I can't fault many skeptics for questioning whether this represents a "minimum viable product". I realize there are many use cases, but what single use case would you consider your single must-have use case for an MVP?

daniel added a comment. Edited Sep 28 2016, 5:16 PM

Well, the "lot clearer" assertion remains to be seen. I think the current proposal still seems like an enormous change. I'm starting to wrap my head around it, but I can't fault many skeptics for questioning whether this represents a "minimum viable product". I realize there are many use cases, but what single use case would you consider your single must-have use case for an MVP?

You are right that it is a big change, both conceptually and technically. I'm doing my best to minimize the cost, but it's not trivial.

To me it seems like the cost is justified because MCR would address the need of several use cases. For a single use case, it would perhaps not be justified, and a more specialized solution would be sufficient. But a specialized solution for each use case would be a lot more expensive, and would introduce a lot more complexity. The idea is that adding a layer of abstraction, MCR, will allow such use cases to be implemented with a minimum of extra code.

It's about scalability of the platform when adding features. Compare: TCP isn't great because it serves a specific use case particularly well, but because it serves a large number of use cases reasonably well by adding a generalized abstraction layer for flow control on top of IP. Similarly, MCR aims to add a degree of freedom to MediaWiki's page model, which should serve a number of use cases quite well, in that it lowers the complexity of their implementation significantly.

To allow the supposed benefit of MCR to be assessed and verified, we should define the requirements for the MVP for each must-have use case. If we find significant overlap in the platform needs of several use cases, a generalized solution like MCR is justified. The requirements for that generalized solution can then be derived directly from the platform needs of MVPs.

I have done the above informally in conversations with WMF product owners and developers over the last year, but I admit that this is not documented sufficiently. We (Lydia and I) are in the process of reaching out to WMF product owners, asking them to provide more detailed requirements, rationales, and priorities for their use cases, and we plan to document them on a subpage of https://www.mediawiki.org/wiki/Multi-Content_Revisions.

To me as a Wikidata developer, the "killer use case" is structured media info, but e.g. James, Mark, or Kaldari may have other priorities. The Wikidata team will provide a brief summary of the requirement and rationale for structured media info soon, but to get it right, we want to coordinate with the WMF multimedia team first.

(Please note that I'm out of office until October 24; I'll be working some of the time, but I will be traveling and attending a conference)

To me as a Wikidata developer, the "killer use case" is structured media info, but e.g. James, Mark, or Kaldari may have other priorities. The Wikidata team will provide a brief summary of the requirement and rationale for structured media info soon, but to get it right, we want to coordinate with the WMF multimedia team first.

I may update the description of this task and of the RFC on mediawiki.org to say this. This answer isn't etched in stone, but when someone asks me "what is the MVP for Multi-Content Revisions", I'll say "structured media info". I'm not sure which URL I'll point them to, but I'm sure I'll find something.

(Please note that I'm out of office until October 24; I'll be working some of the time, but I will be traveling and attending a conference)

Thanks for reminding us of this. You're obviously the primary contact from WMDE for this, but who is the product manager from WMDE whose work would be blocked if this is delayed? Is that @Lydia_Pintscher or someone else?

Tgr added a comment. Sep 28 2016, 7:29 PM

It might be helpful to split the use cases into ones where MCR is nice to have and those which need it. As I understand it, there are roughly three groups:

  • data that would otherwise be stored on separate pages but could be bundled into a single page for better UX: media info, doc subpages, {{/header}} and similar templates, maps JSON blobs etc. This is mostly "nice to have" territory although in the case of media info (some of which will have to be manually migrated from description page templates) the UX degradation would be pretty jarring so that might be closer to must have.
  • data that is currently stored on multiple pages but needs atomic updates to ensure consistency (gadget CSS/JS, template styles, template/module test pages). MCR is needed to make those behave correctly.
  • supplementary data that is used by some tool (editor, mobile app etc) and not really intended for direct manual editing: lead image focus, structured categories, page assessments, maps. These would have to be stored somewhere else, which would be a major loss of efficiency for developers as they would have to rebuild fundamental infrastructure from scratch for each one.

Thanks for reminding us of this. You're obviously the primary contact from WMDE for this, but who is the product manager from WMDE whose work would be blocked if this is delayed? Is that @Lydia_Pintscher or someone else?

Yes, it's mine.

It might be helpful to split the use cases into ones where MCR is nice to have and those which need it. As I understand it, there are roughly three groups:

  • data that would otherwise be stored on separate pages but could be bundled into a single page for better UX: media info, doc subpages, {{/header}} and similar templates, maps JSON blobs etc. This is mostly "nice to have" territory although in the case of media info (some of which will have to be manually migrated from description page templates) the UX degradation would be pretty jarring so that might be closer to must have.

I would argue it is a must have. We can technically do it in several pages but the chance of getting it accepted by the community with the degraded usability and features is close to 0.

  • data that is currently stored on multiple pages but needs atomic updates to ensure consistency (gadget CSS/JS, template styles, template/module test pages). MCR is needed to make those behave correctly.
  • supplementary data that is used by some tool (editor, mobile app etc) and not really intended for direct manual editing: lead image focus, structured categories, page assessments, maps. These would have to be stored somewhere else, which would be a major loss of efficiency for developers as they would have to rebuild fundamental infrastructure from scratch for each one.

I may update the description of this task and of the RFC on mediawiki.org to say this. This answer isn't etched in stone, but when someone asks me "what is the MVP for Multi-Content Revisions", I'll say "structured media info". I'm not sure which URL I'll point them to, but I'm sure I'll find something.

https://commons.wikimedia.org/wiki/Commons:Structured_data is the best we have atm.

daniel added a comment. Edited Sep 28 2016, 7:34 PM

It might be helpful to split the use cases into ones where MCR is nice to have and those which need it. As I understand it, there are roughly three groups:

I'm missing the group "currently embedded in wikitext and would benefit from separate storage, editing, diffing, etc", e.g. page assessment, media info, categories, template schema, translation tables, ...

Did anyone consider that it might be a bad idea to start building a radical change to the editing environment without investigating whether the editing community wants this?

Each of the use cases have had quite a bit of discussion, and has had quite a bit of investigation by the people proposing it.

So the answer is no, no thought of investigating whether the editing community wants this.

What I take away from @Alsee's comment is that we should provide a more comprehensive and detailed overview of the use cases.

So the answer is no, no thought of investigating whether the editing community wants this.

You've got two editors who stumbled across (*) this project, both waving red flags that there may be a problem here.

The WMF has been working on a Technical Collaboration Guideline as part of the Software Development Process. In part, "establishing best practices for inviting community involvement in the product development and deployment cycle". Most development goes smoothly and everyone is happy with a lot of what the WMF develops, but there is a long history of occasional projects that result in conflict. There have been cases where the WMF believed something was obviously a good idea, but where editors had a very different perspective. The editing community may weigh the pros and cons very differently than you have.

The idea of pulling categories, templates, and other things out of the wikitext is a pretty radical change. I understand you have use-case-proposals and the reasons you think they're good ideas. I'm not here to directly debate that. I'm here to alert you to the fact that this is a Big Deal. I am here to alert you that the Community may have a very different perspective, that this may be highly controversial. The proposed use cases may start evaporating if the community considers them unwanted or disruptive.

I'm saying it would be a good idea to post the template-use-case and/or category-use-case and/or others at EnWiki Village Pump to find out how it will be received. (EnWiki is nearly half the global community, you can certainly post elsewhere as well if you feel broader input is needed.)

The response could range from "we love it", to identifying must-have design requirements to support various workflows, to "hell no". Whichever way it goes, the time to get that information is before something is built.

It's not true that we have not asked the community. Structured data for Commons has been asked for many, many times. People are very happy with the progress we have made so far, as can be seen for example here: https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#It.27s_alive.21 Or here: https://blog.wikimedia.org/2016/08/23/wikidata-glam/
For the Wikidata team, Multi-Content Revisions is an essential part of making structured data on Commons happen. All the other use cases are potential at this point. Their teams will be responsible for doing the community consultations on these as they start working on them. Whether they go ahead on those or not is independent of our need to have it for structured data on Commons. It is however important to bring them up now to make the case for why Multi-Content Revisions are important to have in the long term.

What I take away from @Alsee's comment is that we should provide a more comprehensive and detailed overview of the use cases.

So the answer is no, no thought of investigating whether the editing community wants this.

That's not what I said. To the contrary, I know that such investigation was done at least for the use case that is my job to take care of, namely structured media info; I also know that separating categories out of the wikitext has been requested and discussed numerous times. What I said is that the status of these investigations and discussions needs to be better documented and linked to from the technical proposal.

The idea of pulling categories, templates, and other things of out the wikitext is a pretty radical change. I understand you have use-case-proposals and the reasons you think they're good ideas. I'm not here to directly debate that. I'm here to alert you to the fact that this is a Big Deal. I am here to alert you that the Community may have a very different perspective, that this may be highly controversial. The proposed use cases may start evaporating if the community considers them unwanted or disruptive.

I agree that it would be a Big Deal to e.g. move Wikipedia infoboxes out of the wikitext. But please note that this RFC does not propose doing that. It proposes a change to the platform that would allow us to do that -- and more importantly, it would allow other sites to manage infoboxes outside the wikitext.

Of course, if none of the use cases was endorsed by the community (which community?), the proposed change to the platform would be pointless. And you are correct that we need to take care to have the community in the loop when discussing use cases and requirements.

I fear I missed an important point when listing the use cases: I did not make a clear distinction between use cases which we have consensus to implement and use cases for which we see potential, or have had repeated requests, but which have not yet been fully investigated or discussed broadly. That's why I said that we need a more comprehensive and detailed overview of the use cases.

PS: One side note about discussing changes to the editing interface with the community of editors: the editors who are active on the site today are the ones who like (or at least got used to) the current interface - the ones that find the current way to edit unusable have given up after a few tries. We would like to change this, and open the editing experience to people who do not want to fiddle with complex syntax; this may mean changes that some people who have become experts at fiddling with wikitext don't like. We'll need to find the right balance, but we cannot find it if we listen only to the people who are active editors now. But that has nothing to do with the MCR proposal, it's just a general observation about discussing new features with "the" community.

Pppery added a comment. Oct 8 2016, 9:10 PM

It might be helpful to split the use cases into ones where MCR is nice to have and those which need it. As I understand it, there are roughly three groups:

I'm missing the group "currently embedded in wikitext and would benefit from separate storage, editing, diffing, etc", e.g. page assessment, media info, categories, template schema, translation tables, ...

Why? What is wrong with the page assessment being stored in wikitext? Separate editors for TemplateData and categories already exist, and I see no need to split these out further. This would probably also make it harder to do the semi-common thing of removing an {{uncategorized}} template and replacing it with categories, which would require going through two editors under this proposal.

daniel added a subscriber: Pppery. Edited Oct 12 2016, 4:36 AM

Why? What is wrong with the page assessment being stored in wikitext? Separate editors for TemplateData and categories already exist, and I see no need to split these out further.

The people who write specialized editors like that are exactly the ones who want MCR most. Because it's really hard to get this right. For instance, how can you find all category links on a page? You need to know all local aliases for the category namespace, you have to know which tag extensions accept wikitext (<poem> does, but <source> doesn't), and if the category link is in a template parameter, you have no idea whether it is *actually* a category link, or just looks like one.

And if you add a category, we save (and re-render!) a new copy of the entire page, instead of just the bit that changed.

Yes, these tools exist, but they are unreliable, hard to maintain, and inefficient. That's exactly how the idea for MCR was born.
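The extraction problem daniel describes can be illustrated with a deliberately naive sketch. The wikitext and category names here are made up; real extraction would additionally have to handle namespace aliases, template parameters, and every tag extension's parsing rules.

```python
import json
import re

wikitext = """Some article text. [[Category:Physics]]
<source lang="text">[[Category:NotReal]]</source>
[[Kategorie:Beispiel]]
"""

# A naive regex picks up the link inside <source> (where wikitext is NOT
# parsed, so it is a false positive) and misses the localized
# "Kategorie:" namespace alias entirely.
naive = re.findall(r"\[\[Category:([^\]|]+)\]\]", wikitext)
print(naive)  # ['Physics', 'NotReal'] -- wrong on both counts

# With MCR, categories could instead live in their own structured slot
# and be read back without parsing any wikitext at all:
categories_slot = '["Physics", "Beispiel"]'
print(json.loads(categories_slot))
```

The second half is the essence of the argument: a dedicated slot with a machine-readable content model removes the guesswork entirely.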

Izno added a subscriber: Izno. Oct 12 2016, 1:20 PM

which would requires going through two editors in this proposal

You seem to be visualizing a particular implementation. That's usually bad design.

When I see a potential implementation (for a wikitext solution--never mind VE for right now), I see multiple <textarea>s, each with their own storage of elements. That doesn't require a new window. Or autocomplete-enabled category selection ("here, add tags for this article!") without ever exposing the wikitext syntax to the user. (Right now, I need a gadget for that.) Or forms for page assessment (regarding which, I have no doubt T120219: PageAssessments deployment to WMF wikis would be enabled by MCR).

daniel added a comment. Nov 1 2016, 5:03 PM

I have proposed T149532: Why Multi-Content-Revisions? Use cases and requirements. as a session for the #Wikimedia-Developer-Summit_(2017). If you are interested in such a discussion at the summit, please comment on the ticket.

Question: History of old articles

If I understand correctly, this feature will potentially allow viewing an article with the versions of the templates that existed at the time the wikitext was edited. Two questions then arise:

  • will that also work for deleted templates?
  • will this work for revisions from before the multi-content revision deployment, say a 2005 revision of some article?

@TomT0m No, Multi-Content-Revisions does not help with consistent display of old template revisions. Well, it does in cases where the use of templates is replaced by the use of slots - if e.g. template documentation was stored in a slot instead of a subpage, you would always see the correct version of the documentation for old versions of the template. But that would be because it would no longer use the template mechanism.

@TomT0m No, Multi-Content-Revisions does not help with consistent display of old template revisions. Well, it does in cases where the use of templates is replaced by the use of slots - if e.g. template documentation was stored in a slot instead of a subpage, you would always see the correct version of the documentation for old versions of the template. But that would be because it would no longer use the template mechanism.

Ok, I got confused. Does that mean that the documentation will not have its wikipage address anymore?

Would it then be possible to have a special type of "reference" slot which would hold a pointer to another page revision? I guess the parser could be modified to maintain those reference slots when pages are saved.

For example, the parser computes a new version of the page when its content is modified, and when it expands a template, a hook triggers the slot manager to store the revision number of the template in those "reference" slots. I guess this kind of hook, or something similar, already exists, since we get a list of the used templates when previewing a page.

Tgr added a comment. Nov 17 2016, 8:09 AM

If I understand correctly, this feature will potentially allow viewing an article with the versions of the templates that existed at the time the wikitext was edited.

You might be thinking of Memento (which is not related to this in any way).

daniel added a comment. Edited Nov 19 2016, 3:23 PM

Ok, I got confused. Does that mean that the documentation will not have its wikipage address anymore?

Yes, the documentation would be part of the template page proper, and would not have a separate title.

Would it then be possible to have a special type of "reference" slot which would hold a pointer to another page revision? I guess the parser could be modified to maintain those reference slots when pages are saved.

That would theoretically be possible, but there are currently no plans to do this. I'm also not sure this would be the best way to tie a page revision to template revisions. So far, slots are intended to be editable, not derived. I have been thinking about derived slots, but the use cases for that idea all seem a bit contrived, and would perhaps be better served by a more specialized solution, like a dedicated database table.

For example, the parser computes a new version of the page when its content is modified, and when it expands a template, a hook triggers the slot manager to store the revision number of the template in those "reference" slots. I guess this kind of hook, or something similar, already exists, since we get a list of the used templates when previewing a page.

This could be done with a DB table that associates a revision ID of the "transcluder" with a revision ID of the "transcluded" in each row. Simple enough to do, and it would be stable against the template being moved or renamed, etc. It's going to be a big table, though. And quite a change in how things work. As Tgr pointed out, there is the Memento extension that does this with some limitations. It's a feature that has been discussed time and time again, but never gained enough traction to be properly implemented.
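A table along the lines daniel sketches could look roughly like this. The schema is purely illustrative (SQLite for brevity; the table and column names are invented, and the revision IDs are made-up examples):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE revision_transclusion (
        transcluder_rev INTEGER NOT NULL,  -- revision ID of the transcluding page
        transcluded_rev INTEGER NOT NULL,  -- revision ID of the template at save time
        PRIMARY KEY (transcluder_rev, transcluded_rev)
    )
""")

# On save, record which template revisions page revision 1001 was rendered with:
conn.executemany(
    "INSERT INTO revision_transclusion VALUES (?, ?)",
    [(1001, 501), (1001, 502)],
)

# To re-render revision 1001 as it originally appeared, look up the
# template revisions it used. Keying on revision IDs rather than titles
# is what makes this stable against the template being moved or renamed.
used = [row[0] for row in conn.execute(
    "SELECT transcluded_rev FROM revision_transclusion "
    "WHERE transcluder_rev = ? ORDER BY transcluded_rev", (1001,)
)]
print(used)  # [501, 502]
```

The size concern follows directly from the schema: one row per (page revision, transcluded template) pair, so a heavily-templated wiki would accumulate rows far faster than the revision table itself.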

Just for clarity, as I've worked on this task but not actually commented, we in Editing see MCR as very important to our long-term plans. The use cases laid out at Multi-Content Revisions#Use Cases cover a lot, but I'll just pull out the four that we see as most vital:

  • The structured media info work, as almost goes without saying;
  • Rejigging templates to have dedicated template, styling, data, and documentation slots, with UI to match;
  • Rejigging files to have a fused history for the blob and the description, removing UI confusion; and
  • Moving to a structured-data approach for categories.

Lots of others are also important, but those are the most useful.

cscott added a comment. Edited Jan 11 2017, 8:13 PM

If we use MCR for annotation storage, it would be useful to have a canonical URL for the contents of a specific slot. That might be an API URL, like https://en.wikipedia.org/api/rest_v1/page/html/Main_Page/749836961/<slot number> or else a user-visible URL like https://en.wikipedia.org/wiki/Main_Page/<slot name> or https://en.wikipedia.org/wiki/<Slot>:Main_Page or even a quasi-API URL like https://en.wikipedia.org/wiki/Special:redirect/slot/<revision>/<slotname>. Thoughts?

(cc @MarkTraceur)
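The candidate schemes cscott lists can be written down as URL templates. None of these is an adopted MediaWiki route, and the slot name "mediainfo" is just an example value for illustration:

```python
# Candidate addressing schemes from the comment, as format strings:
rest_style    = "https://en.wikipedia.org/api/rest_v1/page/html/{title}/{rev}/{slot}"
special_style = "https://en.wikipedia.org/wiki/Special:redirect/slot/{rev}/{slot}"

# The Special:redirect variant needs only a revision ID and a slot name,
# so it stays valid even if the page is later renamed:
url = special_style.format(rev=749836961, slot="mediainfo")
print(url)
# https://en.wikipedia.org/wiki/Special:redirect/slot/749836961/mediainfo
```

One design trade-off visible here: the REST-style URL embeds the page title and so is human-readable, while the Special:redirect style is title-independent but opaque.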

Rical added a comment. Jan 10 2018, 8:20 PM

As the assignee of T135845, I would use dedicated pages to exchange structured data between several central or local Lua modules.
These pages could contain:

  • the first and current versions of several modules
  • the history of MediaWiki versions in each wiki, to help coders better describe new bugs
  • the options to manage these exchanges: central or local, priorities of some modules as managers, used structures...
  • the places where i18n translations can be found...