Page MenuHomePhabricator

Diffs page should show diffs and content from multiple slots [MCR]
Closed, ResolvedPublic

Description

When showing a diff between revisions of a page that has multiple streams (multiple slots per revision), a diff for each slot needs to be show.

See https://www.mediawiki.org/wiki/Multi-Content_Revisions/Views#Diffs

NOTE: This task requires not only the diffs to be shown for all slots (T194731), but also the content view shown below the diffs to show the combined of all slots (T174035).

Event Timeline

DifferenceEngine is a god class which does everything from interpreting query parameters for selecting the target revision to direct DB queries to table formatting to writing to OutputPage. Also extensions can replace it with a subclass. So a proper solution will involve a full rewrite which splits DifferenceEngine into a controller that determines which slots to diff, the actual diff generation code, and a layout generator which does headings and whatnot. Also probably some kind of composition logic that might vary the same way the composition logic for content rendering does. And BC shims since at least the diff generation code (but probably also some of the layout) lives in the extensions. That's probably way too much for phase 1.

A hacky solution that just makes a few DifferenceEngine methods iterate through multiple contents does not seem too hard; some questions that came up in the process:

  • The DifferenceEngine is selected by the ContentHandler, even though the old and new content is not guaranteed to be the same content type. (As a quirk of that, oldid=1&diff=next and oldid=2&diff=prev might in theory end up being completely different renderings.) There are multiple ways of handling that with slots:
    • Make diff engine selection the responsibility of the slot handler instead. IMO the sanest approach; could get messy if multiple extensions share a slot but that seems like an unusual thing to do.
    • Force non-main slots to always have the same content model. There are use cases that would be prevented by that (e.g. wikitext vs. Markdown for the documentation slot).
    • Keep the current behavior (maybe make it less chaotic by always using the old revision to select the engine).
  • Currently when diffing the first revision to its predecessor, no diff is shown, just an error message. When looking at a diff adding a new slot, that seems unhelpful. So we might need new DifferenceEngine functionality for generating a "null diff".
  • Currently the DifferenceEngine can personalize the diff header (tools like undo or link to previous). How will that look when there is a separate engine for each changed slot?

Trying to have DifferenceEngine iterate over Content objects doesn't seem too workable. Better, IMO, for most cases would be to have a difference engine per slot generate that slot's diff and then combine them elsewhere.

It would involve creating new DifferenceEngine instances for each non-main slot, sure. But given that DifferenceEngine does everything including revision lookup, and pretty much all its method are public (and some set public properties as a side effect) having Article handle multiple DifferenceEngine instances would probably break more things.

...well, I guess it depends on BC expectations. Should DifferenceEngine::getDiffBody return the concatenated diff of all slots, or just the main slot and then another DifferenceEngine method would create a diff for a specified slot, with Article calling that for all other slots? Ie. do we want to include all slots wherever diffs are shown, or just for action=view?

I'd expect the "DifferenceEngine" returned by ContentHandler to diff two Content objects, rather than all Content objects that are part of a revision.

Chances are that most of the database lookup logic, header formatting, and so on would need to be extracted to a different class, so the object returned by ContentHandler can handle only the actual diff logic.

I'd expect the "DifferenceEngine" returned by ContentHandler to diff two Content objects, rather than all Content objects that are part of a revision.

Indeed. That wouldn't work anyway, since a different DifferenceEngine may be needed depending on the model.

However, as Tgr noted, DifferenceEngine currently also does a lot of other things, like looking up revisions and generating the navigation head. This would need to be refactored and split up. Or we need a MultiSlotDifferenceEngine that uses multiple DifferenceEngines internally. That's not the nicest design, but would cause the least fallout to calling code. While far from ideal, this is the approach I'd try first.

Content of different type cannot generally be diffed. This only works across subclasses of TextContent. Wikibase's EntityContentDiffView will show an error when applied to anything but EntityContent:

$this->getOutput()->showErrorPage( 'errorpagetitle', 'wikibase-non-entity-diff' );

Although a "MultiSlotDifferenceEngine" wouldn't make sense of parts of DifferenceEngine's interface like setContent(). Nor would it make much sense to be returning one of those from a ContentHandler.

Although a "MultiSlotDifferenceEngine" wouldn't make sense of parts of DifferenceEngine's interface like setContent(). Nor would it make much sense to be returning one of those from a ContentHandler.

Yes, it's hack. setContent() would probably just fail for a MultiSlotDifferenceEngine. And no ContentHandler should return such a thing.

A full refactoring would of course be much nicer. The question is - what's the "minimal viable step" into the right direction that unblocks us.

Chances are that most of the database lookup logic, header formatting, and so on would need to be extracted to a different class, so the object returned by ContentHandler can handle only the actual diff logic.

Sure, I just don't want that to be a phase 1 blocker. Filed as T194830: Refactor DifferenceEngine.

Content of different type cannot generally be diffed. This only works across subclasses of TextContent.

It could work for whatever wants it to work (such as different content types for the same role provided by the same extension). I can't think of any use case for multiple non-text content types for the same slot, but I wouldn't dismiss the possibility either.

I can't think of any use case for multiple non-text content types for the same slot, but I wouldn't dismiss the possibility either.

I think we don't have to support this if it complicates things. If it's easy and natural to support it, I don't see a problem with doing so.

daniel renamed this task from Allow the diffs to show multiple slots [MCR] to Diffs page should show diffs and content from multiple slots [MCR].May 31 2018, 5:14 PM

Change 452708 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Use RevisionRenderer for rendering ParserOutput

https://gerrit.wikimedia.org/r/452708

Moving this to the "eipc" column, since this is purely a tracking task. There is nothing to do here.

Ah, I see that it is tagged with T174035 as well, which is the 'working' task.

Tagging for 1.32 release, since this is needed to make MCR feature-complete.

I don't think "make MCR feature-complete" can happen in time for 1.32, since that would require thing like T174033: Refactor EditPage to allow multiple slots to be edited atomically [MCR] as well.

The current goal seems to be to make it complete enough for the purposes of SDC.

I don't think "make MCR feature-complete" can happen in time for 1.32

Yea, i meant "able to properly function", not "used in all places where we want to use it in core".

Having MCR without the ability of atomic edits is still ok. Having MCR without the ability to see modified content in a diff view is probably bad.

I guess this is about the MVP state of MCR. "Feature complete" was not the best phasing to use here.

Change 452708 merged by jenkins-bot:
[mediawiki/core@master] Use RevisionRenderer for rendering ParserOutput

https://gerrit.wikimedia.org/r/452708