Page MenuHomePhabricator

MW core diff review/reorg/refactor
Closed, ResolvedPublic

Description

Summary

  • Rename DifferenceEngine to DiffPage and move DiffEngine to includes/libs.
  • Add classes wrapping wikidiff2 and the PHP diff engine.
  • Introduce a format concept as a generalization of the inline format switch.

Problems

  • DifferenceEngineSlotDiffRenderer is hackish and was soft-deprecated since its initial commit, but is used by 5 extensions.
  • TextSlotDiffRenderer::getTextDiffInternal() if/elseif block should be replaced by an engine class hierarchy, per existing TODO.
  • Various things are calling DifferenceEngine::getEngine() and phpversion( 'wikidiff2' ) in order to decide whether to do wikidiff2-specific options and formatting. It seems like a layer boundary violation. There should be an engine layer which can answer the relevant questions.
  • We need T336713 -- a user preference for diff format.
  • When the visual (client-side) diff type is selected in preferences, the two-column diff is generated and delivered anyway, only to be replaced by JS after load.
  • Most classes are not namespaced. There are classes from various layers in includes/diff.
  • Recent work in wikidiff2 allows multiple formats (say inline and two-column) to be generated simultaneously (in a batch), but there is no support for this in MW.
  • Too many things are called "engine". There are at least four distinct existing engine concepts: DifferenceEngine, DiffEngine, DifferenceEngine::getEngine() and TextSlotDiffRenderer::setEngine().
  • The requirement for all diff renderers to return "one or more <tr> tags" is awkward, necessitating <td colspan="4"> around non-table formats.

Reorg

  • DifferenceEngine can be renamed to MediaWiki\Diff\DiffPage. Everyone knows it is a page controller/view.
  • SlotDiffRenderer and its subclasses can stay in includes/diff but should be moved to the MediaWiki\Diff namespace. Conceptually, they are components of the view.
  • DiffEngine is a pure PHP library and can be moved to includes/libs along with its helper classes ComplexityException, Diff, DiffOp*, WordLevelDiff and WordAccumulator. It depends only on wfDebug().
  • DiffFormatter and its subclasses can also go to includes/libs, with the slight complication that TableDiffFormatter uses the Html class.
  • RangeDifference exists by accident and can be deleted.

DifferenceEngine subclass review

  • NewsletterDiffEngine and MassMessageListDiffEngine have multipart content. They need to render multiple text diffs and join the outputs, adding headers.
  • FlowBoardContentDiffView is just trying to turn off diffing of Flow content types.
  • JCJsonDifferenceEngine preprocesses the content and then does a regular text diff on the preprocessed content.
  • Wikibase's EntityContentDiffView is the most complex:
    • There is a page controller-level override of the language, resulting in a few method overrides.
    • getRevisionHeader apparently duplicates a lot of core code, changing some tool links.
    • getParserOutput is overridden
    • The deprecated generateContentDiffBody() is overridden in order to use Wikibase's EntityDiffVisualizer.

Engine and format concepts

Currently we have DifferenceEngine::getEngine(), which returns "php", "wikidiff2" or the path of an executable file. I would have a class hierarchy corresponding to this idea of an engine. There would be four engines: php, wikidiff2, external and VisualEditor.

We also have TextSlotDiffRenderer::setEngine() which can take e.g. wikidiff2 or wikidiff2inline as a parameter. This corresponds to my idea of a format. An engine can deliver multiple formats.

I don't see any good reason for requiring that there only be one global engine. If we allow both wikidiff2 and VisualEditor engines to exist, the VisualEditor engine can deliver a stub format for client-side replacement, if user preferences request that.

The engine factory would be responsible for collecting the available formats and deciding which should be exposed.

In ApiComparePages, the difftype parameter would always be available, and would allow the client to specify the format. The list of possible values would come from the engine factory.

Format here means text diff format. A non-text content type handler can make its own arrangements. Multipart content type handlers can use the preferred text diff format when they request text diffs. If a page has no text content, we should suppress the display of format selectors in the UI.

The Action API equivalent of suppression of UI format selectors could be a revision property, indicating whether text is present in the revision. Ignoring the difftype parameter to ApiComparePages when there is no text seems harmless. ApiComparePages can pass down the text diff format preference even when it is not used.

Engines have the following responsibilities:

  • Render diff batch. Produce array of HTML strings keyed by format.
  • Provide the expected HTML context for a given format, whether table or non-table.
  • OutputPage modification, adding required ResourceLoader modules.
  • Cache keys, invalidation.
  • Localisation, i.e. DifferenceEngine::localiseDiff()
  • Table prefixes
  • Get supported formats
  • Get preferred format batch, given input format

These engines are tied to the diff page view, they are not independent libraries like wikidiff2 and the PHP DiffEngine.

Related Objects

StatusSubtypeAssignedTask
Resolvedtstarling
OpenNone
ResolvedNone
Resolvedovasileva
ResolvedJdlrobson
ResolvedAmmarpad
OpenNone
ResolvedJdlrobson
ResolvedJdlrobson
Resolvedphuedx
ResolvedJdlrobson
Resolvedovasileva
Resolved marcella
Resolvedmatmarex
ResolvedBUG REPORTEsanders
Resolvedovasileva
Resolvedovasileva
ResolvedJdlrobson
InvalidNone
Resolved jkroll
OpenFeatureNone
OpenNone
ResolvedJdlrobson
ResolvedJdlrobson
ResolvedJdlrobson
DeclinedNone
ResolvedNone
ResolvedBUG REPORTSBisson
ResolvedJdlrobson
ResolvedBUG REPORTJdlrobson
ResolvedJSengupta-WMF
OpenBUG REPORTNone
OpenBUG REPORTNone
ResolvedJdlrobson

Event Timeline

Presumably the reason everything that produces differences must be called an engine is a desire to pay homage to Charles Babbage. For concision, we could cut back the term to its stem — I think "differencer" is minimal since "differ" is normally a verb. A neologism, perhaps — my browser marks it with the squiggly red underline of creativity — but it is familiar in its roots and construction.

Change 930946 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Split createTextSlotDiffRenderer() out of getSlotDiffRendererWithOptions

https://gerrit.wikimedia.org/r/930946

	/**
...
	 * @deprecated since 1.32, use setRevisions or ContentHandler::getSlotDiffRenderer.
	 */
	public function setContent( Content $oldContent, Content $newContent ) {

I want to undeprecate this. There are many callers. Most callers do not have revisions, but SlotDiffRenderer lacks caching and page assembly. For example, using SlotDiffRenderer directly, you would miss out on addHeader() -- are callers supposed to duplicate it? The complexity impact of setContent() can be reduced by factoring out access to the source data.

Although, showDiffPage() is long and very revision-specific, as are its helpers renderNewRevision(), getMultiNotice(), getRevisionHeader(). Maybe 700 lines of heavily revision-specific UI code. So just pulling out a DiffSource class hierarchy does not fix the bulk of it.

Change 930946 merged by jenkins-bot:

[mediawiki/core@master] Split createTextSlotDiffRenderer() out of getSlotDiffRendererWithOptions

https://gerrit.wikimedia.org/r/930946

Change 931329 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/Newsletter@master] Override SlotDiffRenderer instead of DifferenceEngine

https://gerrit.wikimedia.org/r/931329

Change 931330 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/JsonConfig@master] Override SlotDiffRenderer instead of DifferenceEngine

https://gerrit.wikimedia.org/r/931330

Change 931331 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Move DiffEngine and helpers to includes/libs/Diff and put them in a namespace

https://gerrit.wikimedia.org/r/931331

Change 931332 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Diff libraryization followups

https://gerrit.wikimedia.org/r/931332

Change 931331 merged by jenkins-bot:

[mediawiki/core@master] Move DiffEngine and helpers to includes/libs/Diff and put them in a namespace

https://gerrit.wikimedia.org/r/931331

Change 931332 merged by jenkins-bot:

[mediawiki/core@master] Diff libraryization followups

https://gerrit.wikimedia.org/r/931332

Change 933127 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/core@master] Fix backwards compatibility alias for WordAccumulator

https://gerrit.wikimedia.org/r/933127

Change 933128 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/TwoColConflict@master] Update namespaces of Wikimedia\Diff classes

https://gerrit.wikimedia.org/r/933128

Change 933127 merged by jenkins-bot:

[mediawiki/core@master] Fix backwards compatibility alias for WordAccumulator

https://gerrit.wikimedia.org/r/933127

Change 933128 merged by jenkins-bot:

[mediawiki/extensions/TwoColConflict@master] Update namespaces of Wikimedia\Diff classes

https://gerrit.wikimedia.org/r/933128

Change 934028 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/Wikibase@master] Use the new namespace for WordLevelDiff

https://gerrit.wikimedia.org/r/934028

Change 934029 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/AbuseFilter@master] Use the new Wikimedia\Diff namespace

https://gerrit.wikimedia.org/r/934029

Change 934030 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/ContentTranslation@master] Use the new Wikimedia\Diff namespace

https://gerrit.wikimedia.org/r/934030

Change 934031 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/Echo@master] Use the new Wikimedia\Diff namespace

https://gerrit.wikimedia.org/r/934031

Change 934033 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/Flow@master] Use the new Wikimedia\Diff namespace

https://gerrit.wikimedia.org/r/934033

Change 934034 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/MathSearch@master] Use the new Wikimedia\Diff namespace

https://gerrit.wikimedia.org/r/934034

Change 934033 merged by jenkins-bot:

[mediawiki/extensions/Flow@master] Use the new Wikimedia\Diff namespace

https://gerrit.wikimedia.org/r/934033

Change 934028 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Use the new namespace for WordLevelDiff

https://gerrit.wikimedia.org/r/934028

Change 934034 merged by jenkins-bot:

[mediawiki/extensions/MathSearch@master] Use the new Wikimedia\Diff namespace

https://gerrit.wikimedia.org/r/934034

Change 934030 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] Use the new Wikimedia\Diff namespace

https://gerrit.wikimedia.org/r/934030

Change 934029 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Use the new Wikimedia\Diff namespace

https://gerrit.wikimedia.org/r/934029

Change 934031 merged by jenkins-bot:

[mediawiki/extensions/Echo@master] Use the new Wikimedia\Diff namespace

https://gerrit.wikimedia.org/r/934031

Change 935184 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/Wikibase@master] Remove dependency on mDiffLang

https://gerrit.wikimedia.org/r/935184

For concision, we could cut back the term to its stem — I think "differencer" is minimal since "differ" is normally a verb.

I decided that I don't care if it's a verb, and my working copy now has TextDiffer, which is shorter than Differencer and reminds us that the hierarchy is only for diffing text.

Change 935184 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Remove dependency on mDiffLang

https://gerrit.wikimedia.org/r/935184

Change 936132 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Factor out TextDiffer hierarchy from TextSlotDiffRenderer

https://gerrit.wikimedia.org/r/936132

Change 936133 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Migrate ApiComparePages to TextDiffer concepts

https://gerrit.wikimedia.org/r/936133

Change 936393 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/ProofreadPage@master] Don't directly call the TextSlotDiffRenderer constructor

https://gerrit.wikimedia.org/r/936393

Change 936503 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/EntitySchema@master] Avoid direct construction of TextSlotDiffRenderer

https://gerrit.wikimedia.org/r/936503

Change 936393 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] Don't directly call the TextSlotDiffRenderer constructor

https://gerrit.wikimedia.org/r/936393

Change 936132 merged by jenkins-bot:

[mediawiki/core@master] Factor out TextDiffer hierarchy from TextSlotDiffRenderer

https://gerrit.wikimedia.org/r/936132

Change 936133 merged by jenkins-bot:

[mediawiki/core@master] Migrate ApiComparePages to TextDiffer concepts

https://gerrit.wikimedia.org/r/936133

Change 936503 merged by jenkins-bot:

[mediawiki/extensions/EntitySchema@master] Avoid direct construction of TextSlotDiffRenderer

https://gerrit.wikimedia.org/r/936503