Page MenuHomePhabricator

Include section name in the diff
Open, Needs TriagePublic

Description

When reading the revision diff, mostly it is hard to understand, which part of the page the diff line belongs to, because line number does not help so much - there is no line numbering in view mode.

This was voted #9 in the 2019 Community Wishlist Survey.

Project page: https://meta.wikimedia.org/wiki/Community_Tech/Section_Name_in_Diff

Event Timeline

Change 532841 had a related patch set uploaded (by MaxSem; owner: MaxSem):
[mediawiki/core@master] WIP POC: display section names in diffs

https://gerrit.wikimedia.org/r/532841

MaxSem added a subscriber: ifried.

Above is my exploratory attempt that allowed me to make some conclusions.

  • To get a section name with any non-garbage precision, you need to get it from parser output.
  • During diff generation, parser output for the new (right) revision is already available, but not for the old (left) revision.
  • Therefore, adding section names on the right side wouldn't cause performance problems (some processing is still needed to correlate lines with sections, but it's pretty insignificant).
  • Left side, however, is different. Getting another parser output can result in cache misses and slow parse as a result. Memory consumption would also be higher.
  • The process of adding section names to diff HTML itself needs tweaks. Because section names could be dependent on user language (sigh, wikitext) and because doing that during diff generation process itself would be fairly hard, my code hooks into post-processing that localized the "Line X" headings. DifferenceEngine replaces comments like <!--LINE 123--> with a localizable message, my code also adds a section name if it's able to determine it. However, the code needs a way to tell which side the heading belongs to. My proposal is to change the comments to something like <!--LINE 123L--> and <!--LINE 123R-->. That includes both PHP TableDiffFormatter and C++ wikidiff2 extension.
  • To ensure that after the above changes an older MediaWiki version would work with newer wikidiff2, older MW releases would need to be tweaked to accept the new format. Adding new wikidiff2 function parameters or php.ini configuration would be too messy for a fairly niche use case, IMHO.

Based on these technical considerations, our next steps could be:

  • Deciding whether we really need section names on the left side.
  • Coming up with a design.
  • Implementing the crap out of it.

But this is up to @ifried.

My rough estimation of the technical side:

  • Changing HTML comments format: S
  • Finishing the above patch: L
  • Backporting new wikidiff2 support: S
  • Therefore, adding section names on the right side wouldn't cause performance problems (some processing is still needed to correlate lines with sections, but it's pretty insignificant).
  • Left side, however, is different. Getting another parser output can result in cache misses and slow parse as a result. Memory consumption would also be higher.

Just a thought from non-developer. If we know that left side and right side are equal, otherwise the left side would already be a part of the diff, can't you just copy the right side string to the left, instead of computing it by brute force?