Line matching in diffs favours similar line length over word similarity
Open, Medium, Public, Feature

Description

URL diff

See URL: the line «The Committee has the authority to engage independent [...]» is compared to «The Audit Committee will engage in an annual self-assessment [...]» rather than to its previous version, apparently because the two lines are more similar in length.
Bug 13462 is about paragraphs and whitespace, but whitespace is not a problem in this diff, so I think this can be considered a separate issue, although the solution may turn out to be the same.
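
To illustrate the suspected behaviour, here is a minimal Python sketch. It is not wikidiff2's actual matching code; the two scoring functions, the `best_match` helper and the sample lines are all hypothetical and only contrast a length-based heuristic with a word-overlap one.

```python
def word_similarity(a: str, b: str) -> float:
    """Jaccard similarity over the sets of whitespace-separated words."""
    wa, wb = set(a.split()), set(b.split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def length_similarity(a: str, b: str) -> float:
    """Ratio of the shorter length to the longer one (1.0 for equal lengths)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else min(len(a), len(b)) / longest


def best_match(line, candidates, score):
    """Return the candidate that maximises score(line, candidate)."""
    return max(candidates, key=lambda c: score(line, c))


# Hypothetical sample data, not taken from the diff in this report.
old_lines = [
    "The board may hire outside advisers when needed",  # earlier version of the line
    "Members will review every budget and report once in each fiscal year",  # unrelated
]
new_line = "The board may hire outside advisers and auditors when it is needed"

by_length = best_match(new_line, old_lines, length_similarity)
by_words = best_match(new_line, old_lines, word_similarity)
```

With this sample data, `by_length` is the unrelated line (its length is closer to that of `new_line`), while `by_words` is the earlier version of the line, which is the pairing one would expect the diff to show.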


Version: master
Severity: enhancement
URL: https://wikimediafoundation.org/wiki/Special:ComparePages?page1=&rev1=85009&page2=&rev2=84898&action=&diffonly=&unhide=

Attached:

diff.png (768×1 px, 77 KB)

Details

Reference
bz42053

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:01 AM
bzimport added a project: wikidiff2.
bzimport set Reference to bz42053.
bzimport added a subscriber: Unknown Object (MLST).

This is not the only aspect of line matching failures, so it's useful to spell it out clearly while we keep the generic/goal report open.

Even weirder example: in https://meta.wikimedia.org/w/index.php?title=Wikimedia_Foundation_Annual_Plan%2F2016-2017%2Frevised&type=revision&diff=15654337&oldid=15494070 the paragraph mentioning "capabilities around user research" is not matched, although the preceding paragraph is matched correctly and there are no changes in whitespace around the paragraph (the only whitespace change is the addition of a newline between "<br><br>" and "{{Back to top" below).

wikidiff-section-mismatch.png (396×893 px, 44 KB)

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 12:24 PM
Aklapper removed a subscriber: wikibugs-l-list.