
showDiff() highlighting limitation due to difflib design
Closed, InvalidPublic

Description

Originally from: http://sourceforge.net/p/pywikipediabot/bugs/509/
Reported by: cosoleto
Created on: 2007-09-28 07:35:32
Subject: showDiff() highlighting limitation due to difflib design
Assigned to: cosoleto
Original description:
showDiff() can fail to highlight a char-by-char difference because Python's difflib does not seem to fully support char-by-char comparison.

Please see the Python tracker:

* issue #1528074: "difflib.SequenceMatcher.find_longest_match() wrong result" (http://bugs.python.org/issue1528074)

* issue #1678345: "A fix for the bug #1528074 [warning: quite slow]" (http://bugs.python.org/issue1678345)
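To make the limitation concrete, here is a minimal sketch of char-by-char highlighting built on the standard library's difflib.SequenceMatcher. It assumes Python 3 and passes autojunk=False (available since Python 2.7.1/3.2), which turns off the "popular element" heuristic: once the second sequence is at least 200 items long, any item whose duplicates make up more than 1% of it is treated as junk. Whether that heuristic is exactly what the two issues above describe is an assumption here. The function name highlight_char_diff and the [-...-]/[+...+] markers are illustrative only, not pywikipedia's showDiff() API.

```python
import difflib


def highlight_char_diff(old: str, new: str) -> str:
    """Return a char-level diff of `old` -> `new`.

    Deleted characters are wrapped in [-...-], inserted ones in [+...+].
    Hypothetical helper for illustration, not pywikipedia's showDiff().
    """
    # autojunk=False disables the "popular element" heuristic, so every
    # character takes part in the matching.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(new[j1:j2])
            continue
        if i1 != i2:  # characters removed from old ("delete" or "replace")
            parts.append("[-" + old[i1:i2] + "-]")
        if j1 != j2:  # characters added in new ("insert" or "replace")
            parts.append("[+" + new[j1:j2] + "+]")
    return "".join(parts)
```

For example, highlight_char_diff("colour", "color") returns "colo[-u-]r", and identical inputs come back unchanged.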


Version: unspecified
Severity: normal
See Also:
https://sourceforge.net/p/pywikipediabot/bugs/509

Details

Reference
bz55329

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:34 AM
bzimport set Reference to bz55329.
bzimport added a subscriber: Unknown Object (????).

Logged In: YES
user_id=181280
Originator: YES

File Added: difflib_test.py

I assigned this issue to myself before somebody else takes it. I am going to add a modified difflib version, unless the missing feature has been fixed in recent Python builds or, of course, someone objects. I am not sure whether there should be a config option to enable or disable line-by-line/char-by-char comparison.

  • priority: 6 --> 7
  • assigned_to: nobody --> cosoleto
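The line-by-line versus char-by-char choice can be illustrated with a single switch. In this sketch, char_level is a hypothetical parameter standing in for the config option mentioned above, and split_units/changed_units are illustrative helper names; the rest is ordinary Python 3 difflib usage, not the modified difflib proposed in this task.

```python
import difflib


def split_units(text: str, char_level: bool) -> list:
    """Split text into comparison units: single characters or whole lines."""
    if char_level:
        return list(text)
    return text.splitlines(keepends=True)


def changed_units(old: str, new: str, char_level: bool = False) -> list:
    """Return (tag, old_part, new_part) for every non-equal opcode.

    `char_level` is a hypothetical switch: False compares line by line,
    True compares char by char.
    """
    a = split_units(old, char_level)
    b = split_units(new, char_level)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, "".join(a[i1:i2]), "".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

With the default line mode, changed_units("one\ntwo\n", "one\ntwa\n") reports the whole line "two\n" replaced by "twa\n"; with char_level=True the change narrows to ("replace", "o", "a").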

Actually, I'd very much like to see better diff support for pywikipedia. I don't know why I missed that bug =)

I see in those bugs several comments about complexity changes, saying that a patch could change complexity from O(n*m) to O(n+m), which certainly looks interesting. If char-by-char comparison provides better diffs at a lower cost, what exactly is the reason for not supporting it in Python? :s

Two things to look at during implementation:
* Would it provide interesting diffs in all cases? (If one case is improved while other matches get worse, it's not so interesting anymore.)
* Performance changes for big diffs.
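Both checks can be scripted. The sketch below is a stand-in under stated assumptions, not the modified difflib discussed in this task: it uses Python 3's autojunk=True/False as the switch between the default heuristic and a full char-by-char comparison, verifies that the opcodes rebuild the new text (a basic regression check), counts how many characters would be highlighted (a rough proxy for diff quality), and times each run. The helper names apply_opcodes and compare_settings are hypothetical.

```python
import difflib
import time


def apply_opcodes(old, new, opcodes):
    """Rebuild `new` from `old` and SequenceMatcher opcodes.

    A result different from `new` would mean the opcodes are wrong.
    """
    out = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            out.append(old[i1:i2])  # unchanged run, taken from the old text
        elif tag in ("replace", "insert"):
            out.append(new[j1:j2])  # new or replacement characters
        # "delete": the old characters are simply dropped
    return "".join(out)


def compare_settings(old, new, repeat=3):
    """Diff `old` and `new` char by char with autojunk on and off.

    Returns {autojunk: (best_seconds, highlighted_chars)}, where
    highlighted_chars counts characters outside "equal" blocks.
    """
    results = {}
    for autojunk in (True, False):
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            matcher = difflib.SequenceMatcher(None, old, new, autojunk=autojunk)
            opcodes = matcher.get_opcodes()
            best = min(best, time.perf_counter() - start)
        # Regression check: the opcodes must turn `old` into `new`.
        assert apply_opcodes(old, new, opcodes) == new
        highlighted = sum(
            max(i2 - i1, j2 - j1)
            for tag, i1, i2, j1, j2 in opcodes
            if tag != "equal"
        )
        results[autojunk] = (best, highlighted)
    return results


if __name__ == "__main__":
    old, new = "colour " * 50, "color " * 50
    for autojunk, (seconds, marked) in compare_settings(old, new).items():
        print(f"autojunk={autojunk}: {seconds:.6f}s, {marked} chars highlighted")
```

On this input the new text is 300 characters long and every character in it repeats far more than 1% of the time, so with autojunk=True all of them count as "popular": only the leading "colo" is matched and 346 characters end up highlighted, while autojunk=False produces a much tighter diff at some extra cost in time.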

Good luck =)

I don't need luck, because I am not going to do any big work, just a simple adaptation of already written code (at some loss of performance). If you are interested in working on this problem in a different way, you are welcome to (and not only in this open project). Anyway, it's nice to see you have analysed the situation a bit.

The changed version should be safe, without regressions. I will try to document the performance loss.

This appears to have been fixed upstream, right?

Both issues linked in comment 0 (http://bugs.python.org) have indeed been fixed.

Framawiki subscribed.

Should be solved in five years :)

Not sure what the OP meant, but one is fixed and one is rejected...