noinclude tag breaks Proofread under Internet Explorer
Closed, Resolved (Public)

Description

Author: aleator_wiki

Description:
On any Wikisource, under Internet Explorer 8.0 (I cannot test other versions), if we add a <noinclude>foo</noinclude> tag in the "Page body" section of a Proofread system's page, the noincluded text moves to the "Footer" section and deletes the end of the text.

E.g. (http://ca.wikisource.org/wiki/Pàgina:Buscant_lo_desconegut_(1898).djvu/2) AAA<noinclude>BBB</noinclude>CCC leaves AAA in "Page body", moves "BBB" to "Footer", and CCC disappears.

Under Mozilla Firefox 3.6.3 it works fine.

I think this breaks Catalan Wikisource statistics from the Toolserver (http://toolserver.org/~thomasv/statistics.php?diff=3) since the modification of the page http://ca.wikisource.org/wiki/Pàgina:Buscant_lo_desconegut_(1898).djvu/29.

Thanks.


Version: unspecified
Severity: normal
URL: http://ca.wikisource.org/wiki/P%C3%A0gina:Buscant_lo_desconegut_%281898%29.djvu/2

bzimport set Reference to bz26881.
bzimport created this task. Via Legacy, Jan 23 2011, 12:11 PM

bzimport added a comment. Via Conduit, Jan 24 2011, 1:08 AM

aleator_wiki wrote:

Hint: it could be related to $page_regexp in the parse_page function of ProofreadPage_body.php. As I read it, it divides $text into three parts based on the noinclude tags (I suppose to detect the header, body and footer). But I cannot explain the difference between Mozilla Firefox and IE.

Bawolff added a comment. Via Conduit, Jan 24 2011, 1:59 AM

Cannot reproduce on IE6 (admittedly that's the wrong version of IE, but it's what I had available at the moment).

I don't see the BBBs being differentiated in any way in the HTML source, so I have no idea what could cause this.

bzimport added a comment. Via Conduit, Jan 28 2011, 1:57 AM

aleator_wiki wrote:

Mmm... It's not ProofreadPage_body.php, because the HTML output is identical in both browsers. I've found the responsible code at http://ca.wikisource.org/w/extensions/ProofreadPage/proofread.js?26

The regexp behaves differently in the two browsers:

  • With Mozilla Firefox, the regexp matches, so "m" exists and the text is split correctly into the three desired parts.
  • With my IE 8.0, "m" does not exist (no match; why? A comment in the code shows the author realized that not all browsers worked). So the code splits the text in two further steps. First it splits the text into m2[1] (the header, which is split off but with 3 extra line breaks, those between the end of <div class="pagetext"> and the beginning of the next </noinclude>) and m2[2] (body + footer). Then it splits m2[2] into m3[1] (the body, truncated) and m3[2] (the footer, which is really part of what should be the body), and the end of the text is lost. This makes sense when studying the code.
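
The two paths described above can be illustrated with a minimal sketch. It is not the code from proofread.js: the patterns reAll and re2, the function name splitPage, and the skipFirstMatch switch are assumptions made up for this illustration, and only re3 is taken from the thread (it is quoted in a later comment). The switch merely simulates a browser in which the first regexp does not match.

```javascript
// Placeholder patterns: assumptions for illustration, not copied from proofread.js.
var reAll = /^<noinclude>([\s\S]*?)<\/noinclude>([\s\S]*)<noinclude>([\s\S]*?)<\/noinclude>\s*$/;
var re2 = /^<noinclude>([\s\S]*?)<\/noinclude>([\s\S]*)$/;
// re3 as quoted later in this thread (both groups non-greedy, no end anchor).
var re3 = /^([\s\S]*?)<noinclude>([\s\S]*?)<\/noinclude>/;

// skipFirstMatch is a hypothetical switch that simulates a browser
// in which the first regexp fails to match.
function splitPage(text, skipFirstMatch) {
  var m = skipFirstMatch ? null : text.match(reAll);
  if (m) {
    // One match yields header, body and footer.
    return { header: m[1], body: m[2], footer: m[3] };
  }
  // Fallback: split off the header first...
  var m2 = text.match(re2);
  if (!m2) {
    return { header: "", body: text, footer: "" };
  }
  // ...then split the remainder into body and footer.
  var m3 = m2[2].match(re3);
  if (m3) {
    return { header: m2[1], body: m3[1], footer: m3[2] };
  }
  return { header: m2[1], body: m2[2], footer: "" };
}

var page = "<noinclude>HEADER</noinclude>AAA<noinclude>BBB</noinclude>CCC<noinclude>FOOTER</noinclude>";
var normalSplit = splitPage(page, false);
var fallbackSplit = splitPage(page, true);
console.log(normalSplit.body + " | " + normalSplit.footer);
console.log(fallbackSplit.body + " | " + fallbackSplit.footer);
```

With these placeholder patterns, the first path keeps the inline noinclude inside the body, while the fallback path cuts the body at the first <noinclude> and drops everything after the first </noinclude>, which is the symptom reported for IE 8.0.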

I'm trying to get the correct combination of the regexp (e.g. http://ca.wikisource.org/w/index.php?title=Usuari:Aleator/vector.js&oldid=29992, testing with the sandbox http://ca.wikisource.org/w/index.php?title=Viquitexts:P%C3%A0gina_de_proves&oldid=29824), but I still don't have the solution. I'll keep trying.

bzimport added a comment. Via Conduit, Jan 28 2011, 1:58 AM

aleator_wiki wrote:

The function is "pr_make_edit_area".

Bawolff added a comment. Via Conduit, Jan 28 2011, 5:29 PM

Said JS is only executed when editing the page. Before, I thought you meant it happened when viewing the page, not when editing it.

There is a comment there saying "apparently lookahead is not supported by all browsers so let us do another regexp". However, said regex does not use look-ahead assertions, so I'm not sure what that's about. But anyway, I'm not familiar enough with the code to know what I'm talking about here.

Anyway, I tested http://ca.wikisource.org/w/index.php?title=P%C3%A0gina:Buscant_lo_desconegut_(1898).djvu/2&action=edit in IE6 and confirmed the bug is present. Re-opening bug.

MarkAHershberger added a comment. Via Conduit, Jan 29 2011, 7:11 PM

I think this may be there in FF 3.6.13 on Linux. When I first visit the page I see:

AAABBBCCC BBBBBBBB

Clicking "Modifica" shows the edit box containing:

AAA<noinclude>BBB</noinclude>CCC

Clicking the [+] box displays the following in "Peu de pàgina (noinclude)":

BBBBBBBB

bzimport added a comment. Via Conduit, Jan 29 2011, 7:49 PM

aleator_wiki wrote:

(In reply to comment #7)

> I think this may be there in FF 3.6.13 on Linux. When I first visit the page I
> see:
>
> AAABBBCCC BBBBBBBB
>
> Clicking "Modifica" shows the edit box containing:
>
> AAA<noinclude>BBB</noinclude>CCC
>
> Clicking the [+] box displays the following in "Peu de pàgina (noinclude)":
>
> BBBBBBBB

That's the expected and correct behaviour (FF works 100% fine).

The same edit in IE 8.0 leaves AAA in the body ("Cos de la pàgina"), BBB has jumped to the footer ("Peu de pàgina"), and CCC and BBBBBBBB are gone (if we save that edit, the information will be lost).

The key is pr_make_edit_area.

Billinghurst added a comment. Via Conduit, Jun 22 2011, 10:03 AM

@Mark A. Hershberger: the behaviour that you report is due to your not opening the header/footer sections; if you have no toolbar, or use the WikiEditor, you cannot expand those sections. One needs to revert to the older toolbar. [Matter discussed elsewhere in Bugzilla.]

Phe added a comment. Via Conduit, Jul 29 2011, 5:14 PM

Hi, I have no working IE to test with, but I can reproduce it with Opera 10.

re3 = /^([\s\S]*?)<noinclude>([\s\S]*?)<\/noinclude>/;

Here there are two competing non-greedy groups. I think FF is right to make the first one maximal; it is the same case as with greedy groups: if there are multiple ways to make a match, the match at the same level of greediness must be maximal from left to right. I guess this is why the comment talks about lookahead, because with competing non-greedy groups the match must be moved from right to left to implement that. The trouble is that Thomas did not consider that the same thing can occur with re3 if the page contains more than two <noinclude></noinclude> sequences. More annoyingly, re3 lacks a final $, so only part of the remaining text is matched and some data is lost.

This works on Opera, by making both groups greedy:

re3 = /^([\s\S]*)<noinclude>([\s\S]*)<\/noinclude>\s*$/;

I also added \s*$ at the end to ensure that, if we match, we match the whole text. This way, if the match fails for some reason, we fall through to the else branch of if (m3) {.. } else { pageBody = m2[2]; pageFooter = ''; }, which ensures that no data can be lost (re2 is terminated by a $). Patch attached; testing with IE is needed.
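
As a rough check of the two patterns, here is a small sketch that runs both against the same sample. The two regexps are the ones quoted in this comment; the helper name splitBodyFooter and the sample string (body with an inline noinclude, followed by a footer noinclude and a trailing newline) are assumptions for illustration. The fallback branch mirrors the else branch quoted above.

```javascript
// Sample remainder after the header has been split off (hypothetical input).
var rest = "AAA<noinclude>BBB</noinclude>CCC<noinclude>BBBBBBBB</noinclude>\n";

// Original re3: both groups non-greedy, no end anchor.
var re3Lazy = /^([\s\S]*?)<noinclude>([\s\S]*?)<\/noinclude>/;
// Proposed re3: both groups greedy, anchored to the end of the text.
var re3Greedy = /^([\s\S]*)<noinclude>([\s\S]*)<\/noinclude>\s*$/;

// Hypothetical helper: split text into body and footer with the given
// pattern, keeping the whole text as body when the pattern does not match.
function splitBodyFooter(re, text) {
  var m3 = text.match(re);
  if (m3) {
    return { body: m3[1], footer: m3[2] };
  }
  return { body: text, footer: "" };
}

var lazyResult = splitBodyFooter(re3Lazy, rest);
var greedyResult = splitBodyFooter(re3Greedy, rest);
var unmatched = splitBodyFooter(re3Greedy, "no footer here");

console.log(lazyResult);   // body stops at the first <noinclude>
console.log(greedyResult); // body runs up to the last <noinclude>
console.log(unmatched);    // whole text kept as body
```

In a standards-following engine the non-greedy pattern stops at the first <noinclude>, so the body is cut to AAA and the text after the first </noinclude> is dropped, while the greedy, anchored pattern keeps the inline noinclude in the body and takes only the last sequence as the footer.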

Phe added a comment. Via Conduit, Jul 29 2011, 5:17 PM

Created attachment 8840
Proofread.js: fix for data lost when more than two <noinclude></noinclude> sequence exists

Attached: proofread-js-re3-non-greedy.patch

MarkAHershberger added a comment. Via Conduit, Sep 29 2011, 5:52 PM

r98422

bzimport added a comment. Via Conduit, Oct 1 2011, 1:50 AM

aleator_wiki wrote:

Not fixed yet :S
Go to http://ca.wikisource.org/wiki/Pàgina:Buscant_lo_desconegut_(1898).djvu/2 with Internet Explorer 8.
It contains "AAA<noinclude>BBB</noinclude>CCC" in the edit box.
Click "edit" (or modify) but do not make any change.
Click "show changes" to see the resulting changes: I can see an extra "<div class="pagetext">" in the noinclude header, "AAA" in the edit box (without anything else), "BBB" in the noinclude footer, and "CCC" has disappeared.
Firefox 5.0 shows 0 changes.

Bawolff added a comment. Via Conduit, Oct 1 2011, 12:50 PM

(In reply to comment #13)

> Not fixed yet :S
> Go to http://ca.wikisource.org/wiki/Pàgina:Buscant_lo_desconegut_(1898).djvu/2
> with Internet Explorer 8.
> It contains "AAA<noinclude>BBB</noinclude>CCC" in the edit box.
> Click "edit" (or modify) but do not make any change.
> Click "show changes" to see the resulting changes: I can see an extra
> "<div class="pagetext">" in the noinclude header, "AAA" in the edit box
> (without anything else), "BBB" in the noinclude footer, and "CCC" has
> disappeared.
> Firefox 5.0 shows 0 changes.

I don't think the fix was deployed yet...
