
Bug tokenizing commented <ref>
Closed, Resolved, Public

Description

The commented-out <ref> seems to throw off the tokenizer. The likely cause is either the regexp that extracts the ref content or the ordering of productions (comments and extension tags).

A<ref>B <!--<ref name="x" />--></ref>
C<ref>D</ref>

Reduced test case from output seen in http://parsoid-lb.eqiad.wikimedia.org/enwiki/Axial_Seamount?oldid=657127754
See report here: https://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&oldid=657286828#Article_swallowed_as_a_note.

Event Timeline

ssastry created this task. Apr 20 2015, 2:03 PM
ssastry updated the task description.
ssastry raised the priority of this task from to Normal.
ssastry added a subscriber: ssastry.
Restricted Application added a subscriber: Aklapper. Apr 20 2015, 2:03 PM
ssastry updated the task description. Apr 20 2015, 2:04 PM
ssastry set Security to None.

I did a quick test:

-                    while (s && s.match(new RegExp("<" + tagName + "[^<>]*>"))) {
+                    while (s && s.match(new RegExp("<" + tagName + "[^/<>]*>"))) {

That change fixes this specific test case. However, there is a larger issue: comment parsing has lower precedence than extension content parsing. As a result, there will be several other test cases where commented-out opening or closing <ref> tags (or tags of any extension, really) parse differently in Parsoid than in the PHP parser, which strips comments out of the text before additional processing.
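
To make the difference concrete, here is a minimal sketch of the two patterns from the quick test, run against the ref content of the reduced test case. The helper names `openTagPattern` and `stripComments` are hypothetical and exist only for illustration; this is not Parsoid's tokenizer code, and the comment stripping only roughly approximates what the PHP parser is described as doing above.

```javascript
// Hypothetical helpers for illustration; not Parsoid's actual tokenizer code.

// Build the opening-tag pattern from the quick test. With skipSelfClosed set,
// "/" is excluded from the attribute run, so self-closed tags such as
// <ref name="x" /> no longer count as opening tags.
function openTagPattern(tagName, skipSelfClosed) {
    var attrs = skipSelfClosed ? "[^/<>]*" : "[^<>]*";
    return new RegExp("<" + tagName + attrs + ">");
}

// Rough approximation of stripping comments before any further processing.
function stripComments(wikitext) {
    return wikitext.replace(/<!--[\s\S]*?-->/g, "");
}

// Content of the first <ref> in the reduced test case.
var content = 'B <!--<ref name="x" />-->';

// Old pattern: the commented-out self-closed tag looks like an opening tag.
console.log(openTagPattern("ref", false).test(content)); // true

// New pattern: the self-closed tag is ignored.
console.log(openTagPattern("ref", true).test(content)); // false

// Side effect of excluding "/": an opening tag with "/" in an attribute
// value is also skipped.
console.log(openTagPattern("ref", true).test('<ref name="a/b">')); // false

// With comments stripped first, even the old pattern finds nothing.
console.log(openTagPattern("ref", false).test(stripComments(content))); // false
```

The last call illustrates the precedence point: if comments are removed before extension content is scanned, commented-out tags never reach the pattern at all, whether or not they are self-closed.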

Change 282394 had a related patch set uploaded (by Arlolra):
T96555: Remove <ref> hack from the tokenizer

https://gerrit.wikimedia.org/r/282394

Change 282394 had a related patch set uploaded (by Arlolra):
WIP: Remove <ref> hack from the tokenizer

https://gerrit.wikimedia.org/r/282394

Arlolra claimed this task. Apr 8 2016, 6:35 PM

Change 326890 had a related patch set uploaded (by Arlolra):
T96555: Ignore self-closed tags when extending source

https://gerrit.wikimedia.org/r/326890

Change 326890 merged by jenkins-bot:
T96555: Ignore self-closed tags when extending source

https://gerrit.wikimedia.org/r/326890

Arlolra closed this task as Resolved. Dec 14 2016, 1:32 AM

Mentioned in SAL (#wikimedia-operations) [2016-12-15T18:24:05Z] <arlolra> Updated Parsoid to 6719e240 (T96555)