
Parsoid incorrectly serializes (html2wt) templates with named parameters containing equals signs (=) and angle brackets (<>)
Open, Medium, Public

Description

$ echo '<p><span typeof="mw:Transclusion" data-mw="{&quot;parts&quot;:[{&quot;template&quot;:{&quot;target&quot;:{&quot;href&quot;:&quot;Template:Echo&quot;,&quot;wt&quot;:&quot;Echo&quot;},&quot;params&quot;:{&quot;1&quot;:{&quot;wt&quot;:&quot;<table class=foo>&quot;}}}}]}"></span></p>' | bin/parse.js --html2wt --prefix mediawikiwiki

{{Echo|<table class=foo>}}

This is wrong: the output should be {{Echo|1=<table class=foo>}}. Round-tripping this output back to HTML breaks, because the parameter name becomes <table class and its value becomes foo>.
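To see why the missing 1= matters, here is a simplified model of how an argument inside {{...}} is split into a name and a value on the way back to HTML. This is a sketch under stated assumptions, not code from Parsoid or the PHP parser: the helper splitTemplateArg is hypothetical, and it splits at the first "=" while ignoring nested templates, links and extension tags.

```javascript
// Hypothetical, simplified model of template-argument splitting (not Parsoid code).
// Assumption: inside {{...}}, an argument is split at its first "=" into
// name and value; nested templates, links and extension tags are ignored.
function splitTemplateArg(arg) {
  const i = arg.indexOf("=");
  if (i === -1) {
    // No "=" at all: this is a positional (numbered) argument.
    return { name: null, value: arg };
  }
  // Whitespace around named-argument names and values is trimmed.
  return { name: arg.slice(0, i).trim(), value: arg.slice(i + 1).trim() };
}

// The buggy serialization: name "<table class", value "foo>".
console.log(splitTemplateArg("<table class=foo>"));
// The expected serialization: name "1", value "<table class=foo>".
console.log(splitTemplateArg("1=<table class=foo>"));
```

With an explicit 1=, the first "=" is the one right after the parameter number, so the HTML tag survives intact as the value.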

However, without the angle brackets the bug does not occur:

$ echo '<p><span typeof="mw:Transclusion" data-mw="{&quot;parts&quot;:[{&quot;template&quot;:{&quot;target&quot;:{&quot;href&quot;:&quot;Template:Echo&quot;,&quot;wt&quot;:&quot;Echo&quot;},&quot;params&quot;:{&quot;1&quot;:{&quot;wt&quot;:&quot;table class=foo&quot;}}}}]}"></span></p>' | bin/parse.js --html2wt --prefix mediawikiwiki

{{Echo|1=table class=foo}}

Removing either one of the angle brackets (opening or closing) is enough to avoid the bug.

From a user perspective, this means that if I create a template in VisualEditor and set the value of a numbered parameter to something that contains angle brackets and equals signs, it previews correctly but saves incorrectly.

Event Timeline

ssastry triaged this task as Medium priority. Dec 22 2017, 6:53 PM
ssastry moved this task from Needs Triage to html2wt on the Parsoid board.

We tokenize the argument string independently (the call to tokenize, which invokes the PEG tokenizer), without transclusion context. So <table class=foo> normally tokenizes as an HTML tag token, but in a transclusion context the "=" takes precedence and the HTML tag is parsed as an arg=value pair, with name "<table class" and value "foo>".
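A minimal sketch of this mismatch, assuming the serializer decides whether to emit an explicit "1=" by looking for an "=" that is not inside an HTML tag. The functions hasEqualsOutsideTags, needsExplicitName and serializePositional are hypothetical and only model the behavior described in the comment; the regex is a rough stand-in for real tag tokenization.

```javascript
// Hypothetical helpers modelling the mismatch; not Parsoid's actual code.

// Context-free view (roughly what happens when the value is tokenized on its
// own): a complete tag such as <table class=foo> is one token, so an "="
// inside it is not counted. The regex only approximates tag recognition.
function hasEqualsOutsideTags(value) {
  const withoutTags = value.replace(/<\/?[A-Za-z][^<>]*>/g, "");
  return withoutTags.includes("=");
}

// Transclusion-context view: on reparse, any "=" in a positional value is
// read as a name/value separator, so a conservative check names the
// parameter whenever the value contains "=" at all.
function needsExplicitName(value) {
  return value.includes("=");
}

// Serialize a single positional parameter, using `check` to decide whether
// it must be written as "index=value".
function serializePositional(target, index, value, check) {
  return check(value)
    ? `{{${target}|${index}=${value}}}`
    : `{{${target}|${value}}}`;
}

console.log(serializePositional("Echo", 1, "<table class=foo>", hasEqualsOutsideTags));
console.log(serializePositional("Echo", 1, "<table class=foo>", needsExplicitName));
```

With hasEqualsOutsideTags, <table class=foo> serializes as {{Echo|<table class=foo>}}, matching the buggy output, while table class=foo or <table class=foo (one bracket removed) get an explicit 1=. The conservative needsExplicitName check may add an unnecessary 1= in some cases, for example when the "=" sits inside a nested template, but it avoids the misparse shown in this task.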

Not sure whether this is a real bug or undefined behavior. I think it is reasonable to fix this html2wt "bug" in Parsoid, but it would also be reasonable to fix wt2html parsing (in both the PHP parser and Parsoid) to treat HTML tags atomically even in a transclusion context.
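The wt2html-side alternative could look roughly like the following: when splitting an argument, skip over anything that looks like a complete HTML tag so that an "=" inside it is not taken as the separator. splitArgTagAware is a hypothetical helper and its tag detection is deliberately crude; real tag recognition (valid tag names, quoted attributes, nesting) is more involved.

```javascript
// Hypothetical sketch of tag-aware argument splitting; not PHP parser or
// Parsoid code. Assumption: a "tag" is "<" or "</" followed by a letter,
// running up to the next ">".
function splitArgTagAware(arg) {
  let i = 0;
  while (i < arg.length) {
    if (arg[i] === "<") {
      const close = arg.indexOf(">", i);
      if (close !== -1 && /^<\/?[A-Za-z]/.test(arg.slice(i))) {
        i = close + 1; // jump past the whole tag, ignoring any "=" inside it
        continue;
      }
    }
    if (arg[i] === "=") {
      // First "=" outside a tag separates name from value.
      return { name: arg.slice(0, i).trim(), value: arg.slice(i + 1).trim() };
    }
    i++;
  }
  // No separator found: positional argument.
  return { name: null, value: arg };
}

console.log(splitArgTagAware("<table class=foo>"));
console.log(splitArgTagAware("<table class=foo"));
```

Here {{Echo|<table class=foo>}} would keep the tag as the positional value, while an unterminated <table class=foo still splits at the "=" as before. Because this changes how existing wikitext is parsed, it would need to be applied consistently in both the PHP parser and Parsoid.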