
Parsoid renders template as comment, breaking serialization
Closed, Resolved (Public)

Description

I simply deleted a word from the leading paragraph in the article below, and got this diff:
https://he.wikipedia.org/w/index.php?title=%D7%9E%D7%A9%D7%AA%D7%9E%D7%A9%3ANurick%2F%D7%91%D7%93%D7%99%D7%A7%D7%95%D7%AA_-_%D7%A2%D7%95%D7%A8%D7%9A_%D7%97%D7%96%D7%95%D7%AA%D7%99&diff=14670508&oldid=14667916

To reproduce: Revert to the previous version in that diff, edit the article in VE, and delete a word from the leading paragraph.


Version: unspecified
Severity: normal

Details

Reference
bz54927

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 2:37 AM)
bzimport added a project: Parsoid.
bzimport set Reference to bz54927.

This is due to a Parsoid bug.

Example that Moriel gave me: http://parsoid.wmflabs.org/he/%D7%93%D7%A0%D7%99%D7%90%D7%9C_%D7%A8%D7%93%D7%A7%D7%9C%D7%99%D7%A3

HTML starts with:

<html><head>...</head><body>
<!--{"@type":"mw:Transclusion","attrs":[{"name":"typeof","value":"mw:Transclusion"},{"name":"about","value":"#mwt1"},{"name":"id","value":"mwt1"},{"name":"data-mw-arginfo","value":"{\"dict\":{\"target\":{\"wt\":\"שחקנים
\\n\",\"href\":\"./תבנית:שחקנים\"},\"params\":{\"שם\":{\"wt\":\"דניאל רדקליף\"},\"תמונה\":{\"wt\":\"[[קובץ:Daniel Radcliffe Paris 2012.jpg|200px]]\"},\"כיתוב\":{\"wt\":\"<small> דניאל רדקליף, [[2012]]</small>\"},\"תאריך לידה\":{\"wt\":\"23 ביולי 1989\"},\"מקום לידה\":{\"wt\":\"[[לונדון]], [[אנגליה]], [[הממלכה המאוחדת]]\"},\"כינוי\":{\"wt\":\"דן\"},\"דמות\":{\"wt\":\"[[הארי פוטר (דמות)|הארי פוטר]]\"},\"בכורה\":{\"wt\":\"בסדרה [[דייוויד קופרפילד]], 1999\"},\"אתר אינטרנט\":{\"wt\":\"http://www.danradcliffe.co.uk/\"},\"קישור\":{\"wt\":\"0705356\"}}},\"paramInfos\":[{\"k\":\"שם\",\"named\":true,\"spc\":[\"\",\"\",\"\",\"
\\n\"]},{\"k\":\"תמונה\",\"named\":true,\"spc\":[\"\",\"\",\"\",\"\\n\"]},{\"k\":\"כיתוב\",\"named\":true,\"spc\":[\"\",\"\",\"\",\"\\n\"]},{\"k\":\"תאריך לידה\",\"named\":true,\"spc\":[\"\",\"\",\"\",\"\\n\"]},{\"k\":\"מקום לידה\",\"named\":true,\"spc\":[\"\",\"\",\"\",\"\\n\"]},{\"k\":\"כינוי\",\"named\":true,\"spc\":[\"\",\"\",\"\",\"\\n\"]},{\"k\":\"דמות\",\"named\":true,\"spc\":[\"\",\"\",\"\",\"\\n\"]},{\"k\":\"בכורה\",\"named\":true,\"spc\":[\"\",\"\",\"\",\"\\n\"]},{\"k\":\"אתר אינטרנט\",\"named\":true,\"spc\":[\"\",\"\",\"\",\"\\n\"]},{\"k\":\"קישור\",\"named\":true,\"spc\":[\"\",\"\",\"\",\"\\n\"]}]}"},{"name":"data-parsoid","value":"{\"tsr\":[0,356],\"src\":\"{{שחקנים
\\n|שם=דניאל רדקליף
\\n|תמונה=[[קובץ:Daniel Radcliffe Paris 2012.jpg|200px]]\\n|כיתוב=<small> דניאל רדקליף, [[2012]]</small>\\n|תאריך לידה=23 ביולי 1989\\n|מקום לידה=[[לונדון]], [[אנגליה]], [[הממלכה המאוחדת]]\\n|כינוי=דן\\n|דמות=[[הארי פוטר (דמות)|הארי פוטר]]\\n|בכורה=בסדרה [[דייוויד קופרפילד]], 1999\\n|אתר אינטרנט=http://www.danradcliffe.co.uk/\\n|קישור=0705356\\n}}\",\"tagId\":1}"}]}-->
<table class="infobox" style="width:18em; font-size:85%;" cellspacing="5" about="#mwt1" data-parsoid="{}">[....]</table>
<meta typeof="mw:Transclusion/End" about="#mwt1" data-parsoid="{&quot;dsr&quot;:[null,356,null,null]}">

Note how:

  • The transclusion starts with a comment (!) which tries to be in the #mwt1 about group
  • The transclusion continues with a table in that same about group
  • The about group ends with a meta tag in that same about group

This doesn't serialize cleanly at all: the template gets inserted into the wikitext, and the table gets serialized as wikitext as well. Even the <meta> tag gets serialized: http://parsoid.wmflabs.org/_rt/he/%D7%93%D7%A0%D7%99%D7%90%D7%9C_%D7%A8%D7%93%D7%A7%D7%9C%D7%99%D7%A3

In VisualEditor, this causes:

  • corruption on save like in the diff Moriel linked
  • the user to be able to edit inside the infobox, because the comment is ignored and there are no attributes on the <table> telling VE it's part of a template (there's about="#mwt1" but that doesn't refer back to a valid about group, so it's interpreted as the start of an about group)

The problem here is that the content of that comment doesn't match this regex:

/^\{.*\}$/.test( content )

https://github.com/wikimedia/mediawiki-extensions-Parsoid/blob/master/js/lib/mediawiki.DOMPostProcessor.js#L95

It's not recognizing the last character as }. Taking content.charCodeAt( content.length - 1 ) returns undefined. Probably because of the unicode characters. Switching to something like,

/^\{\"@type\"/.test( content )

seems to work but I'm interested to see where exactly things are going wrong.

On second thought, I think it's because . doesn't match newline chars.
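
For illustration, here is a minimal sketch of that behaviour. The sample string and both function names are made up for this example, and the `[\s\S]` pattern is only one way to match across newlines; it is not necessarily what the merged patch uses.

```javascript
// Hypothetical stand-in for the comment content: JSON whose string value
// contains a raw newline, as the template source in the example above does.
const content = '{"@type":"mw:Transclusion","src":"{{foo\n|bar=baz}}"}';

// The test quoted above: `.` does not match line terminators, so `.*`
// cannot reach the closing brace once the content spans several lines.
function looksLikeJsonDot(text) {
  return /^\{.*\}$/.test(text);
}

// Assumed alternative: [\s\S] matches any character, newlines included.
function looksLikeJsonAnyChar(text) {
  return /^\{[\s\S]*\}$/.test(text);
}

console.log(looksLikeJsonDot(content));     // false
console.log(looksLikeJsonAnyChar(content)); // true
```

On single-line content both tests agree, which would explain why the problem only shows up for templates whose source spans multiple lines.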

Change 90687 had a related patch set uploaded by Arlolra:
Match all characters when testing comments for JSON

https://gerrit.wikimedia.org/r/90687

Change 90687 merged by jenkins-bot:
Match all characters when testing comments for JSON

https://gerrit.wikimedia.org/r/90687

Bug 56722 has been marked as a duplicate of this bug.