Rendering difference between legacy and Parsoid with newlines after styling and <translate> tags
Open, Low · Public · BUG REPORT

Description

Given

'''<translate>
test</translate>'''

When rendered with the legacy parser, "test" is in bold. When rendered with Parsoid, we get

<p data-parsoid='{"dsr":[0,34,0,0]}'><meta typeof="mw:Annotation/translate" data-parsoid='{"dsr":[3,14,null,null],"wasMoved":true}' data-mw='{"rangeId":"mwa0","extendedRange":true,"wtOffsets":[3,14]}'/><b data-parsoid='{"autoInsertedEnd":true,"dsr":[0,14,3,0]}'></b>
test<meta typeof="mw:Annotation/translate/End" data-parsoid='{"dsr":[19,31,null,null]}' data-mw='{"wtOffsets":[19,31]}'/><b data-parsoid='{"autoInsertedEnd":true,"dsr":[31,34,3,0]}'></b></p>

and "test" is not in bold.

Without the <translate> tag, the same wikitext does not render "test" in bold, so it can be argued that the Parsoid behaviour is the correct one here.

This is visible from the rt-testing on https://meta.wikimedia.org/wiki/Tech/News/2014/02 between bc2040ce and 89070d9e.

Event Timeline

Restricted Application added a subscriber: Aklapper.
ssastry added subscribers: Nikerabbit, ssastry.

Looks like the Translate extension is stripping the `<translate>\n` there, which collapses the string on the following line onto the same line as the opening quotes and so lets the bold markup take effect. @Nikerabbit, is this expected or accidental behaviour?

Is there a reason not to decline this on the Parsoid end and have editors not add that unnecessary newline break?

Or should Parsoid strip (all? one?) newlines / whitespace after <translate> and before </translate>? @ihurbain, we could read the Translate source, but you may have a quicker answer here.

I believe we don't want to strip the newlines after the opening tag and before the closing tag, because
<translate>
stuff
</translate>
has block semantics and <translate>stuff</translate> does not, and as far as I understand, that makes a difference for Translate.

Hmm, an interesting edge case. To avoid whitespace build-up in the rendered output, when removing tags, Translate removes one newline (if present) after the opening tag and one newline (if present) before the closing tag. The removal happens after determining whether the content is block or inline.

Those two behaviors are intentional, but the combined behavior in a case like this... I don't think I can say it is intentional.
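The combined effect can be sketched in a few lines of Python. This is only a toy model of the tag removal described above, not the Translate extension's actual code: the function name `strip_translate_tags` is hypothetical, and the sketch ignores unit marker comments and the block vs. inline decision.

```python
import re


def strip_translate_tags(wikitext: str) -> str:
    """Hypothetical model: drop <translate> tags plus at most one adjacent newline."""
    # Remove the opening tag and one newline directly after it, if present.
    text = re.sub(r"<translate>\n?", "", wikitext)
    # Remove the closing tag and one newline directly before it, if present.
    text = re.sub(r"\n?</translate>", "", text)
    return text


# The wikitext from the task description.
reported = "'''<translate>\ntest</translate>'''"
stripped = strip_translate_tags(reported)
print(repr(stripped))
```

In this model the newline between the opening quotes and "test" disappears, so the string handed to the legacy parser has both sets of quotes on one line, which would explain the bold rendering there.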

To make things more complicated, I think that before the block vs. inline separation worked the way it does today, the Translate extension would add a newline after the translation unit id comment, as in the examples below. It does not do that these days, but I expect a lot of existing content has that kind of newline there.

Old Translate:

Input:
"<translate>Bunny</translate>"

After marking:
"<translate><!--T:1-->
Bunny</translate>"

Old parser sees:
"Bunny"

New Translate:

Input:
"<translate>Bunny</translate>"

After marking:
"<translate><!--T:1--> Bunny</translate>"

Old parser sees:
"Bunny"

Also, if that old Tech News issue used syntax version 2 (implemented in T254484) and there was an incomplete translation,

'''<translate><!--T:25-->
Problems</translate>'''

turned into

'''<div lang="en" dir="ltr" class="mw-content-ltr">
Problems
</div>'''

which is clearly broken wikitext. (Inline '''<translate><!--T:25--> Problems</translate>''' turns into inline '''<span lang="en" dir="ltr" class="mw-content-ltr">Problems</span>''', which is fine.)
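The two outcomes above can be reproduced with a small Python sketch. Everything in it is an assumption made for illustration: `wrap_untranslated` is a hypothetical name, and the block test (content starts or ends with a newline) is a guess that merely matches the two examples in this comment, not the rule Translate actually uses.

```python
import re

# Attributes copied from the example output above.
WRAPPER_ATTRS = 'lang="en" dir="ltr" class="mw-content-ltr"'


def wrap_untranslated(unit_source: str) -> str:
    """Hypothetical model of the wrapper added around an untranslated unit."""
    match = re.fullmatch(
        r"<translate>(?:<!--T:\d+-->)?(.*)</translate>", unit_source, re.DOTALL
    )
    if match is None:
        raise ValueError("expected a single <translate>...</translate> unit")
    content = match.group(1)
    # Assumed rule: a newline at either edge of the content means "block".
    is_block = content.startswith("\n") or content.endswith("\n")
    body = content.strip()
    if is_block:
        return f"<div {WRAPPER_ATTRS}>\n{body}\n</div>"
    return f"<span {WRAPPER_ATTRS}>{body}</span>"


block_unit = "<translate><!--T:25-->\nProblems</translate>"
inline_unit = "<translate><!--T:25--> Problems</translate>"
print("'''" + wrap_untranslated(block_unit) + "'''")
print("'''" + wrap_untranslated(inline_unit) + "'''")
```

Under these assumptions, the old-style newline after the unit marker alone is enough to select the div wrapper, which then lands between the bold quotes.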