
Analyze newline dirty diffs introduced by Parsoid around <translate> tags for rendering impacts
Closed, Resolved (Public)

Description

Right now, Parsoid does not guarantee the absence of dirty diffs for whitespace that is primarily syntactic sugar. Parsoid has treated newlines before and after <translate> opening and closing tags as syntactic sugar, so it may add or remove them in a wikitext -> wikitext roundtrip.

However, it turns out that not all newlines before/after opening/closing tags are syntactic sugar. @Nikerabbit says:

Translate determines whether a unit is a so-called "inline" unit by checking whether there are newlines inside it. Inline units may be wrapped in span tags to mark outdated or fallback translations. For block units we use divs.
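To make the rule concrete, here is a minimal sketch that classifies each <translate> region the way the quote describes. It is a simplification, not Translate's actual code: the function names are hypothetical, the regex assumes well-formed, non-nested tags, and any inner markup (e.g. <tvar>) is treated as plain text.

```python
import re

# Hypothetical helper; matches <translate> (optionally with attributes)
# up to the nearest </translate>, across newlines.
TRANSLATE_RE = re.compile(r"<translate\b[^>]*>(.*?)</translate>", re.DOTALL)

def unit_types(wikitext: str) -> list[str]:
    """Classify each <translate> region as 'inline' or 'block', in order.

    Simplified rule from the quote: a region whose content contains a
    newline is a block unit; otherwise it is inline.
    """
    return ["block" if "\n" in body else "inline"
            for body in TRANSLATE_RE.findall(wikitext)]

def wrapper_tag(unit_type: str) -> str:
    """Element used to mark outdated or fallback translations."""
    return "span" if unit_type == "inline" else "div"

# Prints ['inline', 'block']
print(unit_types("a <translate>foo</translate> b <translate>\nbar</translate>"))
```

Under this model, a single added or removed newline is enough to flip the wrapper from span to div or back.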

But all is not lost. As long as Parsoid neither adds newlines to originally inline translate blocks (<translate>foo</translate>) nor removes all newlines from originally block translate blocks (<translate>\nfoo</translate>, or any variant with a newline inside the unit), i.e. as long as it never turns an inline unit into a block unit or vice versa, we can still treat these newlines as syntactic for our purposes.
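The invariant can be expressed as a check over rt-testing output. The sketch below is hypothetical: it assumes the contents of each <translate> region have already been extracted from the original and the roundtripped wikitext, and it pairs regions by position, which only works if the roundtrip neither adds nor drops regions.

```python
def is_block(body: str) -> bool:
    # Simplified rule from the description: any newline makes it a block unit.
    return "\n" in body

def unit_type_changes(original: list[str], roundtripped: list[str]) -> list[int]:
    """Return indices of <translate> regions whose inline/block type changed.

    Newlines may be added or removed freely, as long as no inline region
    gains a newline and no block region loses all of its newlines.
    """
    if len(original) != len(roundtripped):
        raise ValueError("roundtrip changed the number of <translate> regions")
    return [i for i, (a, b) in enumerate(zip(original, roundtripped))
            if is_block(a) != is_block(b)]
```

An empty result means the wt2wt diff is harmless for Translate, even if it is still a dirty diff.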

This requires analyzing the code and/or rt-testing diffs to verify that all is kosher. If it turns out that Parsoid's html->wt code changes the translation-unit type and we cannot fix that, we will have to add information in the wt->html direction (in an editable property, either in data-mw or in some other HTML attribute) that the html->wt direction can then use to preserve the unit type.
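A minimal sketch of that fallback, assuming the unit type is stored as JSON in data-mw. The "translateUnitType" key and both functions are hypothetical and do not reflect Parsoid's actual data-mw schema or serializer API.

```python
import json

# Hypothetical sketch: key name and functions are illustrative only.

def record_unit_type(body: str) -> dict[str, str]:
    """wt->html: store the original unit type in a data-mw attribute."""
    unit_type = "block" if "\n" in body else "inline"
    return {"data-mw": json.dumps({"translateUnitType": unit_type})}

def serialize_body(body: str, attrs: dict[str, str]) -> str:
    """html->wt: adjust only edge newlines so the recorded type survives.

    Interior newlines are treated as content and left untouched.
    """
    unit_type = json.loads(attrs["data-mw"])["translateUnitType"]
    if unit_type == "inline":
        # An inline unit must not gain newlines at its edges.
        return body.strip("\n")
    if "\n" not in body:
        # A block unit must keep at least one newline.
        return "\n" + body + "\n"
    return body
```

Touching only leading and trailing newlines keeps the correction within the positions Parsoid already treats as syntactic sugar; if an editor deliberately adds an interior newline, the unit does become a block unit.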

Fingers crossed it is the former. :-)

Related Objects

Status     Subtype   Assigned
Open       Release   None
Open       -         None
Open       -         None
Open       -         None
Open       Feature   None
Open       -         None
Open       -         None
Open       -         None
Open       -         None
Open       -         None
Resolved   -         Esanders
Open       Feature   None
Resolved   -         ihurbain
Resolved   -         ihurbain
Resolved   -         ihurbain

Event Timeline

The good news is that, while I've fumbled with some newline handling around <translate> regions, I remember explicitly fumbling with the newlines before and after the region, and hopefully not much in the middle of it. At the time, I did it because it "looked better" that way, but it may have been the right call for other reasons too.

I _think_ (though this obviously needs careful checking) that we should be good: what might happen is that <translate>\nfoo</translate> gets transformed into <translate>\nfoo\n</translate>, but as I understand it this is okay, since both versions contain a newline and therefore remain block units. <translate>foo</translate> round-trips correctly in wt2wt, although more elaborate constructions still need to be tested.

Closing this one because the analysis has been done, and more specific bugs have been filed.