Parsoid fails to round-trip &nbsp; embedded in html comments
Open, Low, Public

Description

This dirty diff: https://en.wikipedia.org/w/index.php?title=Toyota_Land_Cruiser&oldid=prev&diff=720847140
As reported here: https://phabricator.wikimedia.org/T96701#4423989

AFAIK Parsoid should be able to round-trip entities embedded in HTML comments just fine (we have a special escaping mechanism for this), and even if it couldn't, selser (selective serialization) should prevent any change to an unedited part of the article.

So maybe two bugs? Or maybe a VE bug which is defeating Parsoid's normal mechanisms?
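
To illustrate why selser should protect an unedited comment even when the serializer itself is lossy, here is a toy sketch in Python. It is not Parsoid's implementation: the `Node` model, `lossy_serialize`, and `selective_serialize` are hypothetical names invented for this example, and the "bug" is simulated by replacing U+00A0 with a plain space.

```python
import html
from dataclasses import dataclass


@dataclass
class Node:
    """One top-level chunk of a page (hypothetical model, not Parsoid's DOM)."""
    source: str            # original wikitext for this chunk
    text: str              # decoded text as an editor would see it
    modified: bool = False


def parse(chunks):
    """Decode entities in each chunk while remembering the original source."""
    return [Node(source=c, text=html.unescape(c)) for c in chunks]


def lossy_serialize(node):
    """Stand-in for a buggy serializer: turns U+00A0 into a plain space."""
    return node.text.replace("\u00a0", " ")


def selective_serialize(nodes):
    """Reuse the original source for untouched nodes; serialize only edited ones."""
    return [lossy_serialize(n) if n.modified else n.source for n in nodes]


page = ["<!-- 100&nbsp;km -->", "Some text"]
nodes = parse(page)

# The only user edit: add a parenthesis to the second chunk.
nodes[1].text = "Some text)"
nodes[1].modified = True

result = selective_serialize(nodes)
full = [lossy_serialize(n) for n in nodes]  # what a full re-serialization would emit
```

In this model `result` keeps the comment byte-for-byte, while `full` loses the entity. A dirty diff on an untouched comment therefore suggests either that the node was wrongly marked as modified or that selective serialization was bypassed, which matches the "two bugs, or a VE bug" question above.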

Event Timeline

cscott created this task. Jul 13 2018, 9:57 PM
Restricted Application added a subscriber: Aklapper. Jul 13 2018, 9:57 PM
ssastry moved this task from Backlog to html2wt on the Parsoid board. Jul 27 2018, 6:09 AM
ssastry triaged this task as Low priority. Nov 8 2018, 6:01 AM
Thryduulf added a comment. Edited Nov 25 2019, 1:37 AM

In this edit, Parsoid (presumably) replaced &nbsp; in a reference name with a plain space (line 258 change block) when I made an unrelated change to the page. The addition of a parenthesis at line 289 was the only change I made; everything else is VE/Parsoid. It left the non-breaking spaces in the reference title alone.

I'm not sure if this is the same issue as this bug, but it's the closest I could find (T96701 seems to be about entering the character on a Mac keyboard). Whether or not non-breaking spaces are useful in this situation, I don't think they should be silently changed.
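
For anyone trying to confirm reports like this, a rough sketch of a check that flags lines where a non-breaking space disappeared between two revisions. It assumes the edit did not add or remove lines (as with a single added parenthesis), and `lines_losing_nbsp` is a hypothetical helper, not part of any MediaWiki tool.

```python
import re

# Spellings of a non-breaking space that may appear in wikitext:
# named entity, decimal and hex numeric entities, and the literal character.
NBSP_FORMS = re.compile(r"&nbsp;|&#160;|&#x[aA]0;|\u00a0")


def nbsp_counts(wikitext):
    """Count non-breaking spaces (in any spelling) on each line."""
    return [len(NBSP_FORMS.findall(line)) for line in wikitext.splitlines()]


def lines_losing_nbsp(old, new):
    """Return 1-based line numbers where `new` has fewer non-breaking spaces.

    Assumes both revisions have the same number of lines; any extra lines
    in the longer revision are ignored.
    """
    pairs = zip(nbsp_counts(old), nbsp_counts(new))
    return [i for i, (before, after) in enumerate(pairs, start=1) if after < before]


old_rev = 'A<ref name="Smith&nbsp;2001">Title</ref>\nfoo bar'
new_rev = 'A<ref name="Smith 2001">Title</ref>\nfoo (bar)'

changed = lines_losing_nbsp(old_rev, new_rev)
```

Here `changed` points at the first line, where the entity in the reference name was replaced, and not at the second line, which holds the intentional edit.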