Page MenuHomePhabricator

Space before colon in French is wrapped in mw:Placeholder wrapper
Closed, ResolvedPublic

Description

Go to https://rest.wikimedia.org/fr.wikipedia.org/v1/page/html/Anne_Delb%C3%A9e , search for " :" (space colon) and inspect. Observe that the space is wrapped in a span and converted to an nbsp: <span id="mwXX" typeof="mw:Placeholder">&nbsp;</span> (you probably won't see &nbsp; in the DOM inspector, but $0.textContent.charCodeAt(0) returns 160 instead of 32). These spans get alienated in VE, because they don't have a useful type, which makes for a very strange editing experience. This also breaks our whitespace detection for link text, see T92896.

mw:Placeholder is very unhelpful. Ideally I would prefer that these spaces just be plain text, but for round-trip reasons I understand that's unlikely to work and using an mw:Entity wrapper is probably more realistic. With an mw:Entity wrapper we'd still have to work around this issue in the link text whitespace code in VE, but at least these spaces wouldn't be alienated any more.

Event Timeline

Catrope raised the priority of this task from to Needs Triage.
Catrope updated the task description. (Show Details)
Catrope added projects: Parsoid, Parsoid-DOM.
Catrope subscribed.

Ah, the weird displayHack markup .. (https://github.com/wikimedia/parsoid/blob/master/lib/pegTokenizer.pegjs.txt#L1914-L1921) .. I remember being baffled when I saw it there, but I suspect this exists specifically for frwiki. We could probably replace that with a &nbsp; to get the same effect. Will investigate that.

ssastry set Security to None.
ssastry moved this task from Needs Triage to In Progress on the Parsoid board.

I just tried using normal whitespaces and (besides a few changing selser blacklist results) everything seems to work OK. There's even one test for T71950 that now passes.
Changing the mw:Placeholder to mw:Entity should also work, but it triggers a lot of false failing tests due to differences in whitespace, which we don't normalize out yet.

Plain whitespace breaks rendering expectations -- whitespace is required in the rendering, as far as I understand. An issue for frwiki, for ex.

Change 205838 had a related patch set uploaded (by Marcoil):
WIP: T94509: Use mw:Entity for nbsp before colon

https://gerrit.wikimedia.org/r/205838

On IRC discussions, it was argued that mw:Entity wasn't the best typeof to use in this case. A fer other possibilities were discussed, including:

  • Create a new typeof like mw:DisplayHack, mw:SpaceBeforeColon, etc.
  • Add a new typeof to an exiting one, i.e. typeof="mw:Placeholder mw:DisplayHack".
  • Just have VE not treat this case as a mw:Placeholder and allow the user to delete them.

Any preference? Any other possibilities?

Apr 27 10:38:13 <gwicke>        the behavior we want is basically what mw:Placeholder provides, with no way to edit inside, but the option of deleting / replacing the content while removing mw:Placeholder as well
Apr 27 10:40:23 <edsanders>     subbu, isn't it just the reason he gives on the bug? Placeholders are alienated and so the editing experience is weird
Apr 27 10:44:52 <gwicke>        edsanders: and you plan to not alienate the new type & allow editing the wrapped space?
Apr 27 10:45:15 <edsanders>     I plan to ask RoanKattouw_away what his plan was
Apr 27 10:45:37 <edsanders>     but that sounds sensible
Apr 27 10:50:45 <gwicke>        I don't think that would make sense from a Parsoid perspective
Apr 27 10:52:08 <gwicke>        you don't want to send back random content marked as 'mw:Nbsp' or the like
Apr 27 11:32:34 <James_F>       gwicke: But it wouldn't be.
Apr 27 11:32:58 <James_F>       gwicke: If the user deletes it and replaces it with something else, it'll be plain unannotated content.

See IRC discussion above. I think the plan is for us to add a new typeof, and on their end, VE will make sure they will handle it properly (including removing the annotation on edits so as not send back arbitrary content with the new typeof). I think what we need to figure out is if VE guarantees that if Parsoid receives a mw:<new-type-of> on a node, then that node only wraps a space.

Via IRC: confirmation that we can proceed with a special type for this and always serialize it to a space without examining contents.

<subbu> On the Parsoid end, the plan is that we'll serialize that special type to a space.
<RoanKattouw> Sounds sensible to me
<RoanKattouw> We are already able to do this for mw:Entity AIUI

What's the special type for it? mw:DisplaySpace?

Change 205838 merged by jenkins-bot:
T94509: Add mw:DisplaySpace to typeof for nbsp before colon

https://gerrit.wikimedia.org/r/205838

ssastry removed a project: Patch-For-Review.
ssastry removed a subscriber: gerritbot.