Page MenuHomePhabricator

Visual editor adds unwanted whitespaces at the end of section headings using french-spacing at each edit
Closed, ResolvedPublic

Description

Section (or subsection) headings on wikis are usually under the format === Heading 1 : 1 === with only one space between the text and the equal signs.

Step to reproduce:

  • Find an article that has a section whose heading contains the following sequence "space, colon, space".
  • Edit any section of it using the visual editor and save.

All section headings containing the given pattern, will have "gained" an extra space before the ending equal signs. If you do it again, another extra space appears, and so on. It seems that sometimes the same VisualEditor removes the spaces, probably when editing the top of the page.
Examples seen on various articles edited by different users on the French or English Wikipedias : this edit on frwiki and the next two ones, this other edit and the following four, this one on enwiki, etc.
Reproduced in a sandbox on enwiki : here and several tests between these versions. Spaces only removed in this edit.

This bug does not affect page rendering but can unnecessarily disturb the reader of the difference between two versions. I am not sure but haven't noticed such a behavior before August 2020.

Event Timeline

ssastry triaged this task as Medium priority.Aug 25 2020, 5:00 PM
ssastry added subscribers: cscott, Arlolra, ssastry.

I am not sure but haven't noticed such a behavior before August 2020.

Interesting. Probably a result of T197879 and possibly associated VE fixes that rolled in August. @Arlolra, @cscott FYI.

I'm not convinced french spacing is involved here, headings are not a context where they would be added. Seems like a different selser issue. I can investigate, though.

I can reproduce this locally. It happens with VisualEditor but not with DiscussionTools.

Reverting commit rEVEDed3579245e02: French spacing (mw:DisplaySpace) doesn't have mw:Placeholder any more actually fixes the issue. @cscott Do you prefer that we do that, or are you working on a fix on the Parsoid side?

Do you prefer that we do that, or are you working on a fix on the Parsoid side?

We're working on a fix.

The reason that patch is a problem is that we sent,

<h3 id="Heading_1_:_4" data-parsoid="{&quot;dsr&quot;:[0,21,3,3]}">Heading 1<span typeof="mw:DisplaySpace mw:Placeholder" data-parsoid="{&quot;src&quot;:&quot; &quot;,&quot;dsr&quot;:[13,14,0,0]}">&nbsp;</span>: 4</h3>

and are being returned,

<h3 id="Heading_1_:_4" data-parsoid="{&quot;dsr&quot;:[0,21,3,3]}">Heading 1<span typeof="mw:DisplaySpace" data-parsoid="{&quot;src&quot;:&quot; &quot;,&quot;dsr&quot;:[13,14,0,0]}">&nbsp;</span>: 4</h3>

Since the typeof changes, the heading is considered as "children-changed" and selser isn't used for it. There's probably a bug in preserving the trailing space after that's bypassed.

When https://github.com/wikimedia/parsoid/commit/275cb60e6949f2f6423233899330b38b8058d2a6 goes out, selser will kick in again since mw:Placeholder will no longer be emitted and we won't have that diff. However, any cached pages will still have that problem.

Change 628955 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Avoid double dip whitespace reuse when getting trailing space

https://gerrit.wikimedia.org/r/628955

There's probably a bug in preserving the trailing space after that's bypassed.

Yeah, see the patch above

So only a bug in headings which contain french spacing, most often a space before a colon. The mention of plain == Heading == in the original bug summary was throwing me off.

cscott renamed this task from Visual editor adds unwanted whitespaces at the end of some section headings at each edition to Visual editor adds unwanted whitespaces at the end of section headings using french-spacing at each edit.Sep 21 2020, 10:16 PM
cscott updated the task description. (Show Details)
cscott updated the task description. (Show Details)

Since the typeof changes, the heading is considered as "children-changed" and selser isn't used for it. [...] However, any cached pages will still have that problem.

Can we teach selser to ignore the mw:Placeholder on mw:DisplaySpace? That seems to be the html2wt compat case I missed in the original patch set.

Can we teach selser to ignore the mw:Placeholder on mw:DisplaySpace? That seems to be the html2wt compat case I missed in the original patch set.

Yeah, I imagine we need to add a $specializedAttribHandlers for "typeof", which doesn't seem too bad

Change 629169 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Add a manual selser test demonstrating double dip spacing

https://gerrit.wikimedia.org/r/629169

Change 629176 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Avoid modified-wrapper on mw:DisplaySpace

https://gerrit.wikimedia.org/r/629176

Change 629176 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Avoid modified-wrapper on mw:DisplaySpace

https://gerrit.wikimedia.org/r/629176

Change 629169 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Add a manual selser test demonstrating double dip spacing

https://gerrit.wikimedia.org/r/629169

Change 628955 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Avoid double dip whitespace reuse when getting trailing space

https://gerrit.wikimedia.org/r/628955

Arlolra claimed this task.

Change 630676 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump wikimedia\parsoid to 0.13.0-a11

https://gerrit.wikimedia.org/r/630676

Change 630676 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a11

https://gerrit.wikimedia.org/r/630676