VE's whitespace normalization code in headings is being tripped up by html5 fallback ids in Parsoid's output for headings
Closed, Resolved | Public | 1 Estimated Story Points

Description

Currently, VisualEditor shows the leading space before a section heading's title. If I remember correctly, it didn't do that previously, and in my opinion it shouldn't do so anyway.

Examples:

== Tudnivalók ==

1.PNG (125×1 px, 20 KB)

==Tudnivalók ==

2.PNG (133×1 px, 20 KB)

Event Timeline

I can't reproduce this problem on the English or German Wikipedia, but I managed to reproduce it on French and Hungarian. It only happens for the first heading, but not any subsequent ones. How strange!

@Trizek-WMF Do you happen to know if you've seen this on the French Wikipedia before? Is it new behaviour?

Since it only looks like this when you're editing (which uses Parsoid HTML), but not when you're reading (which uses the old PHP parser HTML), it might be a Parsoid problem? @ssastry, do you know?

Deskana moved this task from To Triage to TR0: Interrupt on the VisualEditor board.
Deskana set the point value for this task to 1.

This is basically T157418: RFC: Make some aspects of Tidy's whitespace stripping behavior part of wikitext parsing "spec". On the pages where you were able to reproduce it, open them with "?action=parsermigration-edit". It is likely reproducible in the right-hand column with RemexHTML.

So, feel free to opine in T157418 with your thoughts on the matter.

@ssastry I couldn't reproduce it using ?action=parsermigration-edit; both versions did not have the space before the section. Does this mean that this issue is unrelated to T157418?

That is because I learnt that matmarex implemented a header-specific fix in the parser. See T157418#3849785. So, the generic issue is still T157418.

Thanks @ssastry! I've left a comment there about this.

ssastry renamed this task from VE shows the first space before paragraphs' titles to VE's whitespace normalization code in headings is being tripped up by html5 fallback ids in Parsoid's output for headings. Dec 21 2017, 4:05 PM
ssastry reopened this task as Open.

Summary from IRC discussion in #mediawiki-parsoid

Diagnosis:
(a) Parsoid doesn't strip whitespace in its output.
(b) VE appears to be stripping that whitespace.
(c) Parsoid added html5 ids with fallback ids in some cases (deployed Dec 12).
(d) The fallback id is interfering with (b) and causes leading whitespace to be displayed.
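
To make the diagnosis concrete, here is a minimal sketch of how an empty fallback-id element in front of the heading text can defeat a strip-leading-whitespace step. It is not VE's actual code. The node classes, the function names, and the assumed markup shape (an empty `<span typeof="mw:FallbackId">` as the first child of the heading, followed by the text ` Tudnivalók `) are all hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    """A text node inside a heading (hypothetical model)."""
    value: str


@dataclass
class Element:
    """An element node inside a heading (hypothetical model)."""
    tag: str
    typeof: str = ""
    children: tuple = ()


Node = Union[Text, Element]


def strip_leading_naive(children: List[Node]) -> List[Node]:
    """Strip leading whitespace only when the very first child is text."""
    if children and isinstance(children[0], Text):
        return [Text(children[0].value.lstrip())] + list(children[1:])
    return list(children)


def is_fallback_id(node: Node) -> bool:
    """True for an empty element marked as a fallback id (assumed marker)."""
    return (
        isinstance(node, Element)
        and node.typeof == "mw:FallbackId"
        and not node.children
    )


def strip_leading_skipping_fallback(children: List[Node]) -> List[Node]:
    """Skip empty fallback-id elements, then strip the first text node."""
    result = list(children)
    i = 0
    while i < len(result) and is_fallback_id(result[i]):
        i += 1
    if i < len(result) and isinstance(result[i], Text):
        result[i] = Text(result[i].value.lstrip())
    return result


def visible_text(children: List[Node]) -> str:
    """Concatenate the text nodes, i.e. what the editor would display."""
    return "".join(n.value for n in children if isinstance(n, Text))


# A heading without a fallback id, and one with it (assumed shapes).
plain = [Text(" Heading ")]
with_fallback = [Element("span", "mw:FallbackId"), Text(" Tudnivalók ")]

print(repr(visible_text(strip_leading_naive(plain))))
print(repr(visible_text(strip_leading_naive(with_fallback))))
print(repr(visible_text(strip_leading_skipping_fallback(with_fallback))))
```

In this model the naive version leaves the leading space in place for the heading with the fallback element, because the text node is no longer the first child. That matches point (d) in spirit, though the real normalization code in VE may be structured quite differently.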

Solution:
(a) Immediate term: fix the regression in VE caused by the introduction of fallback ids in Parsoid output.
(b) Long term: address T157418 in Parsoid & PHP Parser.

In a nutshell, this extra whitespace appears in any section header that has diacritics in it. A fix in VE should help.
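
As a rough illustration of why headings with diacritics are the ones affected: a fallback id is only needed when the legacy-style anchor differs from the html5-style one. The sketch below assumes the legacy encoding is "spaces to underscores, percent-encode as UTF-8, then replace `%` with `.`". The function names are hypothetical, and the exact rules in MediaWiki and Parsoid may differ in details.

```python
from urllib.parse import quote


def legacy_anchor(heading: str) -> str:
    """Approximate a legacy-style anchor (assumed encoding rules)."""
    encoded = quote(heading.strip().replace(" ", "_"), safe="")
    # Colons stay literal; every remaining escape uses '.' instead of '%'.
    return encoded.replace("%3A", ":").replace("%", ".")


def needs_fallback_id(heading: str) -> bool:
    """A fallback id is only useful when the two anchor forms differ."""
    html5_id = heading.strip().replace(" ", "_")
    return legacy_anchor(heading) != html5_id


for title in ["History", " Tudnivalók "]:
    print(title.strip(), legacy_anchor(title), needs_fallback_id(title))
```

Under these assumptions a plain ASCII heading such as "History" gets no fallback id, while "Tudnivalók" does, which would be consistent with an English or German page that happens to use ASCII headings not showing the problem.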

@Trizek-WMF Do you happen to know if you've seen this on the French Wikipedia before? Is it new behaviour?

Sorry I didn't reply earlier.
I see that on French Wikipedia on titles that have diacritics, but I forgot to report it (I was editing as a volunteer).

Change 405759 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/VisualEditor@master] MWWikitextStringTransferHandler: Perform Parsoid cleanup on result

https://gerrit.wikimedia.org/r/405759

Change 405759 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] MWWikitextStringTransferHandler: Perform Parsoid cleanup on result

https://gerrit.wikimedia.org/r/405759

Deskana assigned this task to DLynch.