VE removes space in wikitext in list elements starting with wikilinks
Closed, Resolved (Public)

Description

As originally reported in https://de.wikipedia.org/wiki/Wikipedia:Technik/Text/Edit/VisualEditor/R%C3%BCckmeldungen#Leerzeichen_fehlt:_*Liste :

When editing list elements starting with a wikilink, VE removes the space between "*" and "[[":
https://de.wikipedia.org/w/index.php?title=Benutzer:Tkarcher/Spielwiese&diff=next&oldid=196793725

(Expected wikitext output would be * [[Test]], not *[[Test]])
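To make the symptom easier to check against a pair of revisions, here is a minimal sketch that reports list items whose content is unchanged but whose space after the list marker disappeared. The function name `items_that_lost_space` is hypothetical, and the line-by-line pairing assumes both revisions have the same lines in the same order; it is not how VE or Parsoid compute diffs.

```python
import re

# A list item: marker run (*, #, :, ;), optional spaces, then the content.
_ITEM = re.compile(r"^([*#:;]+)( *)(.*)$")


def items_that_lost_space(before: str, after: str) -> list[str]:
    """Return list items from `after` that only lost the space after the marker.

    Assumption: `before` and `after` have the same number of lines in the
    same order, so lines can be paired by position.
    """
    lost = []
    for old, new in zip(before.splitlines(), after.splitlines()):
        m_old, m_new = _ITEM.match(old), _ITEM.match(new)
        if not (m_old and m_new):
            continue  # at least one of the two lines is not a list item
        same_item = (
            m_old.group(1) == m_new.group(1)
            and m_old.group(3) == m_new.group(3)
        )
        # Space was present before the edit and is gone afterwards.
        if same_item and m_old.group(2) and not m_new.group(2):
            lost.append(new)
    return lost
```

For example, comparing `* [[Test]]` with `*[[Test]]` flags the item, while an item that kept its space is ignored.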

Event Timeline

JTannerWMF added a project: Parsoid.
JTannerWMF subscribed.

Tagging Parsoid for visibility.

LGoto triaged this task as Low priority. Mar 13 2020, 4:12 PM
LGoto moved this task from Needs Triage to Bugs & Crashers on the Parsoid board.

Please look at this diff.

In list 1, the leading space was removed each time I formatted the first word of a list item.
Note that when I remove the formatting from the first word, the leading space remains.

In list 2, I only formatted the first word of one item and added a new paragraph before the list. Result: every item whose first word is formatted loses its leading space, if it had one.

The list 2 issue may cause unexpectedly large diffs: example on fr.wp (look at the diff after the == Œuvre == addition).

ssastry raised the priority of this task from Low to Medium. Jun 8 2020, 10:59 PM

Change 604035 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] Preserve leading space, even for non-text nodes

https://gerrit.wikimedia.org/r/604035

Here's the same edit made today,
https://www.mediawiki.org/w/index.php?title=User:Arlolra/sandbox&type=revision&diff=3902422&oldid=3902417&diffmode=source

It's unclear why untouched lines have diffs.

The patch in T245206#6206247 fixes the cases from the expected diff today.

Are there steps to reproduce the other part?

Change 604035 merged by jenkins-bot:
[mediawiki/services/parsoid@master] html2wt: Newly inserted elements shouldn't disrupt whitespace heuristics

https://gerrit.wikimedia.org/r/604035

Yes: to reproduce the list 2 bug, you must add a new paragraph just before the list (in addition to formatting the first word of one element).

Thanks!

VE seems to be duplicating the id when inserting that paragraph,

<p id="mwDg"></p><p id="mwDg">list 2 (all elements starts with space, some have first word formatted):</p>

which, in turn, makes Parsoid think the list is newly inserted,

<p data-parsoid='{"dsr":[389,461,0,0]}' data-parsoid-diff='{"id":15580374,"diff":["children-changed","subtree-changed"]}'></p><meta typeof="mw:DiffMarker/deleted" data-parsoid="{}"/><meta typeof="mw:DiffMarker/deleted" data-parsoid="{}"/><meta typeof="mw:DiffMarker/deleted" data-parsoid="{}"/><p data-parsoid='{"dsr":[389,461,0,0]}' data-parsoid-diff='{"id":15580374,"diff":["inserted"]}'>list 2 (all elements starts with space, some have first word formatted):</p><meta typeof="mw:DiffMarker/inserted" data-parsoid="{}"/>
<ul data-parsoid='{"dsr":[462,666,0,0]}' data-parsoid-diff='{"id":15580374,"diff":["inserted"]}'>
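A quick way to spot this kind of id duplication in HTML sent back from the editor is to count `id` attributes. The sketch below uses only Python's standard `html.parser`; the helper name `duplicate_ids` is hypothetical and this is not part of VE or Parsoid.

```python
from collections import Counter
from html.parser import HTMLParser


class _IdCollector(HTMLParser):
    """Counts how often each id attribute value occurs."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: Counter[str] = Counter()

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags via the default handle_startendtag.
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def duplicate_ids(html: str) -> list[str]:
    """Return the sorted id values that appear on more than one element."""
    parser = _IdCollector()
    parser.feed(html)
    parser.close()
    return sorted(i for i, n in parser.ids.items() if n > 1)
```

Run on the two-paragraph snippet above, it reports `mwDg` as duplicated.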

Continuing from some of the discussion in https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/604035,

The space is considered a separator for an element node and emitSepForNode will restore it on its own. Without it, you end up with duplicated spaces.

We only reuse original separators if !$state->inModifiedContent,
https://github.com/wikimedia/parsoid/blob/dd05f9126175f6f4651f80b4f387fc33cb2c38ea/src/Html2Wt/Separators.php#L626
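As a toy model of that rule (not Parsoid's actual code), the sketch below reuses the original separator only outside modified content. The names `pick_separator` and `emit_list_item`, and the empty-string fallback separator, are assumptions made for illustration.

```python
from typing import Optional


def pick_separator(
    original_sep: Optional[str],
    in_modified_content: bool,
    default_sep: str = "",  # assumed fallback; the real heuristics are richer
) -> str:
    """Reuse the original separator only when not inside modified content."""
    if not in_modified_content and original_sep is not None:
        return original_sep
    return default_sep


def emit_list_item(
    content: str, original_sep: Optional[str], in_modified_content: bool
) -> str:
    """Serialize one bullet item using the toy separator rule."""
    return "*" + pick_separator(original_sep, in_modified_content) + content
```

Under this model, an untouched item keeps `* [[Test]]`, but once the whole list is treated as inserted (as with the duplicated id), the same item serializes as `*[[Test]]`.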

Change 605678 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump Parsoid to 0.12.0-a17

https://gerrit.wikimedia.org/r/605678

Change 605678 merged by jenkins-bot:
[mediawiki/vendor@master] Bump Parsoid to 0.12.0-a17

https://gerrit.wikimedia.org/r/605678

Regarding "VE seems to be duplicating the id when inserting that paragraph": I opened up a broader discussion for that in T256687.

I’m not sure I understand what happened in this diff on French Wiktionary: the item starting with ''Contactée lost its leading space, but the following one did not…
Should I open a new task?

EDIT: another similar example (where the list hasn’t been deliberately edited at all)