Page MenuHomePhabricator

Line break in a link label was changed to a space
Open, LowPublic

Description

When this comment was posted with the Reply Tool [i], a line break within a link text in another comment [ii] was changed to a space.

Note: the comment contains a line break in the https://cs.wikipedia.org/wiki/Der_Spiegel link label.

Behavior

  1. Locate a comment that contains a line break (e.g. https://cs.wikipedia.org/w/index.php?title=Diskuse:Nobelova_cena_za_literaturu&diff=prev&oldid=19057190)
  2. Use the Reply Tool to respond to this comment (read: click the [ reply ] link appended to the comment in "Step 1") (e.g. https://cs.wikipedia.org/w/index.php?title=Diskuse:Nobelova_cena_za_literaturu&diff=prev&oldid=19057235)

Actual

  1. ❗️Inspect the diff produced by "Step 2" and notice the antecedent comment was changed

Expected

  1. ✅ Inspect the diff produced by "Step 2" and notice the only change was the content you posted in "Step 2" was added to the page

Done

  • The "Expected" behavior is implemented

i. https://cs.wikipedia.org/w/index.php?title=Diskuse:Nobelova_cena_za_literaturu&diff=prev&oldid=19057235
ii. https://cs.wikipedia.org/w/index.php?title=Diskuse:Nobelova_cena_za_literaturu&diff=prev&oldid=19057190

Event Timeline

matmarex renamed this task from Comment inherits content from antecedent to Line break in a link text was changed to a space.Oct 9 2020, 3:24 PM
matmarex updated the task description. (Show Details)
Esanders renamed this task from Line break in a link text was changed to a space to Line break in a link label was changed to a space.Oct 9 2020, 4:11 PM

The dirty diff part will be addressed by the parent task that will apply to all subtasks. But, outside the fact that it is a dirty diff, is the actual normalization a problem?

The dirty diff part will be addressed by the parent task that will apply to all subtasks.

Great.

But, outside the fact that it is a dirty diff, is the actual normalization a problem?

I don't understand it to be considering the normalization doesn't appear to affect how people will perceive the published state of the page and thus does nothing to change how people could understand the conversation.

I'm curious to hear whether Bartosz and Ed think similarly.

Yep, the rendering is unaffected.

This is unrelated to discussion tools or selser. In HTML -> Wikitext mode, Parsoid strips newlines from list items everywhere. But, I suppose newlines in wikilinks are okay. So, we can probably tweak those constraints.

Change 640725 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] html2wt: Newlines in wikilinks are okay in single-line context

https://gerrit.wikimedia.org/r/640725

Another interesting case, where the line break occurred in a HTML tag attribute: https://en.wikipedia.org/?diff=988198443

Change 640725 merged by jenkins-bot:
[mediawiki/services/parsoid@master] html2wt: Newlines in piped wikilinks are okay in single-line context

https://gerrit.wikimedia.org/r/640725

ssastry reopened this task as Open.

Change 641308 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a17

https://gerrit.wikimedia.org/r/641308

Change 641308 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a17

https://gerrit.wikimedia.org/r/641308

One more interesting case (hopefully the last one we'll see, as the patch is about to be deployed): https://fr.wikipedia.org/?diff=176864105 – a previous edit accidentally deleted half of a link syntax, and a DiscussionTools edit resulted in further corruption when line breaks were removed.

I looked at that one y'day. It won't be fixed by this patch. This is an edge case difference in Parsoid & core parser. Parsoid tokenizes it as a wikilink and mid-way after templates are expanded, it aborts and coverts the contents to text. But, by then, the ":" has been swallowed and Parsoid doesn't re-tokenize it. So, that change becomes permanent. Eventually, we may find cleaner solutions including changes to how templates are processed which may resolve this differently. But, for now, this is a known issue with "bailTokens" in WikiLinkHandler and unless it becomes a real issue, we are unlikely to fix it.

Change 653629 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: Handle newlines in wikilinks for selser as well

https://gerrit.wikimedia.org/r/653629

Line breaks are probably not disallowed in signatures, you can't type them in in the Special:Preferences interface because it has a single-line text field, but you can probably save such a signature via an API. Alternatively, the user's signature could be substing some template containing their real signature, which can have any wikitext. Also, since the line break is before the character *, it could also be some accident involving substing and T14974.

Change 653629 merged by Subramanya Sastry:
[mediawiki/services/parsoid@master] cced5578 followup: Handle newlines in wikilinks for selser as well

https://gerrit.wikimedia.org/r/653629

Change 655482 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a22

https://gerrit.wikimedia.org/r/655482

Change 655482 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a22

https://gerrit.wikimedia.org/r/655482

This still appears to be happening today on wmf26: https://dtcheck.toolforge.org/dtcheck-2021-01-15.html

Ya, all those are more involved scenarios (the link is embedded inside a <sup> inside a list item) - Arlo and I discussed that on the patch that we may not handle scenarios like that but I should by now that if there is a more generalized check that is needed to capture all cases, there will be some instance of that somewhere on some wiki that will exercise that case... I could add the more general test but seems like just more complexity ... Alternatively, we can get some of those users to fix their signatures to avoid line breaks in CSS in HTML attributes. Anyway, will discuss this in our next meeting.

But, given that it isn't illegal to have newlines there, it seems like we should fix this in Parsoid.

Similar cases:

I think I also saw somewhere a dirty diff in markup like the below (using https://en.wikipedia.org/wiki/Template:Pb to write a multi-line comment), but I can't find the diff now, maybe I'm imagining it.

:::blah blah{{pb
}}blah blahblah blah

@ssastry: Removing task assignee as this open task has been assigned for more than two years - See the email sent to task assignee on Feburary 22nd, 2023.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome! :)
If this task has been resolved in the meantime, or should not be worked on by anybody ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!