
Undefined whitespace handling behavior around category and language links
Closed, Resolved (Public)

Description

Consider this wikitext: `ABC [[Category:Foo]]DEF`. The old parser in core effectively renders this as ABCDEF, while Parsoid renders it as ABC DEF. This may just be an accidental side effect of other whitespace behavior around category links, but the difference is a source of visual diff noise on a large number of enwikivoyage pages.

See this on Parsoid vs this with the old parser. See the "By Plane" section and the IATA superscript after the DHM text.

EDIT: Updated task to include interlanguage links as well.

Event Timeline

This follows up on T2087: Category tags produce ugly whitespace, T87753: Space between final 2 words in a page with ≥2 category tags is removed in arabic mediawiki, and T174639: Recent replaceInternalLinks() patch causes line breaks to be stripped from an internal link following a category.

Currently we have the following renderings:

  • ABC[[Category:Foo]]DEF => ABCDEF
  • ABC[[Category:Foo]] DEF => ABC DEF
  • ABC [[Category:Foo]]DEF => ABCDEF (unexpected)
  • ABC [[Category:Foo]] DEF => ABC DEF

So non-newline whitespace on the left side of a category tag is ignored, but on the right side is not.

There are good reasons to treat *newlines* around category tags specially (T2087 and related) but the treatment of non-newline whitespace is arguably unexpected behavior. Current Parsoid renders the "unexpected" case as ABC DEF.
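The four cases can be reproduced with a small toy model. This is only a sketch in Python, not the core parser's or Parsoid's actual code: the regex and the names `render_old`, `render_new`, and `visible` are made up for illustration, and real category-link handling (newlines, sort keys, nesting) is considerably more involved. `render_old` drops spaces and tabs to the left of the link along with the link; `render_new` removes only the link itself, which yields the ABC DEF output for the "unexpected" case.

```python
import re

# Hypothetical, simplified pattern for a category link such as [[Category:Foo]].
CATEGORY_LINK = r"\[\[Category:[^\]|]*\]\]"


def render_old(wikitext: str) -> str:
    """Toy model of the reported behavior: non-newline whitespace to the
    left of a category link is removed together with the link."""
    return re.sub(r"[ \t]*" + CATEGORY_LINK, "", wikitext)


def render_new(wikitext: str) -> str:
    """Toy model of the alternative: only the link itself is removed;
    whitespace on both sides is kept."""
    return re.sub(CATEGORY_LINK, "", wikitext)


def visible(text: str) -> str:
    """Collapse runs of spaces/tabs, as a browser does when displaying HTML."""
    return re.sub(r"[ \t]+", " ", text)


if __name__ == "__main__":
    cases = [
        "ABC[[Category:Foo]]DEF",
        "ABC[[Category:Foo]] DEF",
        "ABC [[Category:Foo]]DEF",
        "ABC [[Category:Foo]] DEF",
    ]
    for wikitext in cases:
        old = visible(render_old(wikitext))
        new = visible(render_new(wikitext))
        print(f"{wikitext!r}: old={old!r} new={new!r}")
```

In this model the two renderers differ only on the third input, where whitespace appears on the left side of the link and nowhere else.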

Either Parsoid or the core parser should be fixed. The mismatch is causing visual diffs on wikivoyage in the output of {{go|...}} templates; for example, this "By Plane" section and the DHM (IATA) output there.

Change #1010322 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Don't strip non-newline whitespace from left side of [[Category]] links

https://gerrit.wikimedia.org/r/1010322

cscott renamed this task from Undefined whitespace handling behavior around category links to Undefined whitespace handling behavior around category and language links. (Mar 29 2024, 10:38 PM)
cscott updated the task description.

Change #1015601 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Don't strip non-newline whitespace from left side of language links

https://gerrit.wikimedia.org/r/1015601

Change #1010322 merged by jenkins-bot:

[mediawiki/core@master] Don't strip non-newline whitespace from left side of [[Category]] links

https://gerrit.wikimedia.org/r/1010322

ssastry triaged this task as Medium priority. (Apr 1 2024, 3:59 AM)
ssastry moved this task from Code Review to To Deploy on the Content-Transform-Team-WIP board.

Change #1015601 merged by jenkins-bot:

[mediawiki/core@master] Don't strip non-newline whitespace from left side of language links

https://gerrit.wikimedia.org/r/1015601

No issues from the deploy.