Page MenuHomePhabricator

Carriage return (\r) line endings not being normalized before being passed to the preprocessor
Closed, ResolvedPublic

Description

This might be a WONTFIX, but we should at least decide whether or not this is desired behavior. Previously, when an edit was submitted via the API with \r characters as EOL markers, they were parsed as such. Recently, this changed and \r characters are now effectively ignored. Do we want to:

  1. Continue ignoring \r characters
  2. Convert them into \n in the editing workflow
  3. Convert them into \n in the parsing workflow

Details

Related Gerrit Patches:
mediawiki/extensions/Scribunto : masterNormalize newlines before calling preprocessor
mediawiki/core : masterNormalize "\r" newlines in preSaveTransform

Event Timeline

kaldari created this task.Dec 14 2014, 2:36 AM
kaldari raised the priority of this task from to Needs Triage.
kaldari updated the task description. (Show Details)
kaldari added a project: MediaWiki-Parser.
kaldari changed Security from none to None.
kaldari added a subscriber: kaldari.

Before null edit:

After null edit:

GOIII added a subscriber: GOIII.Dec 14 2014, 8:01 PM

Looks like something called HotArticleBot keeps fudging-up the wiki mark-up's table syntax.

Note the |- ''table row'' tag keeps getting lumped in with end of the previous line instead of being on its own line. It just looks like it is starting its own line in the revision diff but obviously its not.

GOIII removed a subscriber: GOIII.Dec 14 2014, 8:04 PM

Here's an example of a currently misrendered page

Has this problem been only seen on en.wikipedia or also on other sites?
Sounds on-wiki according to last comment?

Text contains \r which is not rendered correctly.
Someone should inform the bot maintainer to change this and mediawiki should reject or replace with \n when doing a (api) edit?

It was working fine previously with \r (since September 2013). None of the bot code has changed. Something in our API or editing code must have changed recently. I'll retitle the bug...

kaldari renamed this task from Tables misrendering, fixed by null edit to Carriage return character (\r) no longer being converted to a newline when used in API edit submissions.Dec 15 2014, 7:54 PM
kaldari updated the task description. (Show Details)
Anomie added a subscriber: Anomie.Dec 15 2014, 10:14 PM

It turns out this has nothing to do with the API. What changed recently is that, along with the change from Zend PHP to HHVM, we also changed from using Preprocessor_DOM to Preprocessor_Hash. And it turns out that the former replaces \r with \n, while the latter does not.

Anomie claimed this task.Dec 15 2014, 10:14 PM
Anomie added a project: MediaWiki-Core-Team.
Anomie moved this task from Backlog to Needs Review/Feedback on the MediaWiki-Core-Team board.

Change 180021 had a related patch set uploaded (by Anomie):
Normalize "\r" newlines in preSaveTransform

https://gerrit.wikimedia.org/r/180021

Patch-For-Review

Change 180022 had a related patch set uploaded (by Anomie):
Normalize newlines before calling preprocessor

https://gerrit.wikimedia.org/r/180022

Patch-For-Review

Anomie renamed this task from Carriage return character (\r) no longer being converted to a newline when used in API edit submissions to Carriage return (\r) line endings not being normalized before being passed to the preprocessor.Dec 15 2014, 10:15 PM

@Anomie: Rather than modify both the PST and Scribunto (and probably other places in the future), why not just change Preprocessor_Hash to have the same behavior as Preprocessor_DOM?

@Anomie: Rather than modify both the PST and Scribunto (and probably other places in the future), why not just change Preprocessor_Hash to have the same behavior as Preprocessor_DOM?

I considered that, but since we already have partial newline normalization in Parser::preSaveTransform I decided to just fix that.

Change 180021 merged by jenkins-bot:
Normalize "\r" newlines in preSaveTransform

https://gerrit.wikimedia.org/r/180021

Anomie closed this task as Resolved.Dec 16 2014, 9:24 PM
Anomie moved this task from Needs Review/Feedback to Done on the MediaWiki-Core-Team board.

Change 180022 merged by jenkins-bot:
Normalize newlines before calling preprocessor

https://gerrit.wikimedia.org/r/180022

bd808 moved this task from Done to Archive on the MediaWiki-Core-Team board.Dec 22 2014, 10:39 PM