Don't allow null characters in wikitext (or in HTML output)
Open, Medium, Public

Description

A recent patch (https://gerrit.wikimedia.org/r/327779) proposed to fix the handling of the null character when it is present in language-converted text, to make it consistent with how null characters are handled when the language converter is disabled.

@tstarling suggested that a better solution would be to strip null characters entirely, whether the language converter is enabled or disabled.

Indeed, the HTML5 spec frowns on null characters in HTML documents -- they are generally ignored or replaced with U+FFFD, and representing them via character entities is explicitly forbidden. It seems like good practice for the parser not to emit U+0000 in its generated output.
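To illustrate the two options discussed above, here is a minimal Python sketch. It is not the MediaWiki or Parsoid implementation; the function names `strip_nulls` and `replace_nulls` are hypothetical and exist only for this example. The first drops U+0000 entirely, as proposed in this task; the second substitutes U+FFFD, which is what HTML parsers commonly do when they encounter a null character.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# taken from MediaWiki core or Parsoid.

NULL_CHAR = "\u0000"
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def strip_nulls(text: str) -> str:
    """Return text with every U+0000 removed."""
    return text.replace(NULL_CHAR, "")


def replace_nulls(text: str) -> str:
    """Return text with every U+0000 replaced by U+FFFD."""
    return text.replace(NULL_CHAR, REPLACEMENT_CHAR)


if __name__ == "__main__":
    wikitext = "foo\u0000bar"
    # repr() makes the otherwise invisible characters visible.
    print(repr(strip_nulls(wikitext)))
    print(repr(replace_nulls(wikitext)))
```

Stripping before any other processing means that later stages, such as the language converter, never see a null character, so both the enabled and disabled code paths behave the same way.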

Event Timeline

cscott created this task. Feb 27 2017, 8:52 PM
Restricted Application added a subscriber: Aklapper. Feb 27 2017, 8:52 PM

Change 340225 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
T159174: Strip U+0000 in wikitext

https://gerrit.wikimedia.org/r/340225

Change 340225 had a related patch set uploaded (by legoktm; owner: cscott):
[mediawiki/core] Strip U+0000 in wikitext

https://gerrit.wikimedia.org/r/340225

Change 340225 merged by jenkins-bot:
[mediawiki/core] Strip U+0000 in wikitext

https://gerrit.wikimedia.org/r/340225

ssastry closed this task as Resolved. Apr 9 2017, 9:53 PM
ssastry assigned this task to cscott.
ssastry triaged this task as Medium priority.
ssastry reopened this task as Open. Apr 9 2017, 9:56 PM
ssastry added a subscriber: ssastry.

Actually, we still have to verify whether this is handled correctly in Parsoid.

cscott added a comment. Dec 1 2017, 9:22 PM

Probably related to T106079 (at least the Parsoid portion of this).

ssastry moved this task from Needs Triage to Future Ideas on the Parsoid board. Jun 10 2019, 8:10 PM
Aklapper removed cscott as the assignee of this task. Jun 19 2020, 4:21 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)