Don't allow null characters in wikitext (or in HTML output)
Open, Medium, Public

Assigned To

None

Authored By

	cscott
	Feb 27 2017, 8:52 PM

Description

A recent patch (https://gerrit.wikimedia.org/r/327779) proposed to fix handling of the null character when present in language-converted text, to make it consistent with how null characters are handled when the language converter is disabled.

@tstarling suggested a better solution would be to strip null characters entirely, whether language converter is enabled or disabled.

Indeed, the HTML5 spec frowns on null characters in HTML documents -- they are generally ignored or replaced with U+FFFD, and representing them via character entities is explicitly forbidden. It seems like good practice for the parser not to emit U+0000 in its generated output.
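As a rough illustration of the two behaviours discussed above (stripping U+0000 outright versus replacing it with U+FFFD), here is a minimal Python sketch. It is not the MediaWiki patch, which lives in PHP in mediawiki/core; the function names `strip_null_chars`, `replace_null_chars` and `contains_null` are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: helper names are hypothetical, not MediaWiki APIs.

NULL_CHAR = "\u0000"         # U+0000, the null character
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the Unicode replacement character


def strip_null_chars(text: str) -> str:
    """Remove every U+0000 from the input (the approach suggested in this task)."""
    return text.replace(NULL_CHAR, "")


def replace_null_chars(text: str) -> str:
    """Alternative: substitute U+FFFD for each U+0000 instead of dropping it."""
    return text.replace(NULL_CHAR, REPLACEMENT_CHAR)


def contains_null(text: str) -> bool:
    """Check whether generated output still contains a null character."""
    return NULL_CHAR in text
```

Stripping on input, regardless of whether the language converter is enabled, would mean a check like `contains_null` on the generated output should never return true.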

Details

	Subject	Repo	Branch	Lines +/-
	Strip U+0000 in wikitext	mediawiki/core	master	+12 -4


Related Objects

		Status	Subtype	Assigned	Task
		Open		None	T106079 Wikitext includes control characters that are not allowed in HTML 5
		Open		None	T159174 Don't allow null characters in wikitext (or in HTML output)

Event Timeline

cscott created this task. · Feb 27 2017, 8:52 PM

Restricted Application added a subscriber: Aklapper. · Feb 27 2017, 8:52 PM

Change 340225 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
T159174: Strip U+0000 in wikitext

https://gerrit.wikimedia.org/r/340225

gerritbot added a project: Patch-For-Review. · Feb 27 2017, 9:41 PM

Change 340225 had a related patch set uploaded (by legoktm; owner: cscott):
[mediawiki/core] Strip U+0000 in wikitext

https://gerrit.wikimedia.org/r/340225

Change 340225 merged by jenkins-bot:
[mediawiki/core] Strip U+0000 in wikitext

https://gerrit.wikimedia.org/r/340225

ReleaseTaggerBot added projects: MW-1.29-release (WMF-deploy-2017-03-07_(1.29.0-wmf.15)), MW-1.29-release-notes. · Mar 6 2017, 11:00 PM

ssastry closed this task as Resolved. · Apr 9 2017, 9:53 PM

ssastry assigned this task to cscott.

ssastry triaged this task as Medium priority.

Actually, we still have to verify whether this is handled correctly in Parsoid.

Krinkle removed a project: MW-1.29-release (WMF-deploy-2017-03-07_(1.29.0-wmf.15)). · May 25 2017, 1:33 PM

Probably related to T106079 (at least the Parsoid portion of this).

cscott removed projects: MW-1.29-release-notes, Patch-For-Review, MediaWiki-Parser. · Dec 1 2017, 9:22 PM

cscott mentioned this in T106079: Wikitext includes control characters that are not allowed in HTML 5. · Feb 27 2018, 4:30 PM

ssastry moved this task from Needs Triage to Future Ideas on the Parsoid board. · Jun 10 2019, 8:10 PM

cscott added a parent task: T106079: Wikitext includes control characters that are not allowed in HTML 5. · Jun 18 2020, 6:25 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)
