Use Remex in Sanitizer::stripAllTags()
ddb4913f5362 (Unpublished)

Authored by Catrope on Nov 14 2017, 10:22 PM.

Description

Using a real HTML tokenizer fixes bugs when < or > appear in attribute
values. The old implementation used delimiterReplace(), which didn't
handle this case:

> print Sanitizer::stripAllTags( '<p data-foo="a&lt;b>c">Hello</p>' );
c">Hello
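To illustrate the difference, here is a minimal sketch in Python, not the MediaWiki code. `strip_tags_naive` is a hypothetical regex stand-in for delimiter-based stripping (it is not `delimiterReplace()` itself), and `strip_tags_tokenized` uses the standard library's `html.parser` as a stand-in for a real tokenizer such as Remex. The function and class names are assumptions made for this example.

```python
import re
from html.parser import HTMLParser


def strip_tags_naive(html):
    """Delete everything from each '<' up to the next '>'.

    A '>' inside a quoted attribute value ends the match early,
    so the rest of the attribute leaks into the output.
    """
    return re.sub(r"<[^>]*>", "", html)


class _TextCollector(HTMLParser):
    """Collects text nodes only; tags and attributes are discarded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags_tokenized(html):
    """Tokenize the input and keep only the text between tags."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


sample = '<p data-foo="a&lt;b>c">Hello</p>'
print(strip_tags_naive(sample))      # c">Hello
print(strip_tags_tokenized(sample))  # Hello
```

The tokenizer treats the quoted attribute value as a single unit, so the `>` inside it does not end the tag. Note that with `convert_charrefs=True` the Python parser also decodes character references in text nodes, which is a property of this sketch rather than a claim about the Remex-based implementation.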

We also can't use PHP's built-in strip_tags() because it doesn't handle
<?php and <? correctly:

> print strip_tags('1<span class="<?php">2</span>3');
1
> print strip_tags('1<span class="<?">2</span>3');
1

Bug: T179978
Change-Id: I53b98e6c877c00c03ff110914168b398559c9c3e

Details

Committed
Jdforrester-WMF on Nov 16 2017, 1:31 AM
Parents
rMW7980e38a8405: Move Sanitizer.php to includes/parser/
Branches
Unknown
Tags
Unknown
References
refs/changes/48/391348/3
ChangeId
I53b98e6c877c00c03ff110914168b398559c9c3e