Page MenuHomePhabricator

Invalid unicode silently blanks page
Open, Needs TriagePublic

Description

Steps to reproduce:

  • Run the command (from cli): echo -e "bullet\x95bullet" | php edit.php unicode
  • Go to page named unicode

Expected behaviour:

One of the following: An exception (my preference), output the page with invalid unicode, or output the page with invalid unicode replaced with the replacement character.

Actual behaviour:

The page contents is treated as an empty string. (The skin and stuff renders fine, just the page contents are blank)


Perhaps there is a unicode regex somewhere that is returning false, which then gets cast to the empty string.

In my opinion, while we don't have to entirely behave sensibly in this situation, outputting nothing is very confusing, and the fact that we're outputting nothing probably means we're ignoring error handling somewhere. We should not ignore error handling, and instead give a big exception error instead.

Originally reported at https://www.mediawiki.org/wiki/Topic:Vi3gbx8m6gjq6v1k According to report this behaviour is new in 1.34 and not present in 1.33

Event Timeline

So issue appears to be in MagicWordArray:

$res = preg_match_all( $regex, $text, $matches, PREG_SET_ORDER );
if ( $res === false ) { 
        LoggerFactory::getInstance( 'parser' )->warning( 'preg_match_all returned false', [
                'code' => preg_last_error(),
                'regex' => $regex,
                'text' => $text,
        ] );
} elseif ( $res ) {
        foreach ( $matches as $m ) {
                list( $name, $param ) = $this->parseMatch( $m );
                $found[$name] = $param; 
        }
}
$res = preg_replace( $regex, '', $text );
if ( $res === null ) {
        LoggerFactory::getInstance( 'parser' )->warning( 'preg_replace returned null', [
                'code' => preg_last_error(),
                'regex' => $regex,
                'text' => $text,
        ] );
}
$text = $res;

So, see how it checks for the result being false, but if it does, it still overrides the entire string with false? This seems like a bad behaviour in my opinion. I think, either it should exit early in that situation (Thus not blanking the entire page), or better throw an exception.

That said, assuming original reporter is right that this only shows up in 1.34, I don't see any reason why this issue would be new.