
#expr error doesn't show non-ascii glyphs
Closed, Resolved, Public

Description

Parse the wikitext: {{#expr:ģ}}

Expected behaviour:

Expression error: Unrecognized punctuation character "ģ".

Actual behaviour:

Expression error: Unrecognized punctuation character "�".

Maybe something is converting from ISO-8859-1?


Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 4:00 AM
bzimport set Reference to bz72913.
bzimport added a subscriber: Unknown Object (MLST).

Is there a problem with the Validator::cleanUp() function in core, which normalizes non-ASCII characters to Normalization Form C (NFC)?

No.

It's because of line 200 of Expr.php ($char = $expr[$p];): the expression parser walks through the expression one byte at a time. For a multibyte Unicode character such as 'LATIN SMALL LETTER G WITH CEDILLA' (U+0123, encoded in UTF-8 as the two bytes 0xC4 0xA3), this looks only at the first byte, so the error message contains just the byte 0xC4. On its own that byte is an invalid UTF-8 sequence, so the Validator::cleanUp() method correctly converts it to the replacement character (U+FFFD).
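To illustrate the byte-level problem outside PHP, here is a small Python analogue (not the extension's code). Indexing the UTF-8 bytes of "ģ" at one offset yields only the lead byte, and decoding that lone byte with errors="replace", which stands in here for Validator::cleanUp(), produces U+FFFD, the same "�" seen in the actual error message.

```python
# Python analogue of Expr.php reading one byte at a time ($char = $expr[$p]).
expr = "ģ".encode("utf-8")   # U+0123 LATIN SMALL LETTER G WITH CEDILLA
print(expr)                  # b'\xc4\xa3': two bytes for one character

first_byte = expr[0:1]       # only the lead byte 0xC4, like $expr[0]

# A lone lead byte is not valid UTF-8. Decoding with errors="replace"
# (standing in for Validator::cleanUp()) substitutes U+FFFD.
cleaned = first_byte.decode("utf-8", errors="replace")
print(f"U+{ord(cleaned):04X}")  # U+FFFD
```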

One possible fix would be to replace Validator::cleanUp( $char ) on line 336 with Validator::cleanUp( mb_substr( $expr, $p, 1 ) ).
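The idea behind that fix, reporting the whole character rather than a single byte, can be sketched in Python as well. The helper name utf8_char_at is hypothetical and this is not the patch that was merged; it returns the complete UTF-8 sequence whose lead byte sits at byte offset p and only then applies the replacement-style cleanup. Note that PHP's mb_substr() takes its start position in characters, not bytes, so mb_substr( $expr, $p, 1 ) lines up with the byte offset $p only while every character before it is single-byte ASCII; the sketch works from the byte offset directly.

```python
def utf8_char_at(data: bytes, p: int) -> str:
    """Return the whole UTF-8 character whose lead byte is at byte offset p.

    Hypothetical helper for illustration only.
    """
    lead = data[p]
    if lead < 0x80:
        length = 1              # ASCII
    elif lead >> 5 == 0b110:
        length = 2              # 110xxxxx: two-byte sequence
    elif lead >> 4 == 0b1110:
        length = 3              # 1110xxxx: three-byte sequence
    elif lead >> 3 == 0b11110:
        length = 4              # 11110xxx: four-byte sequence
    else:
        length = 1              # stray continuation byte or invalid lead byte
    # errors="replace" plays the role of Validator::cleanUp(): anything still
    # invalid (for example a truncated sequence) becomes U+FFFD.
    return data[p:p + length].decode("utf-8", errors="replace")


expr = "1+ģ".encode("utf-8")
p = 2  # byte offset of the unrecognized character

buggy = expr[p:p + 1].decode("utf-8", errors="replace")
fixed = utf8_char_at(expr, p)
print(f"U+{ord(buggy):04X}")  # U+FFFD
print(f"U+{ord(fixed):04X}")  # U+0123
```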

Thank you @Bawolff, it now works correctly on my localhost. Can I submit a patch for it to Gerrit?

Change 407596 had a related patch set uploaded (by Rammanojpotla; owner: Rammanoj):
[mediawiki/extensions/ParserFunctions@master] Enable non-ascii letters in expression error

https://gerrit.wikimedia.org/r/407596

Change 407596 merged by jenkins-bot:
[mediawiki/extensions/ParserFunctions@master] Enable non-ascii letters in expression error

https://gerrit.wikimedia.org/r/407596

Bawolff assigned this task to Rammanojpotla.