abstract utf8 validation fallback

Description

The Language class had a code snippet to verify whether a text is valid
UTF-8, but it could not be reused from anywhere else. The snippet uses
mb_check_encoding() and falls back to a regex whenever mbstring is not
available.

  • Introduce StringUtils::isUtf8(), which is mostly code moved out of the Language class.
  • Improve regex readability by using an extended regex (the /x modifier).
  • Make the regex recognize longer sequences.
  • Add unit tests for both the mbstring and the native PHP implementations.
  • Accept an optional second parameter in isUtf8() that forces the use of our PHP implementation. This is used for unit testing.
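
The shape of such a helper can be sketched as follows. This is an illustration only, not the committed code: the function name isUtf8Sketch() and the parameter name $disableMbstring are assumptions, and the regex is a strict RFC 3629 pattern (at most four bytes, no overlong forms, no surrogates) swapped in for the commit's own regex, which is not shown in this description and which reportedly accepts longer sequences.

```php
<?php
/**
 * Hypothetical stand-in for StringUtils::isUtf8(); not the committed code.
 *
 * @param string $value Bytes to check.
 * @param bool $disableMbstring Force the pure-PHP regex path (for unit tests).
 * @return bool True if $value is well-formed UTF-8.
 */
function isUtf8Sketch( $value, $disableMbstring = false ) {
	// Preferred path: let mbstring do the validation when it is available.
	if ( !$disableMbstring && function_exists( 'mb_check_encoding' ) ) {
		return mb_check_encoding( $value, 'UTF-8' );
	}

	// Fallback path: an extended (x modifier) regex, one alternative per
	// well-formed byte sequence. The atomic group and possessive quantifier
	// keep PCRE from backtracking on long inputs.
	$matched = preg_match( '/\A(?>
		  [\x00-\x7F]                        # ASCII
		| [\xC2-\xDF][\x80-\xBF]             # 2 bytes, no overlongs
		| \xE0[\xA0-\xBF][\x80-\xBF]         # 3 bytes, no overlongs
		| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3 bytes, plain
		| \xED[\x80-\x9F][\x80-\xBF]         # 3 bytes, no surrogates
		| \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4 bytes, planes 1 to 3
		| [\xF1-\xF3][\x80-\xBF]{3}          # 4 bytes, planes 4 to 15
		| \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4 bytes, plane 16
	)*+\z/x', $value );

	// preg_match() returns 1 on a match, 0 on no match, false on error.
	return $matched === 1;
}
```

Calling isUtf8Sketch( $text, true ) exercises the regex path even on a host where mbstring is loaded, which is what lets a test suite cover both implementations on a single machine.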

Change-Id: I4cf4dfe2eb02f046db1726f4654ba649e01419f2

Details

Provenance
hashar Authored on
Gerrit Code Review Committed on Dec 12 2012, 11:24 AM
Parents
rMWd6817311c3e4: (minor) use wfDebugLog consistently.
Branches
Unknown
Tags
Unknown
ChangeId
I4cf4dfe2eb02f046db1726f4654ba649e01419f2

Event Timeline