
unicode blacklist check in User.php fails in php 4.3.2
Closed, DeclinedPublic

Description

Author: don

Description:
Installed mediawiki 1.6.1 on server with PHP 4.3.2. When trying to log in, I
get this PHP warning:

Warning: Compilation failed: characters with values > 255 are not yet supported
in classes at offset 33 in /usr/local/mediawiki-1.6.1/includes/User.php on line 224

When I comment out the check against the $unicodeBlacklist, the error goes
away. Earlier today I was told that this only works in PHP >= 4.4, but now I am
told it *should* work in 4.3 as well.

I was going to write a patch to check the PHP version and conditionally do the
unicode check if PHP >= 4.4, but I'd like to know for sure that it won't work in
4.3 first.

I'm rizzo on freenode.


Version: 1.6.x
Severity: normal
OS: Linux
Platform: PC

Details

Reference
bz5496

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 9:09 PM
bzimport set Reference to bz5496.
bzimport added a subscriber: Unknown Object (MLST).

don wrote:

Googling around, I found a mention that the version of PCRE might have something
to do with it as well. The post I read is
http://drupal.org/node/12857#comment-35282; it turns out we have the same PHP
4.3.2 with the same PCRE version as the poster.

phpinfo() says: PCRE Library Version 3.9 02-Jan-2002

It *should* work fine on 4.3.2...

http://us3.php.net/manual/en/reference.pcre.pattern.modifiers.php

"u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings
are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on
win32. UTF-8 validity of the pattern is checked since PHP 4.3.5."

don wrote:

Looks like it works after updating our pcre and recompiling the same version of PHP.

Perhaps mediawiki should make notice of a required minimum version of pcre?

robchur wrote:

(In reply to comment #3)

Looks like it works after updating our pcre and recompiling the same version
of PHP.

Perhaps mediawiki should make notice of a required minimum version of pcre?

What *was* the apparent minimum version of PCRE required?

I can confirm the problem, running PHP 4.3.8 with pcre-3.9-10. Avar told me he
does NOT have the problem running 4.3.2 with pcre-3.9-10.2.

So the .2 seems to make all the difference, with pcre-3.9-10.2 being the
required minimum version (since that's a patch release, required versions may
have to be determined separately for 3.9-11, etc.).

Please put something about this into the release notes!
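
Since the failure seems to depend on the PCRE build rather than on the PHP
version, one option is to probe the regex engine at runtime instead of comparing
version numbers. The following is only a minimal sketch: the function names
pcreSupportsWideClasses() and nameHitsBlacklist() are hypothetical and not part
of MediaWiki, and silently skipping the check on an old PCRE is an assumption,
not project policy.

```php
<?php
// Hypothetical helper, not part of MediaWiki: reports whether the PCRE
// library linked into this PHP build can compile a character class that
// contains code points above 255.
function pcreSupportsWideClasses() {
    // preg_match() returns false when the pattern fails to compile.
    // The @ operator hides the "Compilation failed" warning on old builds.
    return @preg_match('/[\x{0100}-\x{0200}]/u', '') !== false;
}

// Blacklist pattern as quoted elsewhere in this report.
$unicodeBlacklist = '/[' .
    '\x{0080}-\x{009f}' . // iso-8859-1 control chars
    '\x{00a0}' .          // non-breaking space
    '\x{2000}-\x{200f}' . // various whitespace
    '\x{2028}-\x{202f}' . // breaks and control chars
    '\x{3000}' .          // ideographic space
    '\x{e000}-\x{f8ff}' . // private use
    ']/u';

// Hypothetical wrapper: returns true only when the name contains a
// blacklisted character and the pattern could actually be compiled.
function nameHitsBlacklist($name, $pattern) {
    if (!pcreSupportsWideClasses()) {
        // Assumption: skipping the check is acceptable on an old PCRE.
        return false;
    }
    return preg_match($pattern, $name) === 1;
}
```

A probe like this avoids guessing which distribution patch level of PCRE
carries the fix.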

don wrote:

(In reply to comment #4)

What *was* the apparent minimum version of PCRE required?

pcre-3.9-10.2

The blacklist can be rewritten like this:

		$unicodeBlacklist = '/' .
			'(\x00[\x80-\x9f])|' . # iso-8859-1 control chars
			'(\x00\xa0)|' .        # non-breaking space
			'(\x20[\x00-\x0f])|' . # various whitespace
			'(\x20[\x28-\x2f])|' . # breaks and control chars
			'(\x30\x00)|' .        # ideographic space
			'([\xe0-\xf8].)' .     # private use
			'/u';

That's not as nice, and I did not test it much, but it should work with earlier
versions of PCRE too. Please consider using this version, since the current one
is bound to cause problems for people running stuff on boxes with an old version
of PCRE. Getting your hosting service to upgrade a library is not an easy task
in my experience. Speed and elegance are not critical for this expression.

That would fail as it only concerns itself with ASCII characters
and a few Latin Extended characters.

Daniel: where does this updated version of PCRE come from?
Did you just run your distro's standard updater or did you
get it from somewhere else?

Actually, I did not try the new pcre version, I just relied on what avar said.
After reading your comment, I updated pcre and got version 4.4-1, which does not
have the problem. I'm using apt-for-rpm on an old mostly-fedora-but-really-redhat
box, so I'm not really representative. I expect recent versions of pcre do not
have the problem, but people who have webspace on a "debian stable only" box or
something may be stuck with something old. I just think it's a bit pointless to
require a new version of pcre just for this simple check.

You said that my rewritten expression "only concerns itself with ASCII
characters and a few Latin Extended characters" - well, as far as I can see, it
does *exactly* the same as the expression currently in SVN:

		$unicodeBlacklist = '/[' .
			'\x{0080}-\x{009f}' . # iso-8859-1 control chars
			'\x{00a0}' .          # non-breaking space
			'\x{2000}-\x{200f}' . # various whitespace
			'\x{2028}-\x{202f}' . # breaks and control chars
			'\x{3000}' .          # ideographic space
			'\x{e000}-\x{f8ff}' . # private use
			']/u';

Am I missing something?

Yes, you're missing two hexadecimal digits. [\xe0-\xf8] refers to
characters in the range U+00E0 (à) through U+00F8 (ø).

Have you tested with Debian stable (that's 3.1, not old 3.0 which
has too old a PHP to run 1.6 anyway)?

I don't quite understand what you are saying, but looking at this again, I
notice I once more managed to ignore the difference between Unicode and UTF-8.
Oops...
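
The Unicode versus UTF-8 mix-up can be made concrete with a short comparison.
This is only an illustrative sketch; the variable names are made up for the
example, and both patterns are copied from this thread. With the /u modifier,
\x20[\x00-\x0f] means U+0020 followed by a control character, not the code
points U+2000 through U+200F.

```php
<?php
// Pattern from SVN, as quoted in this thread: \x{...} names code points.
$svnPattern = '/[' .
    '\x{0080}-\x{009f}' . // iso-8859-1 control chars
    '\x{00a0}' .          // non-breaking space
    '\x{2000}-\x{200f}' . // various whitespace
    '\x{2028}-\x{202f}' . // breaks and control chars
    '\x{3000}' .          // ideographic space
    '\x{e000}-\x{f8ff}' . // private use
    ']/u';

// Rewritten pattern from this thread: each \xNN is a separate character,
// so '\x30\x00' is the digit "0" followed by NUL, not U+3000.
$rewritten = '/' .
    '(\x00[\x80-\x9f])|' .
    '(\x00\xa0)|' .
    '(\x20[\x00-\x0f])|' .
    '(\x20[\x28-\x2f])|' .
    '(\x30\x00)|' .
    '([\xe0-\xf8].)' .
    '/u';

$ideographicSpace = "\xE3\x80\x80"; // U+3000 encoded as UTF-8
$aGraveThenX      = "\xC3\xA0x";    // U+00E0 (a with grave) followed by "x"

// The SVN pattern catches U+3000; the rewritten one does not.
$svnCatchesSpace       = preg_match($svnPattern, $ideographicSpace); // 1
$rewrittenCatchesSpace = preg_match($rewritten, $ideographicSpace);  // 0

// The rewritten pattern rejects a harmless Latin-1 letter instead.
$svnCatchesLetter       = preg_match($svnPattern, $aGraveThenX); // 0
$rewrittenCatchesLetter = preg_match($rewritten, $aGraveThenX);  // 1
```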

Anyway: no, I have not tested with debian stable, I pulled that out of thin air.
Basically, since I still have that old version on my (ill-maintained) box, others
probably will too. That's all. It's not a real issue to me.
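
For boxes stuck on an old PCRE, one possible workaround is to match the UTF-8
byte sequences directly, without the /u modifier, so that no character class
contains a value above 255. The sketch below is an assumption about how such a
pattern could look, not something taken from MediaWiki; the name $bytePattern
is hypothetical, and the pattern presumes the input is already valid UTF-8.

```php
<?php
// Hypothetical byte-level equivalent of the blacklist. No /u modifier, so
// every \xNN below is a single byte and all class members are <= 255.
$bytePattern = '/' .
    '\xC2[\x80-\xA0]|' .                 // U+0080..U+00A0
    '\xE2\x80[\x80-\x8F\xA8-\xAF]|' .    // U+2000..U+200F, U+2028..U+202F
    '\xE3\x80\x80|' .                    // U+3000
    '\xEE[\x80-\xBF][\x80-\xBF]|' .      // U+E000..U+EFFF
    '\xEF[\x80-\xA3][\x80-\xBF]' .       // U+F000..U+F8FF
    '/';

// Sample inputs, written as raw UTF-8 bytes.
$hitsIdeographicSpace = preg_match($bytePattern, "\xE3\x80\x80"); // U+3000
$hitsNbsp             = preg_match($bytePattern, "\xC2\xA0");     // U+00A0
$hitsPrivateUseEnd    = preg_match($bytePattern, "\xEF\xA3\xBF"); // U+F8FF
$hitsAGrave           = preg_match($bytePattern, "\xC3\xA0");     // U+00E0
$hitsAfterPrivateUse  = preg_match($bytePattern, "\xEF\xA4\x80"); // U+F900
```

Every alternative starts with a UTF-8 lead byte, so on valid input a match
cannot begin in the middle of a character.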

php 4.3.2 support dropped in REL1_6:


r13843 | tstarling | 2006-04-24 17:30:28 +0200 (lun, 24 avr 2006) | 1 line

We no longer support PHP 4.3.2, thanks to the unicode character classes in
User::isValidUserName(). Support for this was added in 4.3.3.

I believe Tim added that note following your bug report. Please upgrade
your PHP 4.x version. Debian stable has 4.3.10.

"Support for this was added in 4.3.3" - this is wrong. Support for this was
added in pcre-3.9-10.2, the version of PHP is not relevant (it was broken for me
in 4.3.8). Please update the notes accordingly.