Page MenuHomePhabricator

RFC: drop support for running without mbstring
Closed, ResolvedPublic

Description

MediaWiki currently relies heavily on Unicode support to provide support for 300+ languages yet does not require the mbstring PHP extension to function. Instead, we create PHP-only fallbacks if a native support is not available. This creates a few problems:

  • These fallbacks are extremely slow. The script in P2734 demonstrates that fallbacks are roughly order of magnitude slower on PHP 5.6. In extreme cases, it can be 100+ times slower, per comment in Fallback.php).
  • These fallbacks cover only a few functions. If there's no fallback, either ad-hoc solutions are used in places, or, like in SwiftFileBackend, we just say "mbstring is required".
  • This also means that extensions can't expect any consistent Unicode support.
  • Won't somebody please think of the children!

Now that we've dramatically increased PHP requirements, we've already cut off a lot of crappy environments so this change will likely not affect too many users.

OS support:

  • On Debian-based systems, a simple apt-get install php5 gives you mbstring by default.
  • On RPM-based, a separate package is required
  • On Windows, people tend to use *AMP all-in-one packages that have mbstring.

Current mbstring usage in core (excluding fallbacks themselves):

mediawiki/includes$ grep -orEh '\bmb_\w+' . | sort | uniq -c
   7 mb_check_encoding
   6 mb_convert_encoding
  12 mb_strlen
   4 mb_substr

Some time ago, I committed https://gerrit.wikimedia.org/r/#/c/267309/ to start a discussion, but it went largely unnoticed so I'd like to start a formal RFC.

Event Timeline

Can we stop maintaining our own fallbacks and just use https://github.com/symfony/polyfill-mbstring ?

Are they any faster (slowness is mentioned as one of the reasons to remove them)?

Another reason in favor for requiring mbstring is that we require i18n (it's not optional) and mbstring is a piece of code that enables that.

It is already required by SemanticMediaWiki.

It's unfortunate that mbstring is not always installed with PHP by default.

@Anomie writes on wikitech-l:

Users of Debian-based distributions or Windows should have mbstring
enabled by default, so should not be affected.

I note that the current packages in Debian unstable for PHP7 have a
separate mbstring package for some reason.

Also, as a side note, once I got around to figuring out everything that
needed installation[1] my local dev environment hasn't had much issue with
PHP7. The biggest problem is that luasandbox needs updating.

[1]: IIRC that's php-curl, php-intl, php-json, php-mbstring, php-mysql,
php-sqlite3, php-tidy, php-xml, php-redis, and php-xdebug, in addition to
the usual basic packages.

So far it sounds like there is a good amount of support for requiring mbstring.

Does everybody here agree with the assessment that the number of users affected should be limited? Are there any other strong objections?

As a heavy "third-party MediaWiki operator", I am fine with requiring mbstring. It's standard, in all packaging systems, easy to enable, and seems a low cost for a high payoff.

Considering the broad support so far, @MaxSem and myself agreed to start the Final Comment Period, aiming for a decision in next week's ArchCom meeting. I will send out an announcement as part of the weekly ArchCom RFC update.

Change 267309 had a related patch set uploaded (by MaxSem):
Kill mbstring fallbacks

https://gerrit.wikimedia.org/r/267309

RobLa-WMF mentioned this in Unknown Object (Event).Apr 5 2016, 11:21 PM

Change 267309 merged by jenkins-bot:
Kill mbstring fallbacks

https://gerrit.wikimedia.org/r/267309

This RFC was accepted by the ArchCom in today's session based on broad support in the discussion. @MaxSem's patch has already been merged.

Thanks to @MaxSem for writing this RFC and path, and everybody else for participating!