Page MenuHomePhabricator

Port mediawiki/php/wmerrors to PHP7 and deploy
Open, HighPublic

Description

Before sending it to Cleanup, is the wmerrors PHP extension still useful? In T56163: Port wmerrors / php-fatal-error.html to HHVM it was decided that we didn't need this for HHVM, will we need it again for PHP 7? Or can it be safely archived?

Event Timeline

Legoktm created this task.Feb 13 2018, 3:54 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptFeb 13 2018, 3:54 AM
Krinkle renamed this task from Is mediawiki/php/wmerrors still useful? to Do we need mediawiki/php/wmerrors for PHP7.Mar 24 2018, 1:56 AM
Krinkle triaged this task as High priority.Apr 18 2018, 2:34 AM
Krinkle added a subscriber: Krinkle.EditedApr 26 2018, 5:16 PM

So for this task... a bit of research:

PHP 5: wmerrors

We have mediawiki/php/wmerrors for PHP 5 because PHP 5 does not expose fatal errors to user code at run-time. The reason we want to be able to register a callback handler for fatal errors is because the default behaviour of PHP 5 is to only log the message and source of the exception, not the stack trace. It logs it to the destination of error_log or syslog/rsyslog/logstash in our case.

The mediawiki/php/wmerrors extension (git: README) adds a handler to Zend directly for this in C, then performs the equivalent of debug_backtrace() or ( new Exception )->getTraceAsString(), and writes it to a configurable destination file (ini setting: wmerrors.log_file), and replaces the HTTP response with the contents of an HTML file (ini setting: wmerrors.message_file).

HHVM

HHVM has an extra INI setting hhvm.error_handling.call_user_handler_on_fatals (documentation that, when enabled (disabled by default) will invoke handlers registered via set_error_handler(), the same as PHP5/HHVM/PHP7 normally would for non-fatal errors (such as PHP notices and warnings).

We add a handler for this in wmf-config/HHVMRequestInit.php, which logs to syslog, which then gets picked up by Logstash as type:hhvm level:FATAL. Additionally, if MediaWiki PHP initialisation makes it to Setup.php, then MWExceptionHandler::installHandler() replaces the one from HHVMRequestInit with a slightly more elaborate one (source) that instead writes it in a way that ends up in Logstash as type:mediawiki channel:fatal.

Because of this replacement, in most cases, fatal errors from MediaWiki do not have stacktraces in the hhvm log, but instead only in the mediawiki/fatal log. (Because MediaWiki setting its handler, removes the one we add in HHVMRequestInit.) However, form cursory testing on mwdebug1001, I can confirm that the basic syslog handler does work for cases where MediaWiki doesn't initialise, in which case we'll actually see fatal+stacktraces in the hhvm log.

PHP 7

It pretty much depends on whether PHP 7 has the ability to let user-land code handle fatal errors. If it doesn't, and if its internal error handler also can't write stack traces to its log, then we'll need something like wmerrors again.

From a quick glance, it doesn't seem like set_error_handler has changed much between PHP 5 and PHP 7. However, I do see various suggestions around the internet that a shutdown function will get called even on fatal errors, e.g. via register_shutdown_function(). This is worth investigating.

We actually already use register_shutdown_function in MWExceptionHandler for catching OOM errors in HHVM.

From set_error_handler() docs:

The following error types cannot be handled with a user defined function: E_ERROR, E_PARSE, E_CORE_ERROR, E_CORE_WARNING, E_COMPILE_ERROR, E_COMPILE_WARNING, and most of E_STRICT raised in the file where set_error_handler() is called.

However, from my experimentation, a combination of register_shutdown_function() and error_get_last() appears to achieve what we want: https://3v4l.org/IcQdA

Anomie added a comment.EditedApr 26 2018, 8:52 PM

However, from my experimentation, a combination of register_shutdown_function() and error_get_last() appears to achieve what we want: https://3v4l.org/IcQdA

Confusing the issue is that some "fatal" errors in PHP 7 are actually thrown Error objects that are catchable and ignorable. The error you're triggering in that test is an example, which is why you're seeing the trace in the message.

https://3v4l.org/lfsUn and https://3v4l.org/tfpvM show some actual fatal errors.

Krinkle edited projects, added Performance-Team (Radar); removed Performance-Team.
Krinkle moved this task from Limbo to Perf issue on the Performance-Team (Radar) board.

The reason wmerrors is long and complicated is because PHP handled request timeout errors by trying to write out the error message from a signal handler. This was extremely unsafe and frequently crashed or deadlocked. In PHP 7.1, this issue was fixed: it now only calls signal-safe functions from the signal handler, setting a flag in the VM for graceful exit and setting a second "hard" timeout to abort the process if the VM doesn't terminate quickly enough.

I've confirmed in gdb just now that PHP 7.0 does indeed exit a signal handler using longjmp(). It then proceeds to run functions registered with register_shutdown_function(). I don't think it's advisable to do this since it will certainly crash or deadlock on occasion. In PHP 7.1 it should be possible to do this without generating segfaults.

The other tricky case is OOM errors. These still longjmp() directly out of the emalloc() call as of PHP git master. This leaves the calling code in an undefined state, but it's not as bad as timeouts since there are fewer possible calling locations. Maybe you could get away with running userspace code after this, as long as you keep the code very simple and avoid accessing any state which was being written at the time of the OOM. But in my testing of php-fpm, output from a shutdown function after OOM is discarded, so it's not possible to write out an OOM error page with this trick. This is probably because (gdb shows) a second OOM is inevitably triggered when you try to do anything in userspace. The memory usage is equal to the limit.

In summary, I think using a simple port of wmerrors would be advisable for PHP 7.0. wmerrors will routinely segfault but at least it has a timer to guard against deadlocks.

In PHP 7.1+, it should be possible to handle timeouts in userspace, but OOMs remain problematic.

In the long term, we should probably develop a patch for the PHP core which meets our needs for error reporting. We do have a contractor budget.

tstarling renamed this task from Do we need mediawiki/php/wmerrors for PHP7 to Port mediawiki/php/wmerrors to PHP7 and deploy.Jul 13 2018, 3:38 AM

@tstarling If I understand correctly from Joe and T176370, the the current aim is to use the default PHP 7.0 package during early internal experimentation, and PHP 7.2 (or 7.3) for the user-facing roll out.

That presumably means we can go with user-land handling, right? Regarding something basic, I suppose the existing transport via syslog for HHVM could be re-purposed for this. Something like HHVMRequestInit.php, but using the shutdown/trace logic from MWExceptionHandler.php, and using PHP7's auto-prepend to run it?

sietec added a subscriber: sietec.Nov 28 2018, 2:50 AM

Change 478575 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] errorpages: Remove unused php-fatal-error.html file

https://gerrit.wikimedia.org/r/478575

I checked a variety of fatal errors on both production HHVM and our current php-fpm configuration using the fatal-errors.php script from T210567. Here's what I found. (Disclaimer: I'm still relatively new to our error logging, so there may be something obvious that I missed.)

ErrorRuntimePost-sendStack Trace [1]Logstash ChannelLogfile(s)
No methodHHVMnoyesfatalfatal.log and hhvm.log
No methodHHVMyesyesfatalfatal.log and hhvm.log
No methodphp-fpmnonoexceptionexception.log
No methodphp-fpmyesnoexceptionexception.log
Out of memoryHHVMnoyesfatalfatal.log and hhvm.log
Out of memoryHHVMyesyesfatalfatal.log and hhvm.log
Out of memoryphp-fpmnoyes[2]fatalfatal.log and php7.2-fpm.log
Out of memoryphp-fpmyesyes[2]fatalfatal.log and php7.2-fpm.log
TimeoutHHVMnoyesfatalfatal.log and hhvm.log
TimeoutHHVMyesyesfatalfatal.log and hhvm.log
Timeoutphp-fpmnoyes[2]fatalfatal.log and php7.2-fpm.log
Timeoutphp-fpmyesyes[2]fatalfatal.log and php7.2-fpm.log
SegfaultHHVMnoyes<none>fatal.log and hhvm.log
SegfaultHHVMyesyes<none>fatal.log and hhvm.log
Segfaultphp-fpmnono<none>php7.2-fpm.log
Segfaultphp-fpmyesno<none>php7.2-fpm.log

[1] all stack traces appear in fatal.log only, not in hhvm.log / php7.2-fpm.log, which is fine
[2] the trace is to MWExceptionHandler::handleFatalError only, not to the code that actually caused the error, which isn't so fine

To summarize the differences that I noticed:

  • channels/logs changed for one error type (method not found, which is likely representative of a number of different PHP errors), and it lost its stack trace
  • stack traces for out of memory and timeout are much less useful under php-fpm
  • segfaults are not recorded in fatal.log under php-fpm. Although they are recorded in php7.2-fpm.log, they have no stack trace.

We identified this task as a likely candidate for an outside contractor to complete, based on current team structure and availability. The goal is to restore logging of stack traces that would be lost in a switch to PHP7 / php-fpm.

Change 478575 merged by jenkins-bot:
[operations/mediawiki-config@master] errorpages: Remove unused php-fatal-error.html file

https://gerrit.wikimedia.org/r/478575

Change 486840 had a related patch set uploaded (by MaxSem; owner: MaxSem):
[mediawiki/php/wmerrors@master] WIP: port to PHP 7.2

https://gerrit.wikimedia.org/r/486840

Change 486840 abandoned by MaxSem:
WIP: port to PHP 7.2

Reason:
Platform is not interested in this.

https://gerrit.wikimedia.org/r/486840

demon removed a subscriber: demon.Feb 19 2019, 10:30 AM

Still seeing fatals wrongly reported as regular (catchable) exceptions for PHP 7.2, which made T220623 rather confusing to debug.

Krinkle moved this task from Backlog to Wikimedia production on the PHP 7.2 support board.
Joe added a subscriber: Joe.Wed, Apr 17, 8:34 AM

Is this task resolved? If not, do we consider this a blocker for a full-scale deployment of PHP 7?

I am not aware of any resolution or progress on this task (beyond @MaxSem 's abandoned patch above).

I do not know if we consider this a blocker for PHP 7 deployment, but it seems to me worth doing. As @Krinkle mentions, the current situation is less than ideal and will continue to be problematic from time to time.