
Create a way to intentionally trigger fatal errors in MediaWiki
Closed, Resolved (Public)

Description

It is useful to be able to intentionally trigger fatal errors in MediaWiki, to confirm error/logging behavior. Create a way to do this on production in a secure and controlled way. This will be useful for T187147, but is also generally useful going forward.

Suggested implementation: create a script in wmf/operations/mediawiki-config/w for this purpose. Fully load MediaWiki via WebStart.php. Loading in this way includes configuration and installs error handlers, which may be relevant. The existing extract2.php script in that directory is a good example of how to load in this way.

The script should recognize two parameters, one for indicating the type of fatal error to trigger, and another for indicating whether the error should trigger normally or via a post-send function. (Some error behaviors may differ when executed in a post-send context than in a normal context.)

The types of fatal errors the script can produce should include at least:

  • method does not exist
  • out of memory
  • timeout
  • segfault

Additionally, the script should accept a password parameter, which it checks against a password stored in either PrivateSettings.php or (perhaps better) a dedicated file in the same private directory as PrivateSettings.php.
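The request handling described above could be sketched roughly as follows. This is a minimal illustration, not the script from the eventual patch: the parameter names (`action`, `postsend`, `password`), the action keys, and the functions `fatalTriggers()` and `planRequest()` are all hypothetical. The segfault trigger assumes a PHP 7.x runtime; newer PHP versions may raise an error instead of crashing.

```php
<?php
// Illustrative sketch only: parameter names, action keys and function
// names are hypothetical and not taken from the actual script.

/** Map each supported error type to a closure that triggers it. */
function fatalTriggers(): array
{
    return [
        'nomethod' => static function (): void {
            // Calling an undefined method throws Error; uncaught, it is fatal.
            (new stdClass())->noSuchMethod();
        },
        'oom' => static function (): void {
            // Keep doubling a string until memory_limit is exceeded.
            $s = 'x';
            while (true) {
                $s .= $s;
            }
        },
        'timeout' => static function (): void {
            // Busy-loop past a one-second execution limit.
            set_time_limit(1);
            while (true) {
            }
        },
        'segfault' => static function (): void {
            // Unbounded recursion that passes through C code (array_map).
            $f = static function () use (&$f) {
                array_map($f, [0]);
            };
            $f();
        },
    ];
}

/**
 * Check the password and the requested action.
 * Returns the trigger closure and whether it should run post-send.
 */
function planRequest(array $params, string $expectedPassword): array
{
    $given = is_string($params['password'] ?? null) ? $params['password'] : '';
    // hash_equals() compares in constant time; an empty secret never matches.
    if ($expectedPassword === '' || !hash_equals($expectedPassword, $given)) {
        throw new InvalidArgumentException('bad password');
    }
    $triggers = fatalTriggers();
    $action = is_string($params['action'] ?? null) ? $params['action'] : '';
    if (!isset($triggers[$action])) {
        throw new InvalidArgumentException('unknown action');
    }
    return [
        'trigger' => $triggers[$action],
        'postSend' => ($params['postsend'] ?? '') === '1',
    ];
}
```

The caller would either invoke the returned `trigger` immediately or hand it to whatever post-send mechanism is chosen (possibly MediaWiki's DeferredUpdates, though that is an assumption here). Keeping validation separate from triggering means the password check can be exercised without causing a fatal error.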

Event Timeline

Additionally, the script should accept a password parameter, which it checks against a password stored in either PrivateSettings.php or (perhaps better) a dedicated file in the same private directory as PrivateSettings.php.

I stumbled upon this problem (protect sensitive scripts from outside reach) when working on the php admin interface last week.

A couple suggestions:

  • create a new directory under operations/mediawiki-config/ where this script, and potentially any other such script can reside. This way, it will be easier to restrict access from the normal virtual host and to create an ad-hoc one for such requests.
  • We can inject the shared secret as an env variable or a header from apache, instead of having to save it in privatesettings.php. But I see advantages with either approach.

*IF* you don't add the ability to segfault the application server, I'm even ok with allowing direct unauthenticated access to this for now.

Maybe I'm missing something, but how do you intentionally cause PHP to segfault? Shouldn't any code that triggers a segfault be a PHP bug that should/would be reported upstream?

Maybe I'm missing something, but how do you intentionally cause PHP to segfault? Shouldn't any code that triggers a segfault be a PHP bug that should/would be reported upstream?

We don't need to actually cause a segfault. We just need to send SIGSEGV to the process we want to "segfault"; that would activate any segfault handlers the process might have.
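A minimal sketch of that idea, assuming the posix extension is available. The function name `sendSignalToSelf()` is hypothetical, and the signal number 11 is the Linux value for SIGSEGV (the `SIGSEGV` constant itself is only defined when pcntl is loaded):

```php
<?php
// Sketch only: deliver a signal to the current PHP process.
// Requires the posix extension; the function name is hypothetical.
function sendSignalToSelf(int $signal): bool
{
    // posix_kill() returns true if the signal was sent successfully.
    return posix_kill(posix_getpid(), $signal);
}

// Signal 0 sends nothing and only checks that the process can be signalled.
// Passing 11 (SIGSEGV on Linux) would terminate the process with the
// default handler, so it is deliberately not called here.
$canSignal = sendSignalToSelf(0);
```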

Maybe I'm missing something, but how do you intentionally cause PHP to segfault? Shouldn't any code that triggers a segfault be a PHP bug that should/would be reported upstream?

You can trivially cause a segfault with:

function foo() {
   array_map('foo', [0]);
}
foo();

This is well known to the PHP devs. It used to be that any infinite recursion would cause a segfault, and they didn't consider that a bug either. But now there is a virtual stack for the VM so it's necessary to call into C code and then back into PHP e.g. using array_map(). Nikita Popov mentioned this behaviour in http://nikic.github.io/2017/04/14/PHP-7-Virtual-machine.html#function-calls

A first draft of this script is mostly finished and I'm testing locally. A few notes:

  • I included a "no error" action, which simply executes the script without triggering a fatal error. This is to serve as a baseline for confirming operation of the script, and is also friendlier if any debugging of the protection mechanism is necessary - we can execute the script and confirm the protection is functional without actually causing a fatal error.
  • On my local setup (PHP 7.2 with fpm-cgi), in a normal context (not post-send), fatal errors for method does not exist, timeout, and out of memory all display reasonable messages in the browser. Segfaults display a "Service unavailable" error in the browser and are recorded in syslog.
  • Also on my local setup, in a post-send context, segfaults were recorded in syslog, but I could find no trace of oom, timeout, or method does not exist. Maybe they were logged somewhere I didn't look, or maybe they were simply not recorded.
  • Enabling catch_workers_output in /etc/php/7.2/fpm/pool.d/www.conf caused oom and timeout errors to appear in /var/log/php7.2-fpm.log. I still don't see the method does not exist error in any log I've checked.
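For reference, the pool setting mentioned in the last note would look roughly like this. The `[www]` pool name is an assumption (it is the usual default), and this reflects a local debugging setup only:

```ini
; /etc/php/7.2/fpm/pool.d/www.conf (local test setup)
[www]
; Redirect worker stdout/stderr into the main FPM error log.
catch_workers_output = yes
```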

Change 477450 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[operations/mediawiki-config@master] Create script to intentionally trigger fatal errors in MediaWiki

https://gerrit.wikimedia.org/r/477450

  • create a new directory under operations/mediawiki-config/ where this script, and potentially any other such script can reside. This way, it will be easier to restrict access from the normal virtual host and to create an ad-hoc one for such requests.

Since we want to trigger an error with MediaWiki fully initialised, it's necessary to either use a virtual host which corresponds to an actual wiki, or to have a special case in the code which maps the hostname to the DB name.

  • We can inject the shared secret as an env variable or a header from apache, instead of having to save it in privatesettings.php. But I see advantages with either approach.

You mean have Apache authenticate the request? I think what @BPirkle has done in the code he has uploaded already is simpler, and good enough.

  • create a new directory under operations/mediawiki-config/ where this script, and potentially any other such script can reside. This way, it will be easier to restrict access from the normal virtual host and to create an ad-hoc one for such requests.

Since we want to trigger an error with MediaWiki fully initialised, it's necessary to either use a virtual host which corresponds to an actual wiki, or to have a special case in the code which maps the hostname to the DB name.

There are tricks that should allow us to fix that issue, but for now an authenticated endpoint under normal wikis is good enough.

  • We can inject the shared secret as an env variable or a header from apache, instead of having to save it in privatesettings.php. But I see advantages with either approach.

You mean have Apache authenticate the request? I think what @BPirkle has done in the code he has uploaded already is simpler, and good enough.

Ack, sounds fine!

A first draft of this script is mostly finished and I'm testing locally. A few notes:

  • I included a "no error" action, which simply executes the script without triggering a fatal error. This is to serve as a baseline for confirming operation of the script, and is also friendlier if any debugging of the protection mechanism is necessary - we can execute the script and confirm the protection is functional without actually causing a fatal error.
  • On my local setup (PHP 7.2 with fpm-cgi), in a normal context (not post-send), fatal errors for method does not exist, timeout, and out of memory all display reasonable messages in the browser. Segfaults display a "Service unavailable" error in the browser and are recorded in syslog.
  • Also on my local setup, in a post-send context, segfaults were recorded in syslog, but I could find no trace of oom, timeout, or method does not exist. Maybe they were logged somewhere I didn't look, or maybe they were simply not recorded.
  • Enabling catch_workers_output in /etc/php/7.2/fpm/pool.d/www.conf caused oom and timeout errors to appear in /var/log/php7.2-fpm.log. I still don't see the method does not exist error in any log I've checked.

@BPirkle I think this ticket is partially tied to T211184. If at all possible, we should avoid using catch_workers_output as it is known to cause quite serious performance penalties at high concurrency. If needed I can extract some numbers on how severe that penalty is, but we should try to avoid it at almost any cost, if possible.

Change 477450 merged by jenkins-bot:
[operations/mediawiki-config@master] Create script to intentionally trigger fatal errors in MediaWiki

https://gerrit.wikimedia.org/r/477450

@tstarling, on IRC @Joe raised a concern that the current method of transmitting the password would cause it to be revealed in logs, which I had not considered. He suggested HTTP basic auth as an alternative. Unless you have another idea, I'm planning to look into that possibility tomorrow.

Maybe I'm missing something, but how do you intentionally cause PHP to segfault? Shouldn't any code that triggers a segfault be a PHP bug that should/would be reported upstream?

You can trivially cause a segfault with:

function foo() {
   array_map('foo', [0]);
}
foo();

This is well known to the PHP devs. It used to be that any infinite recursion would cause a segfault, and they didn't consider that a bug either. But now there is a virtual stack for the VM so it's necessary to call into C code and then back into PHP e.g. using array_map(). Nikita Popov mentioned this behaviour in http://nikic.github.io/2017/04/14/PHP-7-Virtual-machine.html#function-calls

TIL, thanks.

Closing this task, because the script is deployed and functioning as designed.

On password exposure via logging, @tstarling pointed out on the patch that posting the password via curl is a functional alternative. If we still feel we need HTTP basic auth, we can do it under a separate task.
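The reasoning behind posting the password is that query strings typically end up in access logs, while POST bodies usually do not. A hedged sketch of reading the password from the POST body only follows; the function name `passwordFromRequest()` and the field name `password` are hypothetical and may not match the deployed script:

```php
<?php
// Sketch only: accept the password from a POST body and never from the
// query string. Function and field names are hypothetical.
function passwordFromRequest(string $method, array $post): ?string
{
    if ($method !== 'POST') {
        return null;
    }
    $password = $post['password'] ?? null;
    // Reject missing, empty, or non-string values.
    return (is_string($password) && $password !== '') ? $password : null;
}
```

In a web context this would be called as `passwordFromRequest($_SERVER['REQUEST_METHOD'] ?? 'GET', $_POST)`.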