
wfShellExec() causes web-installer (and possibly more) to hang
Open, Medium, Public

Description

On my shared host, upgrading MediaWiki 1.22 to any of the 1.23 releases results in the following behaviour:

After I enter the upgrade key and move on to the "Welcome to MediaWiki" page, the installer hangs: the page never finishes loading and stays blank (white).

By selectively commenting out code, we narrowed the issue down to the wfShellExec() function, but no error was produced. Only when we extracted the offending code into a standalone script were we able to get a warning message.

Test code and warning message: http://slexy.org/view/s2frnrvJCP
PHP version of the hosting provider (Hosteurope): http://www.hosteurope-infos.de/phpinfo.php


Version: 1.23.3
Severity: normal
URL: https://www.mediawiki.org/wiki/Thread:Project:Support_desk/wfShellExec()_causes_web-installer_(and_possibly_more)_to_hang_in_1.23.3

Details

Reference
bz70357

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 3:45 AM
bzimport set Reference to bz70357.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to Daniel from comment #0)

Test code and warning message: http://slexy.org/view/s2frnrvJCP

Contents:

Error message:

PHP Warning: stream_select() [function.stream-select]: You MUST recompile PHP with a larger value of FD_SETSIZE.
It is set to 1024, but you have descriptors numbered at least as high as 1438.
 --enable-fd-setsize=2048 is recommended, but you may want to set it
to equal the maximum number of open files supported by your system,
in order to avoid seeing this error again at a later date. in /is/htdocs/wp1017571_H5Q0RI8Q57/www/sys/scripts/stream_select_debug.php on line 13

Test code:

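<?php
// Standalone reproduction: start limit.sh the way wfShellExec() does in 1.23,
// then wait on its output pipes with stream_select() and no timeout (null).
$cmd = "/bin/bash '/is/htdocs/wp1017571_H5Q0RI8Q57/www/sfrs/hauptcomputer/w-1.23/includes/limit.sh' ''\''/usr/bin/diff3'\'' --version 2>&1' 'MW_INCLUDE_STDERR=;MW_CPU_LIMIT=180; MW_CGROUP='\'''\''; MW_MEM_LIMIT=0; MW_FILE_SIZE_LIMIT=102400; MW_WALL_CLOCK_LIMIT=180; MW_USE_LOG_PIPE=yes'";
$desc = array(
    0 => array('file', 'php://stdin', 'r'),
    1 => array('pipe', 'w'),               // command output
    2 => array('file', 'php://stderr', 'w'),
    3 => array('pipe', 'w'),               // log pipe (MW_USE_LOG_PIPE=yes)
);

$readyPipes = null;
$proc = proc_open($cmd, $desc, $readyPipes);  // fills $readyPipes with pipes 1 and 3
$emptyArray = array();
// On the affected host this prints the FD_SETSIZE warning above and bool(false).
var_dump(stream_select($readyPipes, $emptyArray, $emptyArray, null));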

Apparently, this seems to be a problem specific to HostEurope.

See this similar report from DokuWiki:

https://bugs.dokuwiki.org/?do=details&task_id=2276

The problem lies in a hardcoded limit of 1024 file descriptors within the libc-client library. Because HostEurope uses mod_php, which is shared by all clients/Apache processes, this limit can be reached at random. HostEurope refuses to maintain a custom version of this library, so the limit can't be increased.

Maybe you can try again in off-peak hours, when fewer concurrent users are loading pages on the server.

DokuWiki "fixed" this by retrying in an infinite loop (yuk!)

(In reply to Kunal Mehta (Legoktm) from comment #1)

You MUST recompile PHP with a larger value of FD_SETSIZE.

So is there anything that can actually be done in MediaWiki?
Ticket sounds like INVALID to me and a support question instead.

Updates have always been done in off-peak hours, so that's not a solution. Even if it were, isn't wfShellExec() used in other scenarios too? I'd end up with a malfunctioning installation even if I managed to update at night.

That said, the previous incarnation of wfShellExec() that used passthru() instead of stream_select() worked perfectly.

(In reply to Andre Klapper from comment #3)

So is there anything that can actually be done in MediaWiki?
Ticket sounds like INVALID to me and a support question instead.

The question was posted on the Support Desk (as the URL shows), and I convinced Daniel to post it here in the hope that the issue could be investigated, in case there is a file descriptor leak or something else that could be done.

The fact that this error did not happen with previous versions may indicate that the new code introduced a bug, or that something could be done to detect this situation and fall back to the old way of running commands.

I ran into this issue in another context, and I have a partial fix as a first step toward a full one. Before describing the fix, here is my scenario, which is related to the reported issue.


Specific bug scenario:

I tried to run the unit tests with PHPDBG 7.0.16 (see T147778). My PHP 7.0 comes from Debian Stretch; I mention this because the issue depends on a compile-time parameter. I'm not sure whether everyone experiences this, but when I launch the unit tests with phpdbg -qrr tests/phpunit/phpunit.php I get tons of warning messages at the first test:

Warning: include(/mediawiki/languages/messages/MessagesCbk_zam.php): failed to open stream: Too many open files in /mediawiki/tests/phpunit/languages/SpecialPageAliasTest.php on line 52

To avoid these warnings I set ulimit -Sn 4096, and PHPUnit then runs the tests normally (ulimit -Sn is the soft limit on the number of simultaneously open files, 1024 by default on my Debian Jessie).

I then ran into a nastier issue with wfShellExec: when I run the whole test suite and a test uses wfShellExec, execution hangs indefinitely and a zombie sh process remains, but when I run only the specific test class (e.g. DjVuTest) it works. When it hangs and I press Ctrl+C, I get the PHPDBG prompt, where the PHPDBG command ev $status prints something like:

Array
(
    [command] => /bin/bash '/mediawiki/includes/limit.sh' ''\''/usr/bin/djvudump'\'' '\''/mediawiki/tests/phpunit/includes/media/../../data/media/LoremIpsum.djvu'\''' 'MW_INCLUDE_STDERR=;MW_CPU_LIMIT=180; MW_CGROUP='\'''\''; MW_MEM_LIMIT=307200; MW_FILE_SIZE_LIMIT=102400; MW_WALL_CLOCK_LIMIT=180; MW_USE_LOG_PIPE=yes'
    [pid] => 11772
    [running] => 1
    [signaled] => 
    [stopped] => 
    [exitcode] => -1
    [termsig] => 0
    [stopsig] => 0
)

When I then execute the PHPDBG command ev proc_get_status( $proc ), the zombie process is reaped with exit code 0, and the PHPDBG command continue resumes execution, with the current test failing with this error:

stream_select(): You MUST recompile PHP with a larger value of FD_SETSIZE.
It is set to 1024, but you have descriptors numbered at least as high as 2267.
 --enable-fd-setsize=3072 is recommended, but you may want to set it
to equal the maximum number of open files supported by your system,
in order to avoid seeing this error again at a later date.

So it is the same error as in the comments above. The common thread is that too many files are open, whether because of the web server or because of PHPDBG's internal file handling.

Note that there is an open PHP bug for this specific scenario (proc_get_status + zombie process), bug 69014, but it is poorly documented. It is possibly related to FD_SETSIZE, which can differ across GNU/Linux distributions, and the issue also depends on the runtime environment (the number of open files).


Resolution paths:

A full resolution might be the one Ascarion suggests, "passthru() instead of stream_select()", but I don't know enough about process management and file descriptors to be sure it would work (it could be used only as a fallback when stream_select returns false).
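For comparison, the passthru() idea can be sketched by capturing passthru()'s output with PHP output buffering, so no pipes are waited on with stream_select(). This is a minimal sketch assuming PHP 7.1+; runWithPassthru() is a hypothetical helper, not the pre-1.23 wfShellExec() code itself.

```php
<?php
/**
 * Hypothetical helper: run a shell command with passthru() and capture its
 * output through PHP output buffering instead of reading pipes with
 * stream_select(). Returns stdout; the exit code goes into $exitCode.
 */
function runWithPassthru(string $cmd, ?int &$exitCode = null): string
{
    $exitCode = 1;                  // assume failure until passthru() reports otherwise
    ob_start();                     // passthru() writes straight to PHP's output,
    passthru($cmd, $exitCode);
    return (string) ob_get_clean(); // so collect it from the buffer
}

// Usage: stderr is only captured if the command redirects it (2>&1).
$out = runWithPassthru('echo hello 2>&1', $code);
```

If this were used only as a fallback when stream_select() returns false, note that by then the command has already been started once through proc_open(), so a real fallback would have to handle that; this path also does not read the separate log pipe (descriptor 3) that limit.sh writes to.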

A partial resolution, and a first step toward Ascarion's full resolution, is to add a timeout to stream_select. As the PHP documentation says, the timeout can be null (the current value), which blocks indefinitely until a file descriptor has data, or an integer giving the maximum time spent in the function before it returns. The timeout can also be 0, but the PHP documentation advises against using 0 in a loop (although 0 is logical once the process has terminated, see the code).

I tried a timeout of 1 second and it works as expected: execution waits 1 second and continues when the timeout is reached. This is "better" but still needs improvement: it no longer hangs, but stream_select still returns false because of the FD_SETSIZE recompilation error (and in the PHPDBG case the test is always reported as an error because of trigger_error).

I propose a Gerrit patch with a 1-second timeout as a first step toward solving this bug. I don't know whether 1 second is a sensible value; in any case, stream_select is called in a loop and is only there to wait until a file descriptor has data, instead of busy-looping and wasting CPU.
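The timeout idea can be illustrated with a small read loop. This is a minimal sketch assuming PHP 7.1+ on a POSIX system, not the code of change 347417: readWithTimeout() and its return keys are hypothetical names, it reads only stdout (not the log pipe used by limit.sh), and the 1-second default is simply the value proposed above. Giving up and terminating the child when stream_select() returns false is one possible design choice.

```php
<?php
/**
 * Hypothetical sketch of the timeout idea: read a child's stdout with
 * stream_select() using a finite timeout, and stop (instead of waiting
 * forever) when stream_select() returns false.
 */
function readWithTimeout(string $cmd, int $timeoutSec = 1): array
{
    $desc = [
        0 => ['pipe', 'r'],
        1 => ['pipe', 'w'],
    ];
    $proc = proc_open($cmd, $desc, $pipes);
    if (!is_resource($proc)) {
        return ['output' => '', 'exitCode' => -1, 'selectFailed' => false];
    }
    fclose($pipes[0]); // the child sees EOF on its stdin
    $stdout = $pipes[1];
    $output = '';
    $selectFailed = false;

    while (!feof($stdout)) {
        $read = [$stdout];
        $write = null;
        $except = null;
        // null here would block until data arrives; a number of seconds
        // bounds each wait instead.
        $ready = stream_select($read, $write, $except, $timeoutSec);
        if ($ready === false) {
            // e.g. the FD_SETSIZE warning: report failure instead of hanging
            $selectFailed = true;
            break;
        }
        if ($ready === 0) {
            continue; // timeout expired with no data; wait again
        }
        $output .= (string) fread($stdout, 8192);
    }

    fclose($stdout);
    if ($selectFailed) {
        proc_terminate($proc); // don't leave the child running
    }
    $exitCode = proc_close($proc);

    return ['output' => $output, 'exitCode' => $exitCode, 'selectFailed' => $selectFailed];
}
```

For example, readWithTimeout('echo hello') returns the output "hello\n" with exit code 0; on a host hitting the FD_SETSIZE limit, selectFailed would be true instead of the call blocking.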

Change 347417 had a related patch set uploaded (by Seb35):
[mediawiki/core@master] Add a timeout in wfShellExec

https://gerrit.wikimedia.org/r/347417

I added a test that captures this bug. It occurs only when all of the following hold:

  • with PHP (5.6, 7.0, 7.1) but not with HHVM (perhaps HHVM has a higher FD_SETSIZE limit, or none?), and
  • on Linux (I did not test on Windows or Mac, since I don't have these platforms), and
  • when ulimit -Sn (=limit nofile) is higher than 1024, and
  • when more than 1024 (=FD_SETSIZE) and fewer than nofile (*) files are open simultaneously, which can happen on a heavily loaded server or when running unit tests with PHPDBG, since it opens many (all?) PHP files (1660 in my case).

(*) beyond nofile, Linux refuses to open further files ("Too many open files", as in the PHPDBG warning above), so the process fails in a different way.
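The conditions listed above can be checked from inside PHP with a small diagnostic. This is a minimal sketch under stated assumptions: it is Linux-only (it lists /proc/self/fd), it reads the soft nofile limit with posix_getrlimit() only if the posix extension is loaded, and it assumes FD_SETSIZE is 1024 as in the warnings quoted in this task. The function names are hypothetical.

```php
<?php
/**
 * Hypothetical diagnostic for the FD_SETSIZE window described above.
 * Assumes Linux (/proc/self/fd) and FD_SETSIZE = 1024.
 */

// Number of descriptors this process has open, or null without /proc.
function countOpenFds(): ?int
{
    $entries = @scandir('/proc/self/fd');
    if ($entries === false) {
        return null;
    }
    // Skip "." and "..", plus the descriptor scandir() held while listing.
    return count($entries) - 3;
}

// Soft "nofile" limit (ulimit -Sn), or null if the posix extension is missing.
function softNofileLimit(): ?int
{
    if (!function_exists('posix_getrlimit')) {
        return null;
    }
    $limits = posix_getrlimit();
    $soft = $limits['soft openfiles'] ?? null;
    if ($soft === null) {
        return null;
    }
    return is_numeric($soft) ? (int) $soft : PHP_INT_MAX; // "unlimited"
}

// The failing window described above: FD_SETSIZE < open files < nofile.
function inFdSetSizeWindow(int $openFds, int $nofile, int $fdSetSize = 1024): bool
{
    return $openFds > $fdSetSize && $openFds < $nofile;
}
```

With the numbers from the PHPDBG run above (1660 open files, ulimit -Sn 4096), inFdSetSizeWindow(1660, 4096) returns true; with the default limit of 1024 the window is empty.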

As said above, the proposed patch only limits the damage in this case: the command executed by wfShellExec will still fail, but at least the call will terminate instead of waiting indefinitely.