Page MenuHomePhabricator

phan 1.2.6 is OOMing on MediaWiki core
Closed, ResolvedPublicPRODUCTION ERROR

Description

My guess is that the job is sharing a node with something else that uses a lot of memory? IIRC we had optimized this for CPU, not memory...don't really remember.

I did try running the job once manually and watching the memory usage on the server with top but no other job was running on the node, so phan worked just fine and didn't OOM.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Krinkle subscribed.

I've seen this several times in the last few days when trying to merge a MediaWiki core commit, causing the merge to be aborted.

/srv/phan/vendor/symfony/console/Terminal.php:127 [2] proc_open(): fork failed - Cannot allocate memory
(Phan 1.2.6 crashed)
#0  phan_error_handler()
#1  proc_open() called at [/srv/phan/vendor/symfony/console/Terminal.php:127]
#2  Symfony\Component\Console\Terminal::getSttyColumns() called at [/srv/phan/vendor/symfony/console/Terminal.php:70]
#3  Symfony\Component\Console\Terminal::initDimensions() called at [/srv/phan/vendor/symfony/console/Terminal.php:32]
#4  Symfony\Component\Console\Terminal->getWidth() called at [/srv/phan/vendor/phan/phan/src/Phan/CLI.php:1405]
#5  Phan\CLI::progress() called at [/srv/phan/vendor/phan/phan/src/Phan/Analysis.php:347]
#6  Phan\Analysis::analyzeFunctions() called at [/srv/phan/vendor/phan/phan/src/Phan/Phan.php:369]
#7  Phan\Phan::finishAnalyzingRemainingStatements() called at [/srv/phan/vendor/phan/phan/src/Phan/Phan.php:301]
#8  Phan\Phan::analyzeFileList() called at [/srv/phan/vendor/phan/phan/src/phan.php:34]
#9  require_once(/srv/phan/vendor/phan/phan/src/phan.php) called at [/srv/phan/vendor/phan/phan/phan:2]
More details:
#1: proc_open() Args: ["stty -a | grep columns", {1:["pipe", "w"], 2:["pipe", "w"]}, null, null, null, {"suppress_errors":true}]
#2: Symfony\Component\Console\Terminal::getSttyColumns()
#3: Symfony\Component\Console\Terminal::initDimensions()
#4: Symfony\Component\Console\Terminal->getWidth()
#5: Phan\CLI::progress() Args: ["method", 0.087568759283016]
#6: Phan\Analysis::analyzeFunctions() Args: [Phan\CodeBase({}), null]
#7: Phan\Phan::finishAnalyzingRemainingStatements() Args: [Phan\CodeBase({}), null, [".phan/stubs/hhvm.php", ".phan/stubs/wikidiff.php", ".phan/stubs/mail.php", ".phan/stubs/password.php", ".phan/stubs/phpunit4.php", ".phan/stubs/excimer.php", "maintenance/7zip.inc", "maintenance/cleanupTable.inc", "maintenance/CodeCleanerGlobalsPass.inc", "maintenance/commandLine.inc", ... 5451 more element(s)], []]
#8: Phan\Phan::analyzeFileList() Args: [Phan\CodeBase({}), Closure]
#9: require_once() Args: ["/srv/phan/vendor/phan/phan/src/phan.php"]
Build step 'Execute shell' marked build as failure

@hashar Perhaps we should make this job non-voting until we figure out what's going on? It keeps aborting merges in mediawiki/core.

Can it be due to the Phan upgrade? Or most probably we started to exceed some memory threshold. I would rather not make the job non-voting it would still run and consume memory on the infrastructure for close to no usefulness.

If it turns out to be an issue, lets just disable the job from mediawiki/core until the memory usage is figured out. Then, give the size of the mediawiki/core we would probably want more memory available on the instances?

Disabling phan isn't an option due to the amount of issues it regularly catches. Worst case we can revert the upgrade but that's going to be tricky....

How can we increase the amount of memory available to phan?

Disabling phan isn't an option due to the amount of issues it regularly catches. Worst case we can revert the upgrade but that's going to be tricky....

When the job keeps falling and is marked as non-voting, it offers little value since most people will just ignore its result anyway. Hence why I recommend to just disable it from the test pipeline.

Worst case we can revert the upgrade but that's going to be tricky....

So that seems to come from the upgrade of Phan on March 23rd with e7186d68bd0e79610b150e92c2235ef496aaf185. It bumps esty/phan via:

-               "mediawiki/mediawiki-phan-config": "0.3.0"
+               "mediawiki/mediawiki-phan-config": "0.5.0"

And has show by the patch, it is not very trivial to revert it?

How can we increase the amount of memory available to phan?

Short of dropping more memory on the instances, we can't :-] That being said, looking at the consoles, they all fail when calling fork(), more specially the process crash because the Linux Kernel think it can not honor the memory allocation for the child process. I think I talked about it with Krinkle on IRC but did not follow up.

The memory allocation is refused by the kernel itself, because it thinks the new process (from fork()) would eventually not have enough memory when in reality most of the memory is shared with its parent process. So unless the child actually marks all the memory private, it should be fine.

Solutions

Allow memory overflow

One fix is to instruct the Linux Kernel to ignore the memory usage overflow. That is done via sysctl vm.overcommit_memory https://www.kernel.org/doc/Documentation/vm/overcommit-accounting which in puppet can be set via:

sysctl::parameters { 'vm.overcommit_memory':
    values => { 'vm.overcommit_memory' => 1 },  # or 2 ??
}

Disable progress report

When looking at the code from the stacktrace Phan runs the analyzes serially. To output the progress bar it needs to know the length of the bar and to do so it gets the terminal width via:

vendor/phan/phan/src/Phan/CLI.php
use Symfony\Component\Console\Terminal;

$columns = (new Terminal())->getWidth();

In the Symfony code:

public function getWidth()
{
    $width = getenv('COLUMNS');
    if (false !== $width) {
        return (int) trim($width);
    }

    if (null === self::$width) {
        self::initDimensions();
    }

    return self::$width ?: 80;
}

So when COLUMNS environment variable is set, the method returns early. I think that environment variable is set by bash and would be updated when the terminal size change (when receiving the SIGWINCH signal). So a quick solution is to set COLUMNS=80 when not running interactively :]

Anyway, the code continue and does a proc_open for stty -a | grep columns.... which is also run in a shell. So that is a lot of overhead just to get the terminal size but there is no good solution in PHP to retrieve the terminal information (php does not have binding for ioctl()).

Eventually the proc_open does a fork of a 2+ GB process, if the instance is on short memory, the kernel reject the fork since the child would not be able to allocate all that memory.

Conclusion

Potential fixes:

  1. Set sysctl vm.overcommit_memory = 1, which is global to the instance and can have side effects.
  1. Set COLUMNS=80 on CI to bypass Symfony\console\Terminal::getWidth() heavy logic
  1. Disable the progress bar when not running interactively. I have not looked at how that is done in Phan.

I recommend the super quick fix is to set the COLUMNS=80 in the job. Then find out how to disable the progress bar entirely (which also slow down the run since each step causes a shell/stty/grep chain of invocation).

Change 501794 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[integration/config@master] Hopefully reduce memory consumption by phan on MediaWiki core

https://gerrit.wikimedia.org/r/501794

Change 501794 merged by jenkins-bot:
[integration/config@master] Hopefully reduce memory consumption by phan on MediaWiki core

https://gerrit.wikimedia.org/r/501794

OK, I haven't seen any OOMs after deploying that change. I also submitted https://github.com/phan/phan/pull/2634 ("Avoid repeatedly calculating the terminal's width") upstream based on hashar's analysis.

FYI, phan 1.2.8 was released just today and it includes Lego's fix for --progress-bar.

And https://gerrit.wikimedia.org/r/501794 changes phan invocation to no more enforce the progress bar.

I am opening this one again since it also affects extensions (since phan analyzes mediawiki/core as well anyway). An example is on
https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Echo/+/508450/

https://integration.wikimedia.org/ci/job/mwext-php70-phan-docker/28120/console

With container docker-registry.wikimedia.org/releng/mediawiki-phan:0.1.11

- Installing mediawiki/mediawiki-phan-config (0.5.0)
- Installing phan/phan (1.2.6): Loading from cache

(Phan 1.2.6 crashed)
00:03:14.065 #1: proc_open() Args: ["stty -a | grep columns", {1:["pipe", "w"], 2:["pipe", "w"]}, null, null, null, {"suppress_errors":true}]

We would need to bump the jobs to use releng/mediawiki-phan:0.1.14

@hashar For what concerns the progress bar, Lego's fix in phan was included in the 1.2.8 release, so updating our phan config will fix it. I sent https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/+/506064/ for that (actually, it could be updated to 1.3.2). Nevertheless, that would require launching libraryupgrader to update mediawiki/phan inside extensions.

@hashar For what concerns the progress bar, Lego's fix in phan was included in the 1.2.8 release, so updating our phan config will fix it. I sent https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/+/506064/ for that (actually, it could be updated to 1.3.2).

Nevertheless, that would require launching libraryupgrader to update mediawiki/phan inside extensions.

https://codesearch.wmflabs.org/search/?q=%22mediawiki%2Fmediawiki-phan-config%22%5C%3A&i=nope&files=&repos=

It's only 143 repos. I could shell-script it if we need it more urgently than waiting for LibraryUpgrader to be working again will allow.

@Jdforrester-WMF Huh, I forgot that LibraryUpgrader is sorta dead right now. Maybe updating the container would be quicker, but given that the phan upgrade will have to be done at some point, I'd suggest going ahead with it. I have updated the mediawiki/phan patch (https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/+/506064/) to require 1.3.2. If @Legoktm could merge it and release 0.5.1, and you could use such a script, that'd be awesome. Thanks!

I just experienced this crash 4 times in a row, when running CI on mw-ext-FileImporter. Looking at the integration-config repo,

./jjb/mediawiki-extensions.yaml:        image: 'docker-registry.wikimedia.org/releng/mediawiki-phan:0.1.11'

Reading this task, I don't immediately see a reason to not upgrade the image. I'll upload a patch for review.

Change 509758 had a related patch set uploaded (by Awight; owner: Awight):
[integration/config@master] Upgrade Phan image used for extension checks

https://gerrit.wikimedia.org/r/509758

I have updated the jobs mwext-php70-phan-docker mwskin-php70-phan-docker php70-phan-docker

Change 509758 merged by jenkins-bot:
[integration/config@master] Upgrade Phan image used for extension checks

https://gerrit.wikimedia.org/r/509758

hashar claimed this task.

releng/mediawiki-phan:0.14.0 no more enforces the phan progress bar (drop of -p parameter in https://gerrit.wikimedia.org/r/#/c/integration/config/+/501794/1/dockerfiles/mediawiki-phan/run-core.sh ).

Change 509814 had a related patch set uploaded (by Awight; owner: Awight):
[integration/config@master] Revert "Upgrade Phan image used for extension checks"

https://gerrit.wikimedia.org/r/509814

Change 509814 merged by jenkins-bot:
[integration/config@master] Revert "Upgrade Phan image used for extension checks"

https://gerrit.wikimedia.org/r/509814

@awight noticed the container is broken sometime so we reverted the upgrade.

A repository with phan/phan@0.8.0 works just fine, but one with phan/phan@1.2.6 eventually fails with:

docker run docker-registry.wikimedia.org/releng/mediawiki-phan:0.1.14 /mediawiki/extensions/Popups -m checkstyle
+ PHAN_VERSION=1.2.6
+ composer require phan/phan:1.2.6
  - Installing phan/phan (1.2.6): Loading from cache

+ exec /srv/phan/vendor/bin/phan -d . -p /mediawiki/extensions/Popups -m checkstyle
Unknown option '-m'
Usage: /srv/phan/vendor/bin/phan [options] [files...]
....

An example is Popups extension on https://gerrit.wikimedia.org/r/#/c/509803/

The working one does:

docker run releng/mediawiki-phan:0.1.11 /mediawiki/extensions/Popups -m checkstyle
+ exec /srv/phan/vendor/bin/phan -d . -p

Note how the extension path is NOT passed to the phan command. Which seems to have been added with 30b2278f33b7b367fc2ab391789143da6ffb545e ( releng/mediawiki-phan:0.1.12 ):

diff --git a/dockerfiles/mediawiki-phan/run.sh b/dockerfiles/mediawiki-phan/run.sh
index 642463d7..d075ea5a 100755
--- a/dockerfiles/mediawiki-phan/run.sh
+++ b/dockerfiles/mediawiki-phan/run.sh
@@ -37,7 +37,7 @@ if [ -f .phan/config.php ]; then
     # new-style phan, using modern paths and newer ast
     export PHP_ARGS='-dextension=ast_101.so'
     install_phan
-    exec /srv/phan/vendor/bin/phan -d . -p
+    exec /srv/phan/vendor/bin/phan -d . -p "$@"
 else
     # old-style, using tests/phan and MW wrapper
     export PHP_ARGS='-dextension=ast_012.so'

And I guess something is slightly different between phan 0.8.0 and 1.2.6, with the later potentially requiring options to show up before a path (thank you @Lucas_Werkmeister_WMDE for pointing that.

mwext-php70-phan-docker does:

- docker-run-with-log-cache-src:
   options: '--volume "$(pwd)"/src:/mediawiki'
   image: 'docker-registry.wikimedia.org/releng/mediawiki-phan:0.1.11'
   args: '/$ZUUL_PROJECT -m checkstyle'  <--------------------
   logdir: '/log'

So we probably just have to flip the args being passed to have the path (/$ZUUL_PROJECT) to show up last?

Should be easily reproducible by attempting to run the container against mediawiki/extensions/Popups.

Change 509797 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[mediawiki/core@master] build: Upgrade mediawiki/mediawiki-phan-config from 0.5.0 to 0.6.0 and make pass

https://gerrit.wikimedia.org/r/509797

Change 510140 had a related patch set uploaded (by Awight; owner: Awight):
[integration/config@master] mediawiki-phan 0.1.15: reduce memory usage

https://gerrit.wikimedia.org/r/510140

Change 510143 had a related patch set uploaded (by Awight; owner: Awight):
[integration/config@master] Use phan images with disabled progress bar

https://gerrit.wikimedia.org/r/510143

Change 510458 had a related patch set uploaded (by Awight; owner: Awight):
[integration/config@master] Fix phan parameter order

https://gerrit.wikimedia.org/r/510458

Change 510459 had a related patch set uploaded (by Awight; owner: Awight):
[integration/config@master] Reduce memory usage by omitting progress bar

https://gerrit.wikimedia.org/r/510459

Change 510458 merged by jenkins-bot:
[integration/config@master] Fix phan parameter order

https://gerrit.wikimedia.org/r/510458

https://gerrit.wikimedia.org/r/510458 apparently broke Wikibase and got reverted. https://integration.wikimedia.org/ci/job/mwext-php70-phan-docker/28936/console

00:02:36.154 + cd /mediawiki/extensions/Wikibase
00:02:36.154 + '[' '!' -f /mediawiki/extensions/Wikibase/.phan/config.php ']'
00:02:36.154 + '[' '!' -f /mediawiki/extensions/Wikibase/tests/phan/config.php ']'
00:02:36.155 + jq -e '.extra."phan"' composer.json
00:02:36.170 null
00:02:36.170 + CFG_COMPOSER=vendor/mediawiki/mediawiki-phan-config/composer.json
00:02:36.171 + jq -e '.extra."phan"' vendor/mediawiki/mediawiki-phan-config/composer.json
00:02:36.177 jq: error: Could not open file vendor/mediawiki/mediawiki-phan-config/composer.json: No such file or directory
00:02:36.177 + PHAN_VERSION=0.8
00:02:36.177 + '[' -f .phan/config.php ']'
00:02:36.178 + export PHP_ARGS=-dextension=ast_012.so
00:02:36.178 + PHP_ARGS=-dextension=ast_012.so
00:02:36.178 + install_phan
00:02:36.178 + cd /srv/phan
00:02:36.178 + composer require phan/phan:0.8
00:02:37.137 ./composer.json has been created
00:02:38.804   - Installing phan/phan (0.8.0): Loading from cache
00:02:39.358 + exec /mediawiki/tests/phan/bin/phan -m checkstyle /mediawiki/extensions/Wikibase
00:02:39.366 Could not find a phan config file to apply in
00:02:39.367 /mediawiki/tests/phan/config.php
00:02:39.748 Build step 'Execute shell' marked build as failure

How about we disable the progress bar while leaving the phan version the same?

It looks like phan 0.8 and 1.2.6 have a forwards- and backwards-incompatible break in commandline syntax, so we need to carefully plan the migration. Options that come to mind:

  • Patch phan so that a newer version is able to handle legacy commandline parameter order.
  • Pass the output-mode argument as an environment variable so it can be injected into the commandline by the run script, which knows which version we're using.

The arguments handling error with -m checkstyle is due to mediawiki/core script tests/phan/bin/phan. It handles the first parameter expecting to sometime be a path:

# If the first argument doesn't start with a -, then it's a path
# to another project (extension, skin, etc.) to analyze
if [[ -n "$1" && "$1" != "-"* ]]; then
    cd $1
    shift
else
    cd "$(dirname "$0")"
fi

So here is the summary of the issue. When we introduced phan, we went with a wrapper shipped in mediawiki/core: tests/phan/bin/phan.

The wrapper defaults to run for mediawiki/core code base (cd "$(dirname "$0")") and for extension/skins one was supposed to pass the directory as the FIRST argument. That is why we have the Jenkins jobs passing: $ZUUL_PROJECT -m checkstyle. To trigger that logic.

When we tried to update to the new container 1.14 the parameter is also passed to the phan command (not the mediawiki/core wrapper above) when previously it was not. So phan get the option after the directory name and complains as one would expect.

What I thought about is to adjust the invocation of the MediaWiki wrapper and inject the path as the argument to force the logic. But there is a chicken and egg problem there when the wrapper get the path added a second time. The command would be:

/mediawiki/tests/phan/bin/phan /mediawiki/$ZUUL_PROJECT -m checkstyle /mediawiki/$ZUUL_PROJECT

Which would run phan from the root of the repository and pass the path again. Which might be just fine.


I think looked at the build history for the phan jobs to check which repositories were still using the old legacy wrapper /mediawiki/tests/phan/bin/phan.

For mwext-php70-phan-docker only Wikibase still uses it. It has not been migrated yet to a newer phan, so that should be a task.

For mediawiki-core-php70-phan-docker the wrapper is only invoked by the release branches. Then it uses a different entry point in the docker container so the command is fine and is:

exec /mediawiki/tests/phan/bin/phan -m checkstyle

Change 509797 merged by jenkins-bot:
[mediawiki/core@master] build: Upgrade mediawiki/mediawiki-phan-config from 0.5.0 to 0.6.0 and make pass

https://gerrit.wikimedia.org/r/509797

Change 510936 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] docker: mediawiki phan wrapper requires a path

https://gerrit.wikimedia.org/r/510936

Oh :-) That's even simpler than I was imagining, since we own the wrapper script. I think your migration will work: we leave jjb/* configuration alone and deploy your run.sh patch, then remove the redundant subpaths from jjb. I was able to verify that the trailing path parameter is harmless.

On the other topic, I still think that disabling the progress bars is an improvement independently of upgrading phan, since the progress display does nothing but add a bunch of cruft to the console logs.

Yeah that sounds like a good migration plan :]

And indeed on Jenkins we do not need any progress to be shown. A further improvement would be to have phan to automatically enable the progress bar when the command is run interactively. But that I digress.

Change 510936 merged by jenkins-bot:
[integration/config@master] docker: mediawiki phan wrapper requires a path

https://gerrit.wikimedia.org/r/510936

Change 510459 merged by jenkins-bot:
[integration/config@master] docker: disable phan progress bar

https://gerrit.wikimedia.org/r/510459

Change 510140 merged by jenkins-bot:
[integration/config@master] Bump mediawiki-phan container

https://gerrit.wikimedia.org/r/510140

Mentioned in SAL (#wikimedia-releng) [2019-05-20T13:11:21Z] <hashar> updating phan jobs to use docker-registry.wikimedia.org/releng/mediawiki-phan:0.1.15 # T219114

Change 510143 merged by jenkins-bot:
[integration/config@master] Use latest phan images

https://gerrit.wikimedia.org/r/510143

@awight update the container to disable the progress bar (== no more pass -p)

I have added a hack to support the legacy MediaWiki core phan wrapper which was breaking the run for Wikibase (which still fall back to phan 0.8.0).

Should be good now!

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:07 PM

The progress bar was restored for no apparent reason via https://gerrit.wikimedia.org/r/541001 - is it worth removing the -p switch again?

The progress bar was restored for no apparent reason via https://gerrit.wikimedia.org/r/541001

It was re-enabled (as noted in the commit message) because the performance issue which had led to its removal was then fixed. We had explicitly added COLUMNS to our docker images to avoid shelling out, and it's also been fixed upstream. Hence, the reasons which brought to the removal of the switch do not stand anymore.

  • is it worth removing the -p switch again?

It depends. For sure, it won't make phan run any faster.