Page MenuHomePhabricator

Phan is stopping for unknown reason in automated checks
Closed, ResolvedPublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

00:01:08.366 + cd /mediawiki/extensions/Math
00:01:08.366 + export COLUMNS=80
00:01:08.366 + COLUMNS=80
00:01:08.367 + '[' '!' -f .phan/config.php ']'
00:01:08.367 + exec vendor/bin/phan -d . --long-progress-bar --color --require-config-exists
00:01:08.685 Parsing files...
00:01:08.785 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░    54 / 10898 (  0%) 38MB
00:01:08.912 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   108 / 10898 (  1%) 76MB
00:01:09.041 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   162 / 10898 (  2%) 97MB
...

00:01:20.198 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 10854 / 10898 (100%) 2073MB
00:01:20.198 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           10898 / 10898 (100%) 2073MB
00:01:20.326 Analyzing classes...
00:01:22.878 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2412MB
00:01:22.978 Analyzing functions...
00:01:23.024 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2415MB
00:01:23.125 Analyzing methods...
00:01:29.251 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 2555MB
00:01:29.720 Analyzing files...
00:01:46.015 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  54 / 115 ( 69%) 3016MB
00:20:39.252 ░░░░░░░░░░░░░░░░░░░░░░░░░░/run.sh: line 5:     8 Killed                  /run-generic.sh "/mediawiki/$THING_SUBNAME" "$@"
00:20:40.009 Build step 'Execute shell' marked build as failure

What happens?:

What should have happened instead?:

  • automated check fails for unknown reason

Event Timeline

Mainframe98 subscribed.

It didn't so much as stop as it never actually started. If you check the logs, then you will see the following:

FILE: ...ace/src/extensions/Math/tests/phpunit/unit/TexVC/TexUtilTest.php
12:32:44 ----------------------------------------------------------------------
12:32:44 FOUND 3 ERRORS AND 3 WARNINGS AFFECTING 5 LINES
12:32:44 ----------------------------------------------------------------------
12:32:44   5 | WARNING | [x] Use statements are not alphabetically sorted
12:32:44     |         |     (MediaWiki.Classes.UnsortedUseStatements.UnsortedUse)
12:32:44   7 | ERROR   | [x] There must be one blank line after the last USE
12:32:44     |         |     statement; 2 found;
12:32:44     |         |     (PSR2.Namespaces.UseDeclaration.SpaceAfterLastUse)
12:32:44   8 | ERROR   | [x] Multiple empty lines should not exist in a row;
12:32:44     |         |     found 2 consecutive empty lines
12:32:44     |         |     (MediaWiki.WhiteSpace.MultipleEmptyLines.MultipleEmptyLines)
12:32:44  34 | WARNING | [x] Single space expected after opening
12:32:44     |         |     parenthesis
12:32:44     |         |     (MediaWiki.WhiteSpace.SpaceyParenthesis.SingleSpaceAfterOpenParenthesis)
12:32:44  34 | WARNING | [x] Single space expected before closing
12:32:44     |         |     parenthesis
12:32:44     |         |     (MediaWiki.WhiteSpace.SpaceyParenthesis.SingleSpaceBeforeCloseParenthesis)
12:32:44  35 | ERROR   | [x] Expected 1 blank line after function; 0 found
12:32:44     |         |     (Squiz.WhiteSpace.FunctionSpacing.After)
12:32:44 ----------------------------------------------------------------------
12:32:44 PHPCBF CAN FIX THE 6 MARKED SNIFF VIOLATIONS AUTOMATICALLY
12:32:44 ----------------------------------------------------------------------
12:32:44 
12:32:44 Time: 227ms; Memory: 18MB

MediaWiki-Codesniffer runs before phan does and it has detected code style violations. These must be fixed first before phan can run.

(I don't see the file with style violations in the linked patch, so this isn't something that https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Math/+/923597 causes. You can fix this in a new patch by simply running composer fix)

Hello @Mainframe98 you are right, the link pointing to error was pointing to a previous issue, i corrected the link. Sorry for confusion.

The check seems to be stuck or abort here:

12:26:45 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 54 / 115 ( 69%) 3015MB

Looking at the error log the following line seems suspect:

15:21:11 ░░░░░░░░░░░░░░░░░░░░░░░░░░/run.sh: line 5:     8 Killed                  /run-generic.sh "/mediawiki/$THING_SUBNAME" "$@"

It suggests that, according to https://stackoverflow.com/questions/726690/what-killed-my-process-and-why, there is a memory issue. Comparing other runs of mwext-php74-phan-docker, they all finish before hitting the 3GB.

Previously, Parsoid had this issue, (T221872). The solution was to narrow what phan scans (https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/513359/) and increase the job's memory.

Daimona subscribed.

I can reproduce this locally, phan just gets stuck at some point with high memory usage. Can only reproduce with taint-check enabled. Maybe it's trying to merge super-complex array shapes, that has caused similar issues in the past. Annotating some parameters/return values as safe may help, but investigating taint-check issues is very time-consuming...

hashar subscribed.

We had the issue with Phan T219114#5084302 . It triggers a fork() when writing the process bar, then Linux checks whether the fork can use the same amount of memory and if there is not enough bails out:

/srv/phan/vendor/symfony/console/Terminal.php:127 [2] proc_open(): fork failed - Cannot allocate memory

Maybe that is the same issue?, at the time I went with three potential solutions:

  1. Set sysctl vm.overcommit_memory = 1, which is global to the instance and can have side effects.
  2. Set COLUMNS=80 on CI to bypass Symfony\console\Terminal::getWidth() heavy logic
  3. Disable the progress bar when not running interactively. I have not looked at how that is done in Phan.

From a quick glance at T219114, we first disabled the progress bar and then went to set COLUMNS=80 (which at the time caused Phan to not have to fork to stty to retrieve the number of columns).

Does it happen everywhere or "just" on the Math extension?

Adding

/**
 * @param-taint $mhchemParser none
 */

to MhchemStateMachines::__construct fixes this for me locally. Taint-check simply can't cope with the huge $stateMachines array, and it just goes OOM when trying to infer its shape.

13:21:11 ░░░░░░░░░░░░░░░░░░░░░░░░░░/run.sh: line 5:     8 Killed                  /run-generic.sh "/mediawiki/$THING_SUBNAME" "$@"

From the instance dmesg -T:

[Wed Jul 19 13:19:25 2023] php invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), order=0, oom_score_adj=0
...
[Wed Jul 19 13:19:25 2023] Tasks state (memory values in pages):
[Wed Jul 19 13:19:25 2023] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Wed Jul 19 13:19:25 2023] [4058861] 65534 4058861  6552646  5877371 47513600        0             0 php
...
[Wed Jul 19 13:19:25 2023] Out of memory: Killed process 4058861 (php) total-vm:26210584kB, anon-rss:23509480kB, file-rss:4kB, shmem-rss:0kB, UID:65534 pgtables:46400kB oom_score_adj:0

For reference, the instances have 24G of RAM while the php process had a 22.4gB anon-rss. One can try sending a dummy change to Math to see whether that is related to this change, else I guess the entrypoint in composer.json can be changed to remove the progress bar and maybe that will solves it.

Change 939713 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: limit Phan memory usage to 7G

https://gerrit.wikimedia.org/r/939713

Mentioned in SAL (#wikimedia-releng) [2023-07-19T14:45:15Z] <hashar> Updating all Jenkins phan jobs to use docker run --memory 7G # T342100

Change 939713 merged by jenkins-bot:

[integration/config@master] jjb: limit Phan memory usage to 7G

https://gerrit.wikimedia.org/r/939713

With the docker memory limit of 7G the process still get killed:

14:47:28 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  54 / 115 ( 69%) 3016MB
14:51:04 ░░░░░░░░░░░░░░░░░░░░░░░░░░/run.sh: line 5:     8 Killed                  /run-generic.sh "/mediawiki/$THING_SUBNAME" "$@"
14:51:05 Build step 'Execute shell' marked build as failure

The limit is implemented via a cgroup and the Linux Kernel out of memory manager does kill it at 7G:

[Wed Jul 19 14:49:15 2023] php invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0

[Wed Jul 19 14:49:15 2023] memory: usage 7340032kB, limit 7340032kB, failcnt 116

[Wed Jul 19 14:49:15 2023] Memory cgroup out of memory: Killed process 162507 (php) total-vm:7959572kB, anon-rss:7324080kB, file-rss:27744kB, shmem-rss:0kB, UID:65534 pgtables:14600kB oom_score_adj:0
hashar assigned this task to Daimona.

Adding

/**
 * @param-taint $mhchemParser none
 */

to MhchemStateMachines::__construct fixes this for me locally. Taint-check simply can't cope with the huge $stateMachines array, and it just goes OOM when trying to infer its shape.

And that fixed it (see diff).

Thank you @Daimona for finding the root cause and @Stegmujo for having filed the task!

As a side effect, the Phan jobs are now limited to 7GB of memory, a feature we would probably want to expand but I guess that is a different story.