
SonarCloud build failure due to OOM in running PHPUnit
Closed, Resolved (Public)

Description

To reproduce:

jenkins-bot
Build failed.
https://sonarcloud.io/dashboard?id=mediawiki-core&branch=681339-2&resolved=false : FAILURE in 2m 54s

Component 'mediawiki-core' on branch '681339-2' not found
The requested project does not exist.
Either it has never been analyzed successfully or it has been deleted.
Go back to the homepage

sonarcloud.png (821×1 px, 155 KB)

Event Timeline

zeljkofilipin moved this task from Backlog 🪒 to Deep work 🌊 on the User-zeljkofilipin board.
zeljkofilipin updated the task description.

@Zbyszko do you know what the problem is?

zeljkofilipin raised the priority of this task from High to Needs Triage. May 17 2021, 1:44 PM

Change 692342 had a related patch set uploaded (by Krinkle; author: Krinkle):

[integration/config@master] Disable noise from broken Sonar jobs (keep in experimental)

https://gerrit.wikimedia.org/r/692342

I'm not sure what exactly is going on, because other jobs (e.g. https://gerrit.wikimedia.org/r/c/mediawiki/core/+/690068) pass successfully.

@Daimona wrote at T282142

See for instance https://integration.wikimedia.org/ci/blue/organizations/jenkins/mwcore-codehealth-patch/detail/mwcore-codehealth-patch/36148/pipeline/:

+ php -d extension=pcov.so -d pcov.enabled=1 -d pcov.directory=/workspace/src -d 'pcov.exclude=@(tests|vendor)@' -d pcov.initial.memory=2147483648 -d pcov.initial.files=3000 vendor/bin/phpunit --exclude-group Dump,Broken,ParserFuzz,Stub --coverage-clover /workspace/log/clover.xml --log-junit /workspace/log/junit.xml tests/phpunit/unit

+ for signal in "$@"

+ trap 'kill -$signal $cover_pid; wait $cover_pid' SIGTERM

+ wait 422

PHPUnit 8.5.15 by Sebastian Bergmann and contributors.

PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted (tried to allocate 2147483648 bytes) in /workspace/src/vendor/phpunit/php-code-coverage/src/Driver/PCOV.php on line 40

I think my estimate in T280167 for the initial memory wasn't very good. The memory usage for the complete mwcore coverage runs has also increased from ~1.5 GB to 3.43 GB; I'm not sure why, since setting the initial size to 2 GB seemed like a good guess. The codehealth job also seems to be running phpunit with a memory limit of 0.5 GB (the 536870912 bytes in the error above), which is lower than the 2 GB that pcov.initial.memory asks for up front.
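To make the mismatch concrete, here is a small sketch that converts the two byte counts from the fatal error into MiB. The variable names are only illustrative; the numbers are copied from the log above.

```shell
# Values taken from the PHP fatal error in the job log.
limit_bytes=536870912      # "Allowed memory size", i.e. PHP's memory_limit
request_bytes=2147483648   # failed allocation, equal to pcov.initial.memory

# Convert bytes to MiB (1 MiB = 1024 * 1024 bytes).
limit_mib=$((limit_bytes / 1024 / 1024))
request_mib=$((request_bytes / 1024 / 1024))

# How many times larger the up-front request is than the limit.
ratio=$((request_bytes / limit_bytes))

echo "PHP memory limit:    ${limit_mib} MiB"
echo "pcov initial memory: ${request_mib} MiB"
echo "request is ${ratio}x the limit"
```

So the single up-front allocation is 2048 MiB against a 512 MiB limit, which would explain why the run dies as soon as the coverage driver starts, before any test executes.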

A few possible solutions:

  1. Remove all pcov.initial.memory args and let it malloc as needed
  2. Remove pcov.initial.memory for the codehealth job (and also any job that doesn't run the whole phpunit suite, since it won't need 2 GB)
  3. Increase the memory limit for phpunit in the codehealth job

2 and 3 are not mutually exclusive.
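As a rough sketch of what options 1 and 3 could look like on the command line, the snippet below builds two variants of the PHP options from the failing run. It only prints the commands instead of running them. The pcov.exclude setting and the phpunit reporting flags are left out for brevity, and the 4G value for memory_limit is an assumption for illustration, not the value used by the CI images. Note that later in this thread option 3 is reported as not having worked when tried.

```shell
# pcov settings shared by both variants (copied from the failing command).
base_opts="-d extension=pcov.so -d pcov.enabled=1 -d pcov.directory=/workspace/src -d pcov.initial.files=3000"

# Option 1: no pcov.initial.memory, so pcov allocates as needed.
opts_no_hint="$base_opts"

# Option 3: keep the 2 GiB hint but raise PHP's memory_limit above it.
# 4G is a hypothetical value chosen only for this example.
opts_bigger_limit="$base_opts -d pcov.initial.memory=2147483648 -d memory_limit=4G"

# Dry run: show the resulting commands without executing PHP.
echo "php $opts_no_hint vendor/bin/phpunit tests/phpunit/unit"
echo "php $opts_bigger_limit vendor/bin/phpunit tests/phpunit/unit"
```

Option 2 would be the first variant applied only to the codehealth job and other partial runs, leaving the full coverage job unchanged.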

Krinkle renamed this task from SonarCloud build failure to SonarCloud build failure due to OOM in running PHPUnit. May 17 2021, 3:00 PM
Krinkle reopened this task as Open.
Krinkle added a project: Code-Health.

Change 692343 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] mediawiki-coverage: Bump memory limit

https://gerrit.wikimedia.org/r/692343

Change 692344 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] quibble-buster-php73-coverage: Bump image version

https://gerrit.wikimedia.org/r/692344

Change 692345 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[integration/config@master] jjb: Switch to newer coverage images

https://gerrit.wikimedia.org/r/692345

Change 692344 abandoned by Jforrester:

[integration/config@master] quibble-buster-php73-coverage: Bump image version

Reason:

Squashed into parent.

https://gerrit.wikimedia.org/r/692344

Change 692343 merged by jenkins-bot:

[integration/config@master] dockerfiles: [quibble-buster-php73-coverage] Bump coverage memory from 2 to 4GiB

https://gerrit.wikimedia.org/r/692343

Mentioned in SAL (#wikimedia-releng) [2021-05-19T23:44:48Z] <James_F> Publishing quibble-buster-php73-coverage:0.0.47-s1 with a 4GiB memory limit for coverage jobs T280669

Change 692345 merged by jenkins-bot:

[integration/config@master] jjb: Switch coverage jobs to image with 4GiB memory limit, up from 2GiB

https://gerrit.wikimedia.org/r/692345

Change 693019 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] dockerfiles: [quibble-buster-php73-coverage] Drop pcov size hints

https://gerrit.wikimedia.org/r/693019

Change 693021 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] jjb: Switch coverage jobs to image with no memory limit

https://gerrit.wikimedia.org/r/693021

Change 693019 merged by jenkins-bot:

[integration/config@master] dockerfiles: [quibble-buster-php73-coverage] Drop pcov size hints

https://gerrit.wikimedia.org/r/693019

Mentioned in SAL (#wikimedia-releng) [2021-05-20T00:14:03Z] <James_F> Publishing quibble-buster-php73-coverage:0.0.47-s2 with no memory limit for coverage jobs T280669

@Daimona wrote at T282142

A few possible solutions:

  1. Remove all pcov.initial.memory args and let it malloc as needed
  2. Remove pcov.initial.memory for the codehealth job (and also any job that doesn't run the whole phpunit suite, since it won't need 2 GB)
  3. Increase the memory limit for phpunit in the codehealth job

2 and 3 are not mutually exclusive.

OK, I tried 3, and it didn't work, so I tried 1, and things seem to be operational again now. But boo.

Change 693021 merged by jenkins-bot:

[integration/config@master] jjb: Switch coverage jobs to image with no memory limit

https://gerrit.wikimedia.org/r/693021

Change 692342 abandoned by Krinkle:

[integration/config@master] Disable noise from broken Sonar jobs (keep in experimental)

Reason:

https://gerrit.wikimedia.org/r/692342

kostajh claimed this task.

Thanks @Jdforrester-WMF for merging + deploying, and to @Daimona, @Krinkle & @zeljkofilipin for debugging & reporting on this.