
composer-package-php73-docker seems to fail often on Parsoid builds
Closed, Resolved · Public

Description

Not every time, of course, but quite frequently the composer-package-php73-docker build fails on Jenkins while composer-package-php72-docker passes without problems. The failure is often memory related. For instance, https://integration.wikimedia.org/ci/job/composer-package-php72-docker/5462/console shows a clean php72 run:

00:00:50.222 > PHAN_DISABLE_XDEBUG_WARN=1 phan --allow-polyfill-parser
00:00:50.258 A future major version of Phan will require php-ast 1.0.1+ for AST version 70. php-ast 0.1.6 is installed.
00:00:50.258 (Set PHAN_SUPPRESS_AST_UPGRADE_NOTICE=1 to suppress this message)
00:01:08.910 [PostBuildScript] - Execution post build scripts.

while https://integration.wikimedia.org/ci/job/composer-package-php73-docker/3838/console runs out of memory:

00:00:25.230 > PHAN_DISABLE_XDEBUG_WARN=1 phan --allow-polyfill-parser
00:00:44.498 
00:00:44.498 mmap() failed: [12] Cannot allocate memory
00:00:44.499 
00:00:44.499 mmap() failed: [12] Cannot allocate memory
00:00:44.499 PHP Fatal error:  Out of memory (allocated 1805651968) (tried to allocate 20480 bytes) in /src/vendor/phan/phan/src/Phan/Language/UnionType.php on line 724
00:00:53.961 Script PHAN_DISABLE_XDEBUG_WARN=1 phan --allow-polyfill-parser handling the phan event returned with error code 255

In other cases, like https://integration.wikimedia.org/ci/job/composer-package-php73-docker/3837/console, the php73 job just hangs.

Event Timeline

hashar subscribed.

The job roams across all the CI instances we have, and some are rather small / low-memory ones (2GB instances iirc). For composer test it is usually not an issue, since almost all repositories just run parallel-lint / PHP CodeSniffer. Phan can use a lot of memory, although I am wondering why that would be the case for mediawiki/services/parsoid :(

When phan is run via composer install && composer test, the source repository gets both the regular composer dependencies and the development dependencies installed into vendor/. So Phan ends up processing its own source code, as well as PHPUnit and PHP CodeSniffer among others :-(

For MediaWiki, we have a container that has Phan and its dependencies installed in a different directory, and if I remember correctly we ask composer not to install dev dependencies. That way it more closely matches a deployment. So I guess we will want to revisit how phan is installed and run.
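
A rough sketch of that setup (the /opt/phan path is illustrative, not the actual container layout): Phan lives in its own prefix, and the repository under test is installed without dev dependencies, so vendor/ only contains what a deployment would ship:

mkdir -p /opt/phan
composer require --working-dir=/opt/phan phan/phan
composer install --no-dev
PHAN_DISABLE_XDEBUG_WARN=1 /opt/phan/vendor/bin/phan --allow-polyfill-parser

With that layout, Phan never sees its own source tree or the other dev dependencies when it analyzes the repository.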

Change 513168 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Use more explicit jobs for Parsoid service

https://gerrit.wikimedia.org/r/513168

Change 513171 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Run repo specific composer jobs with more memory

https://gerrit.wikimedia.org/r/513171

I am on holiday for the next few days, but the patches should fix it:

https://gerrit.wikimedia.org/r/513168 Use more explicit jobs for Parsoid service
That is to be able to finely tweak the composer test jobs that are run for Parsoid. The change just renames the jobs.

https://gerrit.wikimedia.org/r/#/c/integration/config/+/513170 Remove unused mwgate composer-package jobs
More or less related; it just cleans things up ;)

https://gerrit.wikimedia.org/r/#/c/integration/config/+/513171 Run repo specific composer jobs with more memory
Migrates the more specific jobs introduced in 513168 to CI slaves that have more memory (the m4executor label).

Those should do it; people in Release Engineering should be able to deploy the changes and watch the aftermath.

Then we will need to find a better solution to run Phan on parsoid.

We could probably also tweak .phan/config.php to exclude more of vendor/, including Phan itself.
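
Something along these lines would do it (a minimal sketch using Phan's stock config settings, not Parsoid's actual configuration; directory names are illustrative):

<?php
// .phan/config.php
return [
    // Phan still needs to *parse* vendored code to know the
    // signatures the project calls into...
    'directory_list' => [
        'src',
        'vendor',
    ],
    // ...but excluding vendor/ from *analysis* means Phan no longer
    // burns memory analyzing its own source, PHPUnit, etc.
    'exclude_analysis_directory_list' => [
        'vendor/',
    ],
];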

Change 513359 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Phan improvements: analyze some code in tests/, exclude some code in vendor/

https://gerrit.wikimedia.org/r/513359

Change 513359 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Phan improvements: analyze some code in tests/, exclude some code in vendor/

https://gerrit.wikimedia.org/r/513359

Change 513168 merged by jenkins-bot:
[integration/config@master] Use more explicit jobs for Parsoid service

https://gerrit.wikimedia.org/r/513168

Change 513171 merged by jenkins-bot:
[integration/config@master] Run repo specific composer jobs with more memory

https://gerrit.wikimedia.org/r/513171

This is now *mostly* fixed. We are still getting spurious failures on other jobs, e.g. ENOMEM on https://integration.wikimedia.org/ci/job/parsoidsvc-npm-run-roundtrip-node-6-docker/5677/console

I wonder if we can play the same trick and assign those jobs to m4executor nodes as well?

greg triaged this task as Medium priority. Jun 3 2019, 4:52 PM

Mentioned in SAL (#wikimedia-releng) [2019-06-04T16:10:37Z] <hashar> Deleting integration-slave-docker-1021 and integration-slave-docker-1049 / too small disk (20G partition) and not enough ram (2G) # T221872

Should be good now; I ended up deleting the old Docker slaves that had only 2GB of RAM. Eventually we will get rid of the m4executor label in Jenkins.