
Speed up oojs/ui Jenkins jobs
Closed, ResolvedPublic

Assigned To
Authored By
hashar
Jan 17 2017, 8:55 AM
Referenced Files
F6591281: npm-run-demos.log
Mar 15 2017, 2:28 PM
F6591358: tasksdiff.html
Mar 15 2017, 2:28 PM
F6591282: npm-run-doc.log
Mar 15 2017, 2:28 PM
F6591283: npm-test.log
Mar 15 2017, 2:28 PM

Description

Abstract

The Jenkins jobs triggered for oojs/ui.git take too many resources on the CI infrastructure, especially when a lot of changes are made in a series and merged.

Spin-off of T155444. Taking https://gerrit.wikimedia.org/r/#/c/332344/ as an example: when one proposes a patch, +2s it, and the jobs then run after merge, we get:

test                                       time
composer-package-hhvm-trusty               44s
composer-package-php55-trusty              48s
oojs-ui-npm-node-4-jessie                  12m 05s
oojs-ui-npm-run-demos-node-4-jessie        9m 30s
oojs-ui-npm-run-doc-node-4-jessie          8m 35s

gate                                       time
composer-package-hhvm-trusty               41s
composer-package-php55-trusty              37s
oojs-ui-npm-node-4-jessie                  10m 16s
oojs-ui-npm-run-demos-node-4-jessie        9m 02s
oojs-ui-npm-run-doc-node-4-jessie          5m 55s

post merge                                 time
oojs-ui-jsduck-publish                     9m 31s
oojs-ui-doxygen-publish                    9s
oojs-ui-coverage                           12m 06s
oojs-ui-demos-publish                      11m 24s

So if one sends a patch and +2s it, that is six Jessie instances kept busy for 6 minutes, or four for 9-10 minutes. It only takes three such patches to consume the whole pool.

A series of 30 patches got merged on March 14th 2016 between 19:00 UTC and midnight, and from 17:50 UTC to midnight that is 241 jobs and thus 241 instances consumed. That is too much for the CI infrastructure to handle.
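To put a number on the per-patch cost, the durations listed above for change 332344 can be summed. This is only a rough sketch: it adds up the wall-clock time of every job across the three pipelines, treating each job as occupying one instance for its whole duration, and ignores instance boot and teardown time.

```python
import re

# Durations copied from the job listing above (change 332344).
JOB_TIMES = {
    "test": {
        "composer-package-hhvm-trusty": "44s",
        "composer-package-php55-trusty": "48s",
        "oojs-ui-npm-node-4-jessie": "12m 05s",
        "oojs-ui-npm-run-demos-node-4-jessie": "9m 30s",
        "oojs-ui-npm-run-doc-node-4-jessie": "8m 35s",
    },
    "gate": {
        "composer-package-hhvm-trusty": "41s",
        "composer-package-php55-trusty": "37s",
        "oojs-ui-npm-node-4-jessie": "10m 16s",
        "oojs-ui-npm-run-demos-node-4-jessie": "9m 02s",
        "oojs-ui-npm-run-doc-node-4-jessie": "5m 55s",
    },
    "post merge": {
        "oojs-ui-jsduck-publish": "9m 31s",
        "oojs-ui-doxygen-publish": "9s",
        "oojs-ui-coverage": "12m 06s",
        "oojs-ui-demos-publish": "11m 24s",
    },
}


def to_seconds(text):
    """Convert a duration such as '12m 05s' or '44s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+)s", text.strip())
    if match is None:
        raise ValueError("unrecognised duration: %r" % text)
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds)


def pipeline_totals(job_times):
    """Sum the job durations of each pipeline, in seconds."""
    return {
        pipeline: sum(to_seconds(d) for d in jobs.values())
        for pipeline, jobs in job_times.items()
    }


totals = pipeline_totals(JOB_TIMES)
grand_total = sum(totals.values())
for pipeline, seconds in totals.items():
    print("%-10s %5d s" % (pipeline, seconds))
print("%-10s %5d s (%.1f min)" % ("total", grand_total, grand_total / 60))
```

With these figures a single proposed, +2'd and merged patch adds up to 5483 seconds, roughly 91 minutes of instance time.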

Jobs detail

When patches are proposed for the oojs/ui.git repository, we trigger three different jobs that run:

  1. npm test : grunt test
  2. npm demos : grunt publish-build demos
  3. npm doc : grunt build && jsduck && copy:jsduck

All of them share the CI setup overhead of cloning the repository. They then run npm install, which probably takes half of the build time.

Each also invokes composer install, though it benefits from caching.

Looking at the grunt tasks being run, the jobs also share common tasks, some of them rather slow (e.g. colorizeSvg, svg2png:dist).

Tasks breakdown

Using the job console output:

vimdiff <(grep 'Running' npm-test.log) <(grep 'Running' npm-run-demos.log) <(grep 'Running' npm-run-doc.log)
:TOhtml

https://people.wikimedia.org/~hashar/T160513/tasksdiff.html
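The same comparison can be scripted instead of eyeballed in vimdiff. The sketch below assumes grunt's console format of `Running "<task>" (<type>) task` lines, and lists the tasks that appear in every log. The three log excerpts are made up for illustration (the `karma:main` and `copy:demos` task names in particular are hypothetical); in practice one would read npm-test.log, npm-run-demos.log and npm-run-doc.log from disk.

```python
import re

# Matches the task name in grunt's 'Running "<task>" (<type>) task' lines,
# with or without a Jenkins timestamp prefix.
TASK_LINE = re.compile(r'Running "([^"]+)"')


def grunt_tasks(log_text):
    """Return the grunt task names in the order they appear in a log."""
    return TASK_LINE.findall(log_text)


def shared_tasks(*logs):
    """Return the tasks present in every log, in the order of the first log."""
    if not logs:
        return []
    task_lists = [grunt_tasks(log) for log in logs]
    common = set(task_lists[0]).intersection(*task_lists[1:])
    result = []
    for task in task_lists[0]:
        if task in common and task not in result:
            result.append(task)
    return result


# Illustrative excerpts only, not the real job output.
TEST_LOG = '''Running "colorizeSvg" (colorizeSvg) task
Running "svg2png:dist" (svg2png) task
Running "karma:main" (karma) task
'''
DEMOS_LOG = '''Running "colorizeSvg" (colorizeSvg) task
Running "svg2png:dist" (svg2png) task
Running "copy:demos" (copy) task
'''
DOC_LOG = '''Running "colorizeSvg" (colorizeSvg) task
Running "svg2png:dist" (svg2png) task
Running "copy:jsduck" (copy) task
'''

print(shared_tasks(TEST_LOG, DEMOS_LOG, DOC_LOG))
```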

What would be nice

  • I would like all three build steps (test, doc, demos) to be unified in a single Jenkins job. That would eliminate the overhead of git clone / npm install.
  • We probably want a new task defining the tasks to run, taking care not to run the slowest ones (svg2png and colorizeSvg) twice.

We can have Jenkins invoke a specific npm entry point such as npm run jenkins, which would have the appropriate pre/post tasks and maybe invoke a specific/custom grunt task.
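One way to think about the combined task is as the three jobs' task lists concatenated, with each task kept only the first time it appears. The sketch below illustrates that idea under the assumption that a repeated task produces the same output each time and can safely run once; the task lists are hypothetical (`lint` and `copy:demos` are placeholder names), not the actual Gruntfile configuration.

```python
def combined_task_list(*job_task_lists):
    """Concatenate task lists, keeping only the first occurrence of each task."""
    ordered = {}  # dicts preserve insertion order
    for tasks in job_task_lists:
        for task in tasks:
            ordered.setdefault(task, None)
    return list(ordered)


# Hypothetical task lists for the three existing jobs.
TEST_TASKS = ["colorizeSvg", "svg2png:dist", "lint"]
DEMOS_TASKS = ["colorizeSvg", "svg2png:dist", "copy:demos"]
DOC_TASKS = ["colorizeSvg", "svg2png:dist", "copy:jsduck"]

merged = combined_task_list(TEST_TASKS, DEMOS_TASKS, DOC_TASKS)
print(merged)
```

Here the slow colorizeSvg and svg2png:dist steps run once instead of three times, while the job-specific tasks keep their relative order.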

Build logs

Event Timeline

hashar lowered the priority of this task from High to Medium.Jan 17 2017, 8:55 AM
hashar created this task.

Just noting that nodejs 6 apparently improves performance for some; not sure whether it will have an effect on the oojs tests though.

Also +1 to combining them into one giant test job for oojs, since that will allow other repos to use the instances too :)

Also, if we +2 a change and it gets tested in gate-and-submit, we may want the tests to be cancelled in the test pipeline.

This will also make more instances available for other tests to use.

@hashar since we updated to nodejs 6 today, the npm test time has gone down by two minutes, to 10 minutes now :)

See https://gerrit.wikimedia.org/r/#/c/332344/

Incorporated changes from T160513. Specifically, I have looked at what each job does and there are a lot of overlapping grunt tasks, notably svg2png and colorizeSvg. See the updated task description for the full details.

Slightly related: the Jenkins slaves now have PhantomJS, so the oojs-ui jobs no longer have to download it:

Considering PhantomJS found at /usr/bin/phantomjs
Found PhantomJS at /usr/bin/phantomjs ...verifying 
Writing location.js file
PhantomJS is already installed on PATH at /usr/bin/phantomjs

Change 344970 had a related patch set uploaded (by Prtksxna):
[oojs/ui@master] build: Add a new jenkins script

https://gerrit.wikimedia.org/r/344970

Change 344970 merged by jenkins-bot:
[oojs/ui@master] build: Add a new jenkins script

https://gerrit.wikimedia.org/r/344970

The repository now has a 'jenkins' npm script and the test task now depends on demo.

So we can drop the three jobs that run:

  1. npm test : grunt test
  2. npm demos : grunt publish-build demos
  3. npm doc : grunt build && jsduck && copy:jsduck

And replace them with a single job that runs npm run jenkins, e.g. npm test && jsduck && npm run postdoc.
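The && chaining matters here: each step runs only if the previous one exited successfully, so a failing npm test stops the job before jsduck starts. A minimal sketch of that behaviour follows, using the Python interpreter as a stand-in for the real commands so it can run anywhere. The `run_chain` helper is hypothetical and not part of the CI configuration.

```python
import subprocess
import sys


def run_chain(commands):
    """Run commands in order, like 'a && b && c'.

    Stops at the first non-zero exit status and returns
    (exit_status, commands_actually_run).
    """
    ran = []
    for command in commands:
        ran.append(command)
        status = subprocess.run(command).returncode
        if status != 0:
            return status, ran
    return 0, ran


# The steps named in this task, shown for reference only (not executed here).
JENKINS_STEPS = [["npm", "test"], ["jsduck"], ["npm", "run", "postdoc"]]

# Stand-ins: the second step fails with status 3, so the third never runs.
ok_step = [sys.executable, "-c", "pass"]
failing_step = [sys.executable, "-c", "import sys; sys.exit(3)"]
status, ran = run_chain([ok_step, failing_step, ok_step])
print(status, len(ran))
```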

Change 345203 had a related patch set uploaded (by Hashar):
[integration/config@master] Merge 3 oojs/ui jobs in a single one

https://gerrit.wikimedia.org/r/345203

Change 345203 merged by jenkins-bot:
[integration/config@master] Merge 3 oojs/ui jobs in a single one

https://gerrit.wikimedia.org/r/345203

The first run of oojs-ui-npm-run-jenkins-node-6-jessie passed. Thank you Prateek!

Mentioned in SAL (#wikimedia-releng) [2017-03-28T19:53:14Z] <hashar> Populating package manager cache of oojs-ui-npm-run-jenkins-node-6-jessie by manually triggering a build with ZUUL_PIPELINE=postmerge T155483

The next optimization is that the npm test / grunt test commands use composer. So most probably we could merge the two other jobs, which each take less than 1 minute, all overhead included:

composer-package-hhvm-jessie
composer-package-php55-trusty

Each job sets a different PHP_BIN environment variable, pointing either to /usr/bin/php5 (on Trusty with Zend 5.5) or /usr/bin/hhvm (on Jessie), and then runs:

  • composer update --ansi --no-progress --prefer-dist --profile -v
  • composer --ansi test

Given the new npm run jenkins job runs on Jessie, could we make the Grunt task also invoke 'composer test'? Grunt exec:demos already invokes composer update.

That would let us merge the job composer-package-hhvm-jessie.

Change 345284 had a related patch set uploaded (by Prtksxna):
[oojs/ui@master] build: Add exec:composer and add it to _ci

https://gerrit.wikimedia.org/r/345284

I tried to make changes in jjb/oojs.yaml and zuul/layout.yaml as well, but couldn't figure out what needed to be done.

Adding the composer test step "only" adds 44 seconds:

00:11:16.835 Running "exec:composer" (exec) task
00:11:17.311 >> [2.0MB/0.01s] Loading composer repositories with package information
00:11:17.748 >> [2.0MB/0.45s] Updating dependencies (including require-dev)
00:11:21.616 >> [66.0MB/4.31s] Dependency resolution completed in 0.391 seconds
00:11:21.625 >> [66.0MB/4.32s] Analyzed 1209 packages to resolve dependencies
00:11:21.625 >> [66.0MB/4.32s] Analyzed 17693 rules to resolve dependencies
00:11:21.625 >> [66.0MB/4.32s] Nothing to install or update
00:11:21.668 >> [66.0MB/4.37s] Dependency resolution completed in 0.002 seconds
00:11:21.692 >> [66.0MB/4.39s] Generating autoload files
00:11:22.752 >> [68.0MB/5.45s] Memory usage: 68MB (peak: 68MB), time: 5.45s
00:11:24.572 >> > parallel-lint . --exclude vendor --exclude demos/vendor
00:11:25.210 PHP 5.6.99 | HHVM 3.12.14 | 10 parallel jobs
00:11:42.842 ............................................................ 60/91 (65 %)
00:11:45.995 ...............................                              91/91 (100 %)
00:11:45.995 
00:11:45.995 
00:11:45.996 Checked 91 files in 11.7 seconds
00:11:45.997 No syntax error found
00:11:46.011 >> > phpcs -p -s
00:11:59.928 ..................................................
00:11:59.928 
00:11:59.928 Time: 13.53 secs; Memory: 4.73Mb
00:11:59.928 
00:11:59.947 >> > phpunit $PHPUNIT_ARGS
00:12:00.564 PHPUnit 4.8.21 by Sebastian Bergmann and contributors.
00:12:00.565 
00:12:00.622 ..........................
00:12:00.622 
00:12:00.622 Time: 438 ms, Memory: 2.93MB
00:12:00.622 
00:12:00.622 OK (26 tests, 36 assertions)
00:12:00.661 
00:12:00.661 Done.
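The 44 second figure can be checked from the timestamps prefixed to the first and last lines of the excerpt above. A small sketch, assuming every console line starts with an HH:MM:SS.mmm stamp as it does in that excerpt:

```python
import re
from datetime import timedelta

# Timestamp prefix on each console line, e.g. "00:11:16.835".
STAMP = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d{3})")


def parse_stamp(line):
    """Return the line's leading timestamp as a timedelta, or None if absent."""
    match = STAMP.match(line)
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=millis)


def elapsed_seconds(first_line, last_line):
    """Seconds between the timestamps of two console lines."""
    return (parse_stamp(last_line) - parse_stamp(first_line)).total_seconds()


# First and last lines of the exec:composer excerpt above.
start = '00:11:16.835 Running "exec:composer" (exec) task'
end = "00:12:00.661 Done."
elapsed = elapsed_seconds(start, end)
print("%.3f s" % elapsed)
```

The difference is 43.826 seconds, which rounds to the 44 seconds quoted above.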

Change 345286 had a related patch set uploaded (by Prtksxna):
[integration/config@master] oojs/ui: Stop using composer-test-package template

https://gerrit.wikimedia.org/r/345286

Change 345286 merged by jenkins-bot:
[integration/config@master] oojs/ui: Stop using composer-test-package template

https://gerrit.wikimedia.org/r/345286

Change 345284 merged by jenkins-bot:
[oojs/ui@master] build: Add exec:composer and add it to _ci

https://gerrit.wikimedia.org/r/345284

Mostly solved. There are still potential optimizations to be done, notably caching node_modules, but that applies to the whole infra and not just oojs/ui.

Thank you @Prtksxna

\o/

@hashar Is it possible to get a number for how many seconds/minutes we are saving per patch after these changes?

Fast forward months later: Nodepool is being discarded, so there is less incentive to save instances and we could run more jobs in parallel. We can surely speed up the test run, and I filed T189055 (ironically with almost the exact same name).