
(Nodepool) CI is really slow tonight
Closed, Resolved (Public)

Description

See https://integration.wikimedia.org/zuul/

Some of the repos show they have been waiting ~40 minutes.

@hashar:

There have indeed been 20-40 changes in the Zuul pipelines since 17:30 UTC:

zuul_changes.png (303×444 px, 30 KB)

Nodepool queries to OpenStack are rate limited at one every six seconds and we have reached that limit for the last 3 hours:

openstack_api_throttling.png (281×444 px, 18 KB)
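To put that rate limit in numbers, here is a rough sketch of the query budget it implies. The function name `max_queries` is made up for illustration; it is plain arithmetic on the six-second interval mentioned above, not how Nodepool itself does the throttling.

```python
def max_queries(window_seconds, interval_seconds=6):
    """Upper bound on queries in a window when at most one query
    is allowed every `interval_seconds` (hypothetical helper)."""
    return window_seconds // interval_seconds


per_hour = max_queries(60 * 60)
over_three_hours = max_queries(3 * 60 * 60)
print(per_hour, over_three_hours)
```

So while the limit is being hit, every instance boot, status poll and deletion has to share that fixed budget.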

The pool of instances is rather busy as well:

pool.png (281×444 px, 27 KB)

Looks like a lot of changes have been sent.

Event Timeline

Paladox renamed this task from "CI is really slow tonight" to "(Nodepool) CI is really slow tonight". Jan 16 2017, 7:09 PM
Paladox triaged this task as High priority.

One other possibility is that we had a lot of changes merged for oojs/ui. On merge, that runs oojs-ui-coverage, but only one copy of that job can run on the infra at a time. However, for each pending job, it seems Jenkins allocates a node, which then sits idle and can't process another job. That is just a theory, really.

I've been submitting a lot of OOjs UI changes today, and James has been merging a lot of them. Sorry if we overwhelmed the CI. :D

(There were 24 changesets submitted in the last 3 hours, some with multiple patchsets. https://gerrit.wikimedia.org/r/#/q/project:oojs/ui)

hashar claimed this task.

I guess that explains it: oojs/ui is a heavy consumer with long-running jobs. Taking https://gerrit.wikimedia.org/r/#/c/332344/ as an example, where one proposes a patch, +2s it, and then jobs run after the merge, we got:

| test | time |
| --- | --- |
| composer-package-hhvm-trusty | 44s |
| composer-package-php55-trusty | 48s |
| oojs-ui-npm-node-4-jessie | 12m 05s |
| oojs-ui-npm-run-demos-node-4-jessie | 9m 30s |
| oojs-ui-npm-run-doc-node-4-jessie | 8m 35s |

| gate | time |
| --- | --- |
| composer-package-hhvm-trusty | 41s |
| composer-package-php55-trusty | 37s |
| oojs-ui-npm-node-4-jessie | 10m 16s |
| oojs-ui-npm-run-demos-node-4-jessie | 9m 02s |
| oojs-ui-npm-run-doc-node-4-jessie | 5m 55s |

| post merge | time |
| --- | --- |
| oojs-ui-jsduck-publish | 9m 31s |
| oojs-ui-doxygen-publish | 9s |
| oojs-ui-coverage | 12m 06s |
| oojs-ui-demos-publish | 11m 24s |

So if one sends a patch and +2s it right away, the test and gate jobs together keep six Jessie instances busy for about 6 minutes, with four of them still busy at around 9-10 minutes. It only takes three such patches to consume the whole pool.
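A rough way to double-check those numbers from the timings listed above is sketched here. It assumes the test and gate jobs all start at the same moment and each gets a node immediately, which is a simplification. Post-merge jobs are left out since their node labels are not stated. The helper names are made up for illustration.

```python
# Jessie job durations copied from the test and gate timings above,
# written as (minutes, seconds).
JESSIE_JOBS = {
    "test": {
        "oojs-ui-npm-node-4-jessie": (12, 5),
        "oojs-ui-npm-run-demos-node-4-jessie": (9, 30),
        "oojs-ui-npm-run-doc-node-4-jessie": (8, 35),
    },
    "gate": {
        "oojs-ui-npm-node-4-jessie": (10, 16),
        "oojs-ui-npm-run-demos-node-4-jessie": (9, 2),
        "oojs-ui-npm-run-doc-node-4-jessie": (5, 55),
    },
}


def to_seconds(duration):
    """Convert a (minutes, seconds) pair to seconds."""
    minutes, seconds = duration
    return minutes * 60 + seconds


def busy_instances(at_seconds):
    """Count Jessie jobs still running `at_seconds` after a simultaneous
    start (assumption: one instance per job, no queueing delay)."""
    return sum(
        1
        for jobs in JESSIE_JOBS.values()
        for duration in jobs.values()
        if to_seconds(duration) > at_seconds
    )


def instance_seconds():
    """Total Jessie instance time used by one patch in test plus gate."""
    return sum(
        to_seconds(duration)
        for jobs in JESSIE_JOBS.values()
        for duration in jobs.values()
    )


for minute in (5, 9):
    print(minute, "min:", busy_instances(minute * 60), "instances busy")
print("total:", instance_seconds() // 60, "instance-minutes")
```

Under those assumptions all six instances are busy at the 5 minute mark and four are still busy at 9 minutes, and one patch uses roughly 55 instance-minutes of Jessie capacity before any post-merge job starts.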