Due to the CI incident on August 10th, most of the Nodepool jobs were moved back to permanent slaves as an emergency measure.
KVM was apparently slow at some point and, most importantly, the quota was incorrect. That caused Nodepool to spam OpenStack every second.
The quota has been fixed and should now be kept in sync properly (T143016).
The rate has been raised from 1 second to 10 seconds to stop hammering OpenStack. It should be lowered again since it really slows down the whole of Nodepool processing. The reason is that each Nodepool interaction with OpenStack is a task added to a queue, and one and only one task is processed every rate seconds. Examples of such tasks:
- ListServersTask (cached for 5 seconds)
- ListFloatingIPsTask (cached for 5 seconds)
- DeleteServerTask
- CreateServerTask
- ...
With a rate of 10, Nodepool can spawn at most 6 instances per minute (60 seconds / 10 seconds per task). Since the other tasks share the same queue, the realistic figure is lower than that. This has been done with 7bcff1d06a00ac0311ec0eb1b625b0fb08bfb315 / T113359.
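The effect of the rate on throughput can be illustrated with a small simulation. This is only a sketch of the "one task every rate seconds" model described above, not Nodepool's actual code: the `simulate` function, the simulated clock, and the one-to-one mix of list and create tasks are all assumptions made for illustration.

```python
from collections import deque


def simulate(tasks, rate, duration=60):
    """Return the tasks processed within `duration` seconds when exactly
    one task is taken from the queue every `rate` seconds.

    Hypothetical model: uses a simulated clock and never sleeps.
    """
    queue = deque(tasks)
    processed = []
    clock = 0
    while queue and clock + rate <= duration:
        clock += rate  # wait for the next processing slot
        processed.append(queue.popleft())
    return processed


# Best case: the queue holds nothing but server creations.
only_creates = ["CreateServerTask"] * 100
# Assumed workload: every creation is preceded by a server listing.
mixed = ["ListServersTask", "CreateServerTask"] * 50

for rate in (1, 10):
    best = simulate(only_creates, rate).count("CreateServerTask")
    shared = simulate(mixed, rate).count("CreateServerTask")
    print(f"rate={rate}s: at most {best} creations/min, {shared} with the mixed queue")
# rate=1s: at most 60 creations/min, 30 with the mixed queue
# rate=10s: at most 6 creations/min, 3 with the mixed queue
```

Under this assumed mix, a rate of 10 leaves only 3 creations per minute, which is why the other enqueued tasks matter as much as the rate itself.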
Revert patches:
Status | Gerrit change | Summary |
---|---|---|
Done | https://gerrit.wikimedia.org/r/313061 | Bring back npm-node-4 to Nodepool |
Done | https://gerrit.wikimedia.org/r/#/c/306723/ | Revert "Move rake jobs off of nodepool" |
Done | https://gerrit.wikimedia.org/r/#/c/306724/ | Revert "rake: Fix bundle install path" |
Done | https://gerrit.wikimedia.org/r/#/c/306725/ | Revert "Move tox-jessie & co. off of nodepool" |
Done | https://gerrit.wikimedia.org/r/#/c/306726/ | Revert "Move mediawiki-core-phpcs off of nodepool" |
Done | https://gerrit.wikimedia.org/r/#/c/306727/ | Revert "Temporarily move composer-hhvm/php5 jobs off of nodepool" |