Add/reserve a Jenkins node for the pipeline's trigger jobs
Closed, Resolved (Public)

Description

Due to limitations with Zuul v2's scheduler, it can only run freestyle Jenkins jobs. To work around this limitation, all pipeline jobs currently have a corresponding freestyle "trigger" job that passes along parameters and blocks, returning the result of the "real" pipeline job.

These trigger builds end up taking executors unnecessarily; they don't do any real work.

One solution for this issue might be to add a single node to Jenkins with a label (e.g. "trigger") that can be used to schedule trigger builds, keeping the other nodes free to do real work. This node might not even need an additional instance, as it won't incur substantial CPU usage. It could potentially be configured as a remote SSH node that logs in to an existing instance with a different user account, for example.

Event Timeline

thcipriani triaged this task as Medium priority.
thcipriani subscribed.

@brennen made the mistake of showing interest in this task, assigning accordingly :)

Mentioned in SAL (#wikimedia-releng) [2019-06-11T21:07:50Z] <brennen> creating integration-trigger-01 for T224069

Change 516723 had a related patch set uploaded (by Brennen Bearnes; owner: Brennen Bearnes):
[integration/config@master] pipeline: Add node for pipeline's trigger jobs

https://gerrit.wikimedia.org/r/516723

Change 516723 merged by jenkins-bot:
[integration/config@master] pipeline: Add node for pipeline's trigger jobs

https://gerrit.wikimedia.org/r/516723

hashar subscribed.

The instance is a small one and almost all of the disk is used by the / partition. When the extended disk is created, there are only 415 MB left for /srv:

$ df -h /srv
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/vd-second--local--disk  484M   41M  415M   9% /srv

Thus Shinken alerts about disk space :-\
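
For illustration, a minimal sketch of the kind of check that trips here, assuming the alert is driven by an absolute amount of free space rather than by the 9% usage figure. The function names and the 1 GiB threshold are hypothetical; they are not Shinken's actual configuration:

```shell
#!/bin/sh
# Hypothetical helper: read `df -Pk` output on stdin and print the available
# space in KiB for the first filesystem listed (fourth column of line 2).
avail_kib() {
    awk 'NR == 2 { print $4 }'
}

# Hypothetical check: succeeds (exit 0) when free space is below the threshold.
# $1 = mount point, $2 = minimum free KiB (an assumed value).
low_on_space() {
    avail=$(df -Pk "$1" | avail_kib)
    [ "$avail" -lt "$2" ]
}

# Usage sketch (threshold of 1 GiB is an assumption):
#   low_on_space /srv 1048576 && echo "/srv is low on space"
```

With only 415 MB available, a check like this fires on /srv no matter how little of the volume is actually used.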

With the Puppet class role::ci::slave::labs applied, we get role::ci::slave::labs::common > ::profile::labs::lvm::srv, which creates the extended disk.

My fault really: the role::ci::slave::labs class is a legacy from when we had the instances provisioned with everything needed to run any job (php, debian packages, python, node etc). It is way too large.

Instead we would want to create a new, slimmed-down role in Puppet that just includes jenkins::common. That should be sufficient. The home directory would be /mnt/home/jenkins-deploy, which is not even a mount but sits on the / partition; that path comes from LDAP and never got changed. The path has to be set in the Jenkins config for the slave.

So my bad really. You got hit by a lot of Technical-Debt.

Just so I'm clear, in this instance is it sufficient to change workspace to /mnt/home/jenkins-deploy in the Jenkins config? I see that path exists.

Yes that would do it.

Then we need to use a Puppet class that does not bring in ::profile::labs::lvm::srv, which would require the creation of a new dummy role. Then one can change the role applied to the instance and unmount /srv/ \o/

Mentioned in SAL (#wikimedia-releng) [2019-06-13T17:05:28Z] <brennen> changing jenkins remote root for integration-trigger-01 to /mnt/home/jenkins-deploy per T224069

Per conversation elsewhere, will pair with @thcipriani on the puppet role on Friday.

Change 517111 had a related patch set uploaded (by Brennen Bearnes; owner: Brennen Bearnes):
[operations/puppet@production] CI: Create lightweight agent role for Jenkins

https://gerrit.wikimedia.org/r/517111

Change 517111 merged by Alexandros Kosiaris:
[operations/puppet@production] CI: Create lightweight agent role for Jenkins

https://gerrit.wikimedia.org/r/517111

I have unmounted /srv and removed it from /etc/fstab :)
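
For the record, a rough sketch of how that cleanup could be scripted, assuming a standard whitespace-separated /etc/fstab. The `drop_mount` helper and the backup filename are hypothetical, not necessarily what was run on integration-trigger-01:

```shell
#!/bin/sh
# Hypothetical helper: print an fstab with every entry whose mount point
# (second field) equals $2 removed.
# $1 = path to an fstab-style file, $2 = mount point to drop.
# Comment lines and blank lines are passed through untouched.
drop_mount() {
    awk -v mp="$2" '/^[[:space:]]*#/ || NF == 0 || $2 != mp' "$1"
}

# Usage sketch (run as root; the .bak filename is an assumption):
#   umount /srv
#   cp /etc/fstab /etc/fstab.bak
#   drop_mount /etc/fstab.bak /srv > /etc/fstab
```

Filtering from a backup copy keeps the original file around in case the entry needs to be restored.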