Make puppet-compiler execution run with higher priority, not like other 'experimental' jobs
Open, Medium, Public

Description

Lately I'm seeing puppet compiler queueing delays of 10-20 minutes, blocking the compilation process from beginning -- which can really complicate iterating on a patch during a deployment window.

Can we get operations-puppet-catalog-compiler-puppet7-test to run at test-prio? Thanks!

Event Timeline

Restricted Application added a subscriber: Aklapper.
hashar subscribed.

Priorities are set on a per-pipeline basis. That job is in the experimental pipeline (which reacts to "check experimental" comments), and that pipeline has the lowest priority:

zuul/layout.yaml
pipelines:
  - name: experimental
    precedence: low

I'd keep it low because it is used for a bunch of checks for which we don't need an immediate answer (mostly to try or validate things in the MediaWiki system). So essentially the answer is no :-]

Then, the job is tied to the puppet7-compiler-node Jenkins label, which is served by three dedicated Puppet compiler hosts that run nothing else. Looking at https://integration.wikimedia.org/ci/label/puppet7-compiler-node/load-statistics , the label was not saturated.

Zuul emits a bunch of metrics to statsd, which are scraped by Prometheus and documented at https://gerrit.wikimedia.org/g/integration/zuul/+/refs/heads/debian/jessie-wikimedia/doc/source/statsd.rst . The relevant measures would be resident_time and wait_time:

#. **wait_time** counter and timer of the wait time, with the difference
         of the job start time and the launch time, in milliseconds.

#. **resident_time** timing representing how long the Change has been
         known by Zuul (which includes build time and Zuul overhead).
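
As a rough illustration of the wait_time definition above (the difference between a job's start time and its launch time, in milliseconds), here is a minimal sketch. The helper name `elapsed_ms` and the timestamps are hypothetical; this is not Zuul's own code.

```python
from datetime import datetime


def elapsed_ms(earlier, later):
    """Return the time between two datetimes in whole milliseconds."""
    return int((later - earlier).total_seconds() * 1000)


# Hypothetical timestamps: the job is launched (handed to Gearman),
# then a worker starts it 30 seconds later.
launch_time = datetime(2026, 1, 1, 19, 0, 0)
start_time = datetime(2026, 1, 1, 19, 0, 30)

wait_time = elapsed_ms(launch_time, start_time)
print(wait_time)
```

With these made-up values the job waited 30 seconds between launch and start. resident_time would be computed the same way, but measured from when Zuul first learned about the change.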

And if I look at zuul_pipeline_job_wait_time_seconds{job_name="operations-puppet-catalog-compiler-puppet7-test"}, it never seems to wait. But maybe my Prometheus understanding is wrong :-\

If you have a change number + patchset, I can dig into the debug log on contint.wikimedia.org (/var/log/zuul/debug.log) and see what happened.

I have looked at 1226932/1. The merge jobs ran immediately and test-prio started at 19:13, so quite rapidly, but Zuul only called the experimental one at 19:23. So the delay is not in Jenkins but in Zuul itself.

Zuul (well, its embedded Gearman server) processes high-precedence jobs first, then normal ones, and finally the low-priority ones. As long as requests keep being added to the high/normal queues, those queues keep yielding items, and thus the low one is only processed after the high/normal ones have been drained:

def getJobForConnection(self, connection, peek=False):
    # Queues are scanned in strict priority order: high, normal, then low.
    for queue in [self.high_queue, self.normal_queue, self.low_queue]:
        for job in queue:
            # Only consider jobs this worker connection has registered for.
            if job.name in connection.functions:
                if not peek:
                    # Hand the job to the worker and update the counters.
                    queue.remove(job)
                    connection.related_jobs[job.handle] = job
                    job.worker_connection = connection
                    job.running = True
                    self.waiting_jobs -= 1
                    self.running_jobs += 1
                    self._updateStats()
                return job
    return None

Thus, while CI is busy, the low-priority items are not processed, even though there is a label that could serve them.
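
To make the starvation effect concrete, here is a minimal, self-contained sketch of strict-priority dispatch. It is a deliberate simplification and not the Gearman server's code: it assumes a single hypothetical worker that can run both job types and serves one job per tick, and the names `get_job_for_worker`, `ticks_until_low_job_runs`, `"unit-test"` and `"pcc"` are made up for illustration.

```python
from collections import deque


def get_job_for_worker(queues, functions):
    """Return the first runnable job, scanning queues in priority order."""
    for queue in queues:
        for job in queue:
            if job in functions:
                queue.remove(job)
                return job
    return None


def ticks_until_low_job_runs(busy_ticks):
    """Count ticks before a low-priority job queued at tick 0 is served.

    Assumption: one high-priority job arrives on each of the first
    `busy_ticks` ticks, and the single worker serves one job per tick.
    """
    high, normal, low = deque(), deque(), deque(["pcc"])
    functions = {"unit-test", "pcc"}  # hypothetical worker capabilities
    tick = 0
    while True:
        if tick < busy_ticks:
            high.append("unit-test")
        job = get_job_for_worker([high, normal, low], functions)
        if job == "pcc":
            return tick
        tick += 1


if __name__ == "__main__":
    for busy in (0, 10, 20):
        print(busy, ticks_until_low_job_runs(busy))
```

In this toy model the low-priority job waits exactly as long as higher-priority work keeps arriving, so its delay tracks how busy the other queues are rather than how loaded its own workers are.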

I don't have a solution to that problem, though we are working on upgrading Zuul, and the newer version is certainly not affected in the same way (notably, it no longer uses Gearman).

I guess we could add a dedicated Pipeline for the Puppet Compiler to work around it.

Change #1234514 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] Zuul: Provide a custom, high-priority pipeline just for puppet compiler

https://gerrit.wikimedia.org/r/1234514

Change #1234515 had a related patch set uploaded (by Jforrester; author: Jforrester):

[integration/config@master] [DNM] Zuul: [operations/puppet] Drop experimental pipeline use

https://gerrit.wikimedia.org/r/1234515

Jdforrester-WMF renamed this task from Move puppet-compiler execution to the test-prio queue to Make puppet-compiler execution run with higher priority, not like other 'experimental' jobs. Jan 28 2026, 10:01 PM
LSobanski triaged this task as Medium priority. Feb 23 2026, 3:31 PM

Change #1234514 merged by jenkins-bot:

[integration/config@master] Zuul: Provide a custom, high-priority pipeline just for puppet compiler

https://gerrit.wikimedia.org/r/1234514

Change #1244816 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/software/gerrit@deploy/wmf/stable-3.10] plugins/wm-pcc: Switch commands from experimental to new puppet

https://gerrit.wikimedia.org/r/1244816

Mentioned in SAL (#wikimedia-releng) [2026-02-26T20:16:37Z] <James_F> Zuul: Provide a custom, high-priority pipeline just for puppet compiler T414621