Page MenuHomePhabricator

integration-config-zuul-layout-validate-docker takes too long in Jenkins due to huge output
Open, MediumPublic

Description

The job https://integration.wikimedia.org/ci/job/integration-config-zuul-layout-validate-docker/ takes minutes too run when it used to be half a minute or so.

That is due to https://gerrit.wikimedia.org/r/#/c/integration/config/+/533916/ Apply REL based pipelines.

Locally that takes just a few seconds. I guess the capture of stdout by Jenkins slows it down.

https://integration.wikimedia.org/ci/job/integration-config-zuul-layout-diff-docker/ is affected as well from time to time.

Event Timeline

hashar created this task.Sep 8 2019, 7:57 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 8 2019, 7:57 PM
hashar updated the task description. (Show Details)Sep 8 2019, 7:59 PM
hashar renamed this task from integration-config-zuul-layout-validate-docker is too long to integration-config-zuul-layout-validate-docker takes too long.Sep 9 2019, 3:03 PM
hashar renamed this task from integration-config-zuul-layout-validate-docker takes too long to integration-config-zuul-layout-validate-docker takes too long in Jenkins due to huge output.
hashar added a project: Jenkins.

+ Jenkins since that might be the actual root cause? :-\

The zuul layout validation has logging set at DEBUG level explicitly. Lame attempt is to just filter out stderr

Change 535566 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Filter out debug spam when validating zuul layout

https://gerrit.wikimedia.org/r/535566

hashar triaged this task as Medium priority.Sep 10 2019, 12:39 PM

Another occurrence is the test to compare the jobs defined in Zuul with the one defined in JJB: Zuul jobs are defined by JJB

00:01:17.207 ======================================================================
00:01:17.208 FAIL: Zuul jobs are defined by JJB
00:01:17.208 ----------------------------------------------------------------------
00:01:17.245 Traceback (most recent call last):
00:01:17.245   File "/src/tests/test_integration.py", line 113, in test_jjb_zuul_jobs
00:01:17.246     self.assertFalse(zuul_server.test_config(jjb_jobs_file.name))
00:01:17.246 AssertionError: True is not false
00:01:17.246 -------------------- >> begin captured stdout << ---------------------
00:01:17.246 FAILURE: Job mediawiki-quibble-api-testing-mysql-php72-docker not defined
00:01:17.247 
00:01:17.247 --------------------- >> end captured stdout << ----------------------
00:01:17.247 -------------------- >> begin captured logging << --------------------
00:01:17.247 zuul.IndependentPipelineManager: INFO: Configured Pipeline Manager test
00:01:17.248 zuul.IndependentPipelineManager: INFO:   Source: <GerritSource connection: gerrit://gerrit>
00:01:17.248 zuul.IndependentPipelineManager: INFO:   Requirements:
<45MBytes of log>

That kind of blocked @holger.knust while working on https://gerrit.wikimedia.org/r/#/c/integration/config/+/535747/ since the root cause was hidden in the spam!

Change 535566 merged by jenkins-bot:
[integration/config@master] Filter out debug spam when validating zuul layout

https://gerrit.wikimedia.org/r/535566

It is way faster now:

integration-config-zuul-layout-validate-docker SUCCESS in 31s
                                                          ^^^

Is this Resolved, then?

That specific job is faster but the fix is only for the zuul cli. We gotta fix the test suite (see above T232287#5482128)

Change 547286 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Lower timeout for zuul-layout-diff

https://gerrit.wikimedia.org/r/547286

hashar updated the task description. (Show Details)Oct 30 2019, 7:21 PM

Change 547286 merged by jenkins-bot:
[integration/config@master] Lower timeout for zuul outputting jobs

https://gerrit.wikimedia.org/r/547286