Page MenuHomePhabricator

integration-config-zuul-layout-validate-docker takes too long in Jenkins due to huge output
Closed, ResolvedPublic

Description

The job https://integration.wikimedia.org/ci/job/integration-config-zuul-layout-validate-docker/ takes minutes too run when it used to be half a minute or so.

That is due to https://gerrit.wikimedia.org/r/#/c/integration/config/+/533916/ Apply REL based pipelines.

Locally that takes just a few seconds. I guess the capture of stdout by Jenkins slows it down.

https://integration.wikimedia.org/ci/job/integration-config-zuul-layout-diff-docker/ is affected as well from time to time.

Event Timeline

hashar renamed this task from integration-config-zuul-layout-validate-docker is too long to integration-config-zuul-layout-validate-docker takes too long.Sep 9 2019, 3:03 PM
hashar renamed this task from integration-config-zuul-layout-validate-docker takes too long to integration-config-zuul-layout-validate-docker takes too long in Jenkins due to huge output.
hashar added a project: Jenkins.

+ Jenkins since that might be the actual root cause? :-\

The zuul layout validation has logging set at DEBUG level explicitly. Lame attempt is to just filter out stderr

Change 535566 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Filter out debug spam when validating zuul layout

https://gerrit.wikimedia.org/r/535566

hashar triaged this task as Medium priority.Sep 10 2019, 12:39 PM

Another occurrence is the test to compare the jobs defined in Zuul with the one defined in JJB: Zuul jobs are defined by JJB

00:01:17.207 ======================================================================
00:01:17.208 FAIL: Zuul jobs are defined by JJB
00:01:17.208 ----------------------------------------------------------------------
00:01:17.245 Traceback (most recent call last):
00:01:17.245   File "/src/tests/test_integration.py", line 113, in test_jjb_zuul_jobs
00:01:17.246     self.assertFalse(zuul_server.test_config(jjb_jobs_file.name))
00:01:17.246 AssertionError: True is not false
00:01:17.246 -------------------- >> begin captured stdout << ---------------------
00:01:17.246 FAILURE: Job mediawiki-quibble-api-testing-mysql-php72-docker not defined
00:01:17.247 
00:01:17.247 --------------------- >> end captured stdout << ----------------------
00:01:17.247 -------------------- >> begin captured logging << --------------------
00:01:17.247 zuul.IndependentPipelineManager: INFO: Configured Pipeline Manager test
00:01:17.248 zuul.IndependentPipelineManager: INFO:   Source: <GerritSource connection: gerrit://gerrit>
00:01:17.248 zuul.IndependentPipelineManager: INFO:   Requirements:
<45MBytes of log>

That kind of blocked @holger.knust while working on https://gerrit.wikimedia.org/r/#/c/integration/config/+/535747/ since the root cause was hidden in the spam!

Change 535566 merged by jenkins-bot:
[integration/config@master] Filter out debug spam when validating zuul layout

https://gerrit.wikimedia.org/r/535566

It is way faster now:

integration-config-zuul-layout-validate-docker SUCCESS in 31s
                                                          ^^^

That specific job is faster but the fix is only for the zuul cli. We gotta fix the test suite (see above T232287#5482128)

Change 547286 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Lower timeout for zuul-layout-diff

https://gerrit.wikimedia.org/r/547286

Change 547286 merged by jenkins-bot:
[integration/config@master] Lower timeout for zuul outputting jobs

https://gerrit.wikimedia.org/r/547286

Change 698782 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] (DO NOT SUBMIT) test perf of zuul diff job

https://gerrit.wikimedia.org/r/698782

Change 698787 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] tests: mute zuul output when testing layout

https://gerrit.wikimedia.org/r/698787

So the slow down was most probably related to the Jenkins Collapsible Section plugin which had a lot of regex triggering on every single lines of console output. I have fixed that at some point cause it caused various issues and was not very robust. It is definitely faster now even with a 7.5MB / 70k lines console output.

The issue is the Zuul layout validator sets logging to DEBUG level and thus dump all the logic to stdout. That is fixable by setting the log level to WARNING.

Change 698782 abandoned by Hashar:

[integration/config@master] (DO NOT SUBMIT) test perf of zuul diff job

Reason:

Was for testing https://gerrit.wikimedia.org/r/c/integration/config/ /698787

https://gerrit.wikimedia.org/r/698782

Change 698787 merged by jenkins-bot:

[integration/config@master] tests: mute zuul output when testing layout

https://gerrit.wikimedia.org/r/698787

Summary

  • Jenkins did a massive amount of regex matching on each line of output which caused some slowness
  • The test suite was emitting way too much contextual informations