
Reduce size of artifacts stored on the CI Jenkins master
Closed, ResolvedPublic

Description

A few jobs each store several gigabytes of artifacts. The total is ~220GB on contint1001. We should reduce it.

$ sort -nr jenkins-build-sizes.txt |head -n 10
221978	/srv/jenkins/builds
63193	/srv/jenkins/builds/mediawiki-fresnel-patch-docker
33194	/srv/jenkins/builds/mwcore-codehealth-patch
13786	/srv/jenkins/builds/wmf-quibble-core-vendor-mysql-php72-docker
11116	/srv/jenkins/builds/wmf-quibble-vendor-mysql-php72-docker
10292	/srv/jenkins/builds/mwcore-phpunit-coverage-master
10255	/srv/jenkins/builds/wmf-quibble-selenium-php72-docker
9340	/srv/jenkins/builds/mediawiki-quibble-vendor-mysql-php72-docker
8515	/srv/jenkins/builds/quibble-vendor-mysql-php72-docker
7990	/srv/jenkins/builds/mwcore-codehealth-master-non-voting
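
The task does not say how jenkins-build-sizes.txt was produced. A minimal sketch of one way to get an equivalent listing, assuming GNU du and using a hypothetical helper name (build_sizes):

```shell
# Hypothetical helper, not part of the CI configuration.
# Prints disk usage in megabytes for a directory and its direct children,
# largest first. $1 is the directory, $2 the number of lines (default 10).
build_sizes() {
    du -m -d1 "$1" | sort -nr | head -n "${2:-10}"
}
```

Calling `build_sizes /srv/jenkins/builds 10` would print the grand total first, followed by the largest job directories.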

I had previously added a patch to compress the MediaWiki JUnit files after they have been processed by the Jenkins JUnit plugin. We could possibly just delete them instead, since Jenkins aggregates them into a single file.

  • mwcore-codehealth-patch runs the SonarQube Scanner with debug logging (-X).
  • mediawiki-fresnel-patch-docker has a performance JSON trace file for each test. The file is left uncompressed so that users can drag'n drop it onto the Chromium performance tab.
  • Quibble jobs have large rawSeleniumVideoGrabs directories from wdio-video-reporter.

Event Timeline

Change 585600 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] jjb: disable SonarQube debug output

https://gerrit.wikimedia.org/r/585600

hashar triaged this task as Medium priority.Apr 2 2020, 8:27 PM
hashar updated the task description.

Change 585600 merged by jenkins-bot:
[integration/config@master] jjb: disable SonarQube debug output

https://gerrit.wikimedia.org/r/585600

For Fresnel we might be able to use an Apache tweak similar to the one we used for the performance flame graphs (Xenon):

<Directory /srv/xenon>
     AddType image/svg+xml svg svgz
     AddEncoding gzip svgz
</Directory>

But Jenkins has its own web server, served through Apache mod_proxy. The AddType / AddEncoding directives are not available in that context.

The codehealth jobs no longer run with SonarQube debug output. That dramatically shrinks the output sent to the console.

What is left to do is compressing the Chrome performance traces generated by mediawiki-fresnel-patch-docker. But that should happen after the CI machines have been upgraded to Buster.

hashar changed the task status from Open to Stalled.May 18 2020, 2:51 PM

Will revisit after the migration to Buster.

Change 598970 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] jjb: keep less builds for some high traffic jobs

https://gerrit.wikimedia.org/r/598970

Change 598970 merged by jenkins-bot:
[integration/config@master] jjb: keep less builds for some high traffic jobs

https://gerrit.wikimedia.org/r/598970

Change 675515 had a related patch set uploaded (by Hashar; author: Hashar):
[operations/puppet@production] contint: serve compressed json as application/json

https://gerrit.wikimedia.org/r/675515

hashar changed the task status from Stalled to Open.Mar 29 2021, 2:20 PM

I have live hacked it to add the following headers to .json.gz files:

Content-Type: application/json
Content-Encoding: gzip

I confirmed I can drag'n drop a trace--trace.json.gz file onto the Chromium performance tab. So that is just pending deployment now :-]
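
The header check can also be scripted. A small sketch, where has_gzip_json_headers is a hypothetical helper name and the trace URL is left to the reader; it only inspects response headers passed on standard input:

```shell
# Hypothetical helper: reads raw HTTP response headers on stdin and succeeds
# only if both headers quoted above are present (case-insensitive match).
has_gzip_json_headers() {
    headers=$(tr -d '\r')   # strip the carriage returns HTTP uses in line endings
    printf '%s\n' "$headers" | grep -qi '^content-type:[[:space:]]*application/json' &&
        printf '%s\n' "$headers" | grep -qi '^content-encoding:[[:space:]]*gzip'
}
```

For instance, `curl -sI "$TRACE_URL" | has_gzip_json_headers && echo ok`, with TRACE_URL set to the address of a published trace--trace.json.gz file.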

Change 675515 merged by Dzahn:

[operations/puppet@production] contint: serve compressed json as application/json

https://gerrit.wikimedia.org/r/675515

Change 676118 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] contint: fix syntax in erb template for jenkins proxy

https://gerrit.wikimedia.org/r/676118

Change 676118 merged by Dzahn:

[operations/puppet@production] contint: fix syntax in erb template for jenkins proxy

https://gerrit.wikimedia.org/r/676118

Change 676291 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: compress fresnel browser performance reports

https://gerrit.wikimedia.org/r/676291

Mentioned in SAL (#wikimedia-operations) [2021-04-01T09:01:55Z] <hashar> contint2001: compressing all fresnel trace--trace.json files: sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -exec gzip {} \+ # T249268
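
The one-off compression logged above can be rehearsed on a scratch directory first. A sketch with a hypothetical wrapper name (compress_traces); the `-type f` test is an addition that the logged command does not have:

```shell
# Hypothetical wrapper around the logged command.
# gzip replaces each matching file with a .gz file next to it, so the user
# running this needs write access to the containing directories.
compress_traces() {
    find "$1" -type f -name '*trace.json' -exec gzip {} +
}
```

Because find passes each path to gzip as its own argument, directory names containing spaces, such as "scenario-Read a page-run-6", need no extra quoting.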

Change 676291 merged by jenkins-bot:

[integration/config@master] jjb: compress fresnel browser performance reports

https://gerrit.wikimedia.org/r/676291

$ du -m -d0 /srv/jenkins/builds/mediawiki-fresnel-patch-docker
6193	/srv/jenkins/builds/mediawiki-fresnel-patch-docker

The previous capture listed that job as consuming 60GB; it is now down to 6GB.

New breakdown:

185154	.
71274	./wmf-quibble-selenium-php72-docker
11796	./wmf-quibble-core-vendor-mysql-php72-docker
11608	./quibble-vendor-mysql-php72-noselenium-docker
9197	./wmf-quibble-vendor-mysql-php72-docker
9051	./mediawiki-quibble-vendor-mysql-php72-docker
8609	./mwcore-phpunit-coverage-master
8581	./quibble-vendor-mysql-php72-selenium-docker
6193	./mediawiki-fresnel-patch-docker
3568	./quibble-vendor-mysql-php74-noselenium-docker
3407	./quibble-vendor-mysql-php73-noselenium-docker
2823	./integration-quibble-fullrun
2405	./mediawiki-quibble-composer-mysql-php72-docker
2261	./mediawiki-quibble-vendor-mysql-php72_buster-docker
2162	./mediawiki-quibble-vendor-postgres-php72-docker

The big consumers are now Selenium videos and some screenshots in a rawSeleniumVideoGrabs directory :\ They apparently originate from https://www.npmjs.com/package/wdio-video-reporter

Change 676298 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: delete MediaWiki wdio dir rawSeleniumVideoGrabs

https://gerrit.wikimedia.org/r/676298

Change 676298 merged by jenkins-bot:

[integration/config@master] jjb: delete MediaWiki wdio dir rawSeleniumVideoGrabs

https://gerrit.wikimedia.org/r/676298

Effect of deleting the rawSeleniumVideoGrabs:

- 71274	./wmf-quibble-selenium-php72-docker
+ 24301	wmf-quibble-selenium-php72-docker

I think that is good enough now.

https://gerrit.wikimedia.org/r/676291
Did this change break mediawiki-fresnel-patch-docker?
See: https://integration.wikimedia.org/ci/job/mediawiki-fresnel-patch-docker/37231/consoleFull

15:02:28 + find log/ -name '*trace.json' -exec gzip '{}' +
15:02:28 gzip: log/fresnel_records/before/scenario-View recent changes-run-3/trace--trace.json.gz: Permission denied
15:02:28 gzip: log/fresnel_records/before/scenario-View recent changes-run-5/trace--trace.json.gz: Permission denied
15:02:28 gzip: log/fresnel_records/before/scenario-View history of a page-run-3/trace--trace.json.gz: Permission denied
...
15:02:28 gzip: log/fresnel_records/after/scenario-Read a page-run-6/trace--trace.json.gz: Permission denied
15:02:28 gzip: log/fresnel_records/after/scenario-Read a page-run-4/trace--trace.json.gz: Permission denied
15:02:28 Build step 'Execute scripts' changed build result to FAILURE
15:02:28 Build step 'Execute scripts' marked build as failure

Krinkle raised the priority of this task from Medium to High.
Krinkle subscribed.

Indeed. Recent example also at build 37234 for https://gerrit.wikimedia.org/r/c/mediawiki/core/+/676717:

11:53:02 [PostBuildScript] - [INFO] Executing post build scripts.
11:53:02 [mediawiki-fresnel-patch-docker] $ /bin/bash -xe /tmp/jenkins2312077536767341619.sh
11:53:02 + find log/ -name 'mw-debug-*.log' -exec gzip '{}' +
11:53:02 [PostBuildScript] - [INFO] Executing post build scripts.
11:53:02 [mediawiki-fresnel-patch-docker] $ /bin/bash -xe /tmp/jenkins8682391239214874280.sh
11:53:02 + find log/ -name '*trace.json' -exec gzip '{}' +
11:53:02 gzip: log/fresnel_records/before/scenario-Read a page-run-6/trace--trace.json.gz: Permission denied
11:53:02 gzip: log/fresnel_records/before/scenario-Load the editor-run-6/trace--trace.json.gz: Permission denied
…
11:53:02 gzip: log/fresnel_records/after/scenario-Load the editor-run-2/trace--trace.json.gz: Permission denied
11:53:02 gzip: log/fresnel_records/after/scenario-View recent changes-run-1/trace--trace.json.gz: Permission denied
11:53:03 Build step 'Execute scripts' changed build result to FAILURE
11:53:03 Build step 'Execute scripts' marked build as failure

Change 677715 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] Revert "jjb: compress fresnel browser performance reports"

https://gerrit.wikimedia.org/r/677715

Change 677715 merged by jenkins-bot:

[integration/config@master] Revert "jjb: compress fresnel browser performance reports"

https://gerrit.wikimedia.org/r/677715

I thought I had tested it before approving the change, but clearly I did not. I have reverted the change and updated the job ( https://integration.wikimedia.org/ci/job/mediawiki-fresnel-patch-docker/ ).

The breakage happens because the directories and files under log/ are generated inside a Docker container and belong to nobody:nogroup, whereas the publisher I added executes gzip as the jenkins-deploy user, which therefore cannot write the .gz files. I guess I just assumed the directory permissions were wide open and would allow that user write access.

The fix is to redo the patch and run the find -exec gzip in a container as well.
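
A rough sketch of what running the compression in a container could look like. The function name, the image argument and the 65534:65534 identity (the usual Debian uid:gid for nobody:nogroup) are all assumptions; the actual change is the one in integration/config and may differ:

```shell
# Hypothetical sketch, not the merged patch.
# Runs find + gzip inside a container as the user that owns the files,
# so the .gz files can be written next to the originals.
# $1: host path of the log directory; $2: an image that ships find and gzip.
# DOCKER can be set to "echo" to print the command instead of running it.
gzip_traces_in_container() {
    ${DOCKER:-docker} run --rm \
        --user 65534:65534 \
        --volume "$1:/log" \
        --entrypoint find \
        "$2" /log -type f -name '*trace.json' -exec gzip {} +
}
```

Setting DOCKER=echo gives a dry run, which is a cheap way to review the final command line before it touches a workspace.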

@hashar: Are there any problematic backend changes? The mediawiki-fresnel-patch-docker job still does not work properly.

13:12:57 INFO:quibble.commands:<<< Finish: User commands: mediawiki-fresnel-patch, in 52.834 s
13:12:57 INFO:backend.ChromeWebDriver:Terminating ChromeWebDriver
13:12:57 INFO:backend.PhpWebserver:Terminating PhpWebserver
13:12:57 INFO:backend.MySQL:Terminating MySQL
13:12:59 Traceback (most recent call last):
13:12:59   File "/usr/local/bin/quibble", line 11, in <module>
13:12:59     load_entry_point('quibble==0.0.46', 'console_scripts', 'quibble')()
13:12:59   File "/usr/local/lib/python3.5/dist-packages/quibble/cmd.py", line 639, in main
13:12:59     cmd.execute(plan, dry_run=args.dry_run)
13:12:59   File "/usr/local/lib/python3.5/dist-packages/quibble/cmd.py", line 402, in execute
13:12:59     quibble.commands.execute_command(command)
13:12:59   File "/usr/local/lib/python3.5/dist-packages/quibble/commands.py", line 22, in execute_command
13:12:59     command.execute()
13:12:59   File "/usr/local/lib/python3.5/dist-packages/quibble/commands.py", line 838, in execute
13:12:59     subprocess.check_call(cmd, shell=True, cwd=self.mw_install_path)
13:12:59   File "/usr/lib/python3.5/subprocess.py", line 271, in check_call
13:12:59     raise CalledProcessError(retcode, cmd)
13:12:59 subprocess.CalledProcessError: Command 'mediawiki-fresnel-patch' returned non-zero exit status 1
13:13:00 Build step 'Execute shell' marked build as failure

Change 683045 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: compress fresnel browser performance reports [2]

https://gerrit.wikimedia.org/r/683045

Change 683045 merged by jenkins-bot:

[integration/config@master] jjb: compress fresnel browser performance reports [2]

https://gerrit.wikimedia.org/r/683045

Change 683051 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: properly delete rawSeleniumVideoGrabs dir

https://gerrit.wikimedia.org/r/683051

Change 683051 merged by jenkins-bot:

[integration/config@master] jjb: properly delete rawSeleniumVideoGrabs dir

https://gerrit.wikimedia.org/r/683051

And that should be good now! :]

Mentioned in SAL (#wikimedia-releng) [2021-04-28T07:19:16Z] <hashar> contint2001: sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -exec gzip {} \+ # T249268

Mentioned in SAL (#wikimedia-releng) [2021-04-28T07:26:10Z] <hashar> contint2001: sudo -u jenkins find *quibble* -path '*/archive/log/rawSeleniumVideoGrabs/*' -delete # T249268
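
Since -delete is irreversible, it can help to preview the matches first. A sketch using the same -path pattern as the logged command, with hypothetical function names:

```shell
# Hypothetical helpers built on the -path pattern from the logged command.
# list_video_grabs only prints what would be removed.
list_video_grabs() {
    find "$1" -path '*/archive/log/rawSeleniumVideoGrabs/*' -print
}

# delete_video_grabs removes everything below rawSeleniumVideoGrabs.
# -delete implies -depth, so files go before their parent directories.
# The rawSeleniumVideoGrabs directory itself does not match the pattern
# and is left behind, empty.
delete_video_grabs() {
    find "$1" -path '*/archive/log/rawSeleniumVideoGrabs/*' -delete
}
```

Running list_video_grabs on a single job directory first shows exactly which files the delete would touch.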