
Zuul/Jenkins: Investigate caching of build results for MediaWiki testsuite jobs
Closed, ResolvedPublic3 Estimated Story Points

Description

An idea from the RelEng team day at the Dec 2024 offsite: cache the build result of certain MW testsuite jobs based on the working directory state and any other relevant inputs.

We know that in many cases we run the same tests for the same pieces of code. What we don't know is whether the inputs for those tests are materially different from build to build or if we're running the same tests in the same way multiple times.

Details:

  • It takes less than 10 minutes to branch MediaWiki
    • We know this because we create a wmf/next branch nightly—wmf/next is equivalent to the new version we branch each week for the train, except it happens nightly
  • It takes about 30 minutes for the branching job to complete
  • The difference is the ~20 minutes spent waiting for tests to complete
  • If these tests are unnecessary, we'd like to avoid running them
  • The first step in knowing if those tests are unnecessary is monitoring

Acceptance criteria:

  • Build a plan for computing a cache key based on all relevant inputs for a given MW job (working directory state and perhaps some build parameters?)
  • Stretch: implement cache key computation prior to test runs and track how many hits we get from build to build

Event Timeline

thcipriani triaged this task as Medium priority.
thcipriani set the point value for this task to 3.
dduvall renamed this task from Zuul/Jenkins: Investigate tracking what tests run for a given code change to Zuul/Jenkins: Investigate caching of build results for MediaWiki testsuite jobs. (Jan 9 2025, 7:30 PM)
dduvall updated the task description.

@hashar, @bd808, @thcipriani and I discussed how to go about computing a cache key this morning. My takeaways from that discussion are:

  • The working directory states are the primary inputs for computing the cache key.
  • If we can find an optimized way to get a key for each working directory from Git, great.
    • look at git archive
    • can we "factor out" zuul merge commits from the local histories
  • If not, we could potentially just checksum all of src (excluding .git subdirectories). This is very computationally expensive, however.

After the meeting, I read up a bit more on Git internals, how working tree state is set up, and how it's associated with commits/refs. If I'm understanding correctly, there might be a simpler way to get the working tree hash for each repo: use git cat-file -p HEAD | awk '$1 == "tree" { print $2 }'.

Since trees, like all Git objects, are content addressable, two refs that result in the same working tree checkout should be referencing the same exact tree. Right? (Serious question.)

I wrote a quick bash script as a proof of concept:

#!/bin/bash

# the `src` directory under the jenkins workspace
src="$1"

# working_tree_hash resolve the `tree` object hash associated with `HEAD` of
# the given cloned repo
working_tree_hash() {
  git -C "$1" cat-file -p HEAD | awk '$1 == "tree" { print $2 }'
}

export -f working_tree_hash

# 1. find all `.git` subdirectories
# 2. sort them
# 3. resolve the `tree` hash associated with `HEAD` for each
# 4. compute and print a single sha256 based on the list of tree hashes
find "$src" -type d -name .git -print0 \
  | sort -z \
  | xargs -0 -I {} bash -c 'working_tree_hash "$@"' _ {} \
  | sha256sum | awk '{ print $1 }'

Does this look right? If so, I will implement it within Quibble.

Since trees, like all Git objects, are content addressable, two refs that result in the same working tree checkout should be referencing the same exact tree. Right? (Serious question.)

oh! That's smart. What you're saying is also my understanding: tree is file/folder names + file modes + file content hash, so if the tree hash is different, then it seems like the inputs to our testing are different. Likewise, it should be the same as long as the same thing is under test. I can't think of any gotchas here.
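To sanity-check that claim, here's a quick sketch (not part of Quibble; assumes git is on PATH, helper names made up) showing that amending only a commit's message changes the commit hash but not the tree hash, since the snapshotted content is identical:

```python
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command in `repo` and return its trimmed stdout."""
    return subprocess.check_output(
        ["git", "-C", repo, "-c", "user.email=ci@example.org",
         "-c", "user.name=ci", *args], text=True).strip()


def demo_same_tree():
    repo = tempfile.mkdtemp()
    git(repo, "init", "-q")
    with open(repo + "/file.txt", "w") as f:
        f.write("same content\n")
    git(repo, "add", "file.txt")
    git(repo, "commit", "-q", "-m", "first message")
    commit1 = git(repo, "rev-parse", "HEAD")
    tree1 = git(repo, "rev-parse", "HEAD^{tree}")
    # Amend only the message: a new commit object, but an identical snapshot
    git(repo, "commit", "-q", "--amend", "-m", "second message")
    commit2 = git(repo, "rev-parse", "HEAD")
    tree2 = git(repo, "rev-parse", "HEAD^{tree}")
    return commit1 != commit2, tree1 == tree2
```

Running `demo_same_tree()` should show the commits differing while the trees match.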

The tree should do yes, nice finding!

From the doc Dan pointed to, one can get the sha1 of the tree pointed to by a commit using: git rev-parse HEAD^{tree}. I have looked at GitPython and the code in Quibble would be:

>>> import git
>>> repo = git.Repo('.')
>>> repo.tree("HEAD")
<git.Tree "f4fa01bb1bf8e3fc5aae5a62f9b2943b3312dda3">
>>> repo.tree("HEAD").hexsha
'f4fa01bb1bf8e3fc5aae5a62f9b2943b3312dda3'

(though under the hood that might make multiple calls to git, which adds some overhead)
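If the GitPython overhead turns out to matter, a single subprocess call per repo would also work (a sketch only; assumes git is on PATH and the helper name is hypothetical):

```python
import subprocess


def head_tree_hash(repo_dir):
    # `git rev-parse HEAD^{tree}` resolves the tree object that the HEAD
    # commit points to, in a single git invocation.
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD^{tree}"], cwd=repo_dir, text=True
    ).strip()
```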

Thanks, @hashar! That is much cleaner.

Change #1110875 had a related patch set uploaded (by Dduvall; author: Dduvall):

[integration/quibble@master] Experimental "success cache" support

https://gerrit.wikimedia.org/r/1110875

Change #1111295 had a related patch set uploaded (by Dduvall; author: Dduvall):

[operations/puppet@production] ci: Install memcached for MediaWiki success cache

https://gerrit.wikimedia.org/r/1111295

@hashar pointed out that the XDG_CACHE_HOME directory is only saved during postmerge, which makes it unsuitable for the success cache. We discussed using memcached as a central cache store instead, which I'm currently working on.
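For reference, memcached's text protocol is simple enough that the command framing can be sketched without a client library (the helper names here are made up, not Quibble's actual implementation):

```python
import socket


def frame_set(key, value, ttl=0):
    # memcached text protocol: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    data = value.encode()
    return b"set %s 0 %d %d\r\n%s\r\n" % (key.encode(), ttl, len(data), data)


def frame_get(key):
    # memcached text protocol: get <key>\r\n
    return b"get %s\r\n" % key.encode()


def send_command(server, frame, port=11211):
    # One-shot send/receive against a memcached server (hypothetical usage)
    with socket.create_connection((server, port)) as conn:
        conn.sendall(frame)
        return conn.recv(4096)
```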

Change #1111295 merged by Dzahn:

[operations/puppet@production] ci: Install memcached for MediaWiki success cache

https://gerrit.wikimedia.org/r/1111295

Change #1112106 had a related patch set uploaded (by Dduvall; author: Dduvall):

[integration/config@master] jjb: Enable Quibble's success caching in MediaWiki jobs

https://gerrit.wikimedia.org/r/1112106

Change #1112182 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] Add success cache to Quibble fullrun jobs

https://gerrit.wikimedia.org/r/1112182

Change #1112182 merged by jenkins-bot:

[integration/config@master] Add success cache to Quibble fullrun jobs

https://gerrit.wikimedia.org/r/1112182

Change #1110875 merged by jenkins-bot:

[integration/quibble@master] Experimental "success cache" support

https://gerrit.wikimedia.org/r/1110875

Change #1112194 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/quibble@master] release: Quibble 1.12.0

https://gerrit.wikimedia.org/r/1112194

Change #1112194 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.12.0

https://gerrit.wikimedia.org/r/1112194

Mentioned in SAL (#wikimedia-releng) [2025-01-20T09:55:18Z] <hashar> Updating Quibble jobs to enable success cache experiment - T383243

Change #1112106 merged by jenkins-bot:

[integration/config@master] jjb: Enable Quibble's success caching in MediaWiki jobs

https://gerrit.wikimedia.org/r/1112106

Cache is now live. We're monitoring HIT counts by scraping Jenkins logs and will continue monitoring for at least a week before taking further steps (i.e., skipping jobs based on cache hits).

Promising results from the success cache.

Right now, we're logging in the Jenkins console when the same tests have run on the same code successfully. It seems there are many cases where we're re-running the same test for the same code. Using the cached result could save some time waiting for test results.

In the Release-Engineering-Team team meeting this week we talked about how to do that.

Our plan is to exit the test as either skipped or successful (there's some research needed here to find the right status) when there's a hit in the success cache.

We talked a bit about the need to re-run tests that have run successfully. For the first iteration, we'll clear the cache manually if that need arises, on the suspicion that there are few times we're interested in re-running tests that passed when the code under test hasn't changed.
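The planned control flow might look something like this (a sketch only; FakeCache and the function names are hypothetical, not Quibble's API):

```python
class FakeCache:
    """Stand-in for the memcached-backed success cache."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def run_with_success_cache(cache, key, run_tests):
    # On a hit, report success without re-running; on a miss, run the
    # tests and record a successful result for next time.
    if cache.get(key):
        print("SKIPPED: same code and environment previously passed")
        return True
    ok = run_tests()
    if ok:
        cache.set(key, "pass")
    return ok
```

Manually clearing the cache (the first-iteration plan above) is then just deleting the entry for the relevant key.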

Lferreira subscribed.

My two cents: I'd prefer the transparency of marking it as skipped (with some sort of pointer to the cache info/run) over calling it a "success". I've seen this kind of experience with the Gradle Build Cache.

@hashar Do you know if we can set the Jenkins result as SKIPPED/NOT_BUILT and have Gearman/Zuul honor that? (i.e. Will Zuul fail the pipeline on any non-success status or are there other success-ish statuses?)

I have a change ready for the skip implementation but it will currently result in a success status.

Change #1121689 had a related patch set uploaded (by Dduvall; author: Dduvall):

[integration/quibble@master] Skip execution upon a success cache hit

https://gerrit.wikimedia.org/r/1121689

From the protocol documentation at https://gerrit.wikimedia.org/g/integration/zuul/+/refs/heads/patch-queue/debian/jessie-wikimedia/doc/source/launchers.rst#332 :

When the build is complete, it should send a final WORK_DATA packet with the following in JSON format:
result

Either the string 'SUCCESS' if the job succeeded, or any other value that describes the result if the job failed.

Zuul really only considers SUCCESS as a success; anything else is a failure. The result is set by the Gearman plugin from the Jenkins build result, which can be one of SUCCESS, UNSTABLE, FAILURE, NOT_BUILT, or ABORTED.

Zuul itself can set other results such as:

  • SKIPPED when it determined some child jobs do not need to be run
  • CANCELED when aborting the jobs before retriggering them with a new set of changes

Both would be considered failures since they are not SUCCESS, though really those states are internal to Zuul (the skipped jobs get skipped because their parent job failed, the canceled jobs will be rerun).

In Jenkins, a build can be marked UNSTABLE, indicating the build completed but has failing tests, which helps differentiate. Regardless, Zuul only recognizes SUCCESS, so if the Jenkins build were set to NOT_BUILT, Zuul would still consider it a failure. We'd want a new state or an extra field indicating the result comes from cache (Bazel does that when running tests), but I'd rather not patch our Zuul.

Do note that if the job usually takes several minutes and ends up succeeding after just a minute, AND the console output has a nice large bold message stating SKIPPED since the same build set and execution environment previously succeeded, then that might be sufficient?

+1, I agree that's probably enough here; let's not let perfect be the enemy of good. In a perfect world we'd show "SKIPPED," but showing "SUCCESS" with a clear message in the console is good (and a big improvement over waiting on the same work twice :))

Change #1121689 merged by jenkins-bot:

[integration/quibble@master] Skip execution upon a success cache hit

https://gerrit.wikimedia.org/r/1121689

Change #1122915 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/quibble@master] release: Quibble 1.13.0

https://gerrit.wikimedia.org/r/1122915

Change #1122915 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.13.0

https://gerrit.wikimedia.org/r/1122915

Change #1122935 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] jjb: switch jobs to Quibble 1.13.0

https://gerrit.wikimedia.org/r/1122935

Change #1122935 merged by jenkins-bot:

[integration/config@master] jjb: switch jobs to Quibble 1.13.0

https://gerrit.wikimedia.org/r/1122935

Mentioned in SAL (#wikimedia-releng) [2025-02-26T13:20:10Z] <hashar> Updating Quibble jobs to 1.13.0. "Skip execution upon a success cache hit" which would make some jobs to skip tests entirely when a set of commits/image is known to have previously passed # T383243 | dduvall

Attempting a better summary before closing this:

Implementation

From the commit message of the first change to Quibble:

As a final step in the execution plan, compute and store a cache entry that represents a successful build/test of the distinct subject under test, a SHA256 digest of:

  • Data provided by the caller via --success-cache-key-data.
  • The HEAD^{tree} hash of each of the sorted Git repos under the MediaWiki install directory.

Check for a success cache entry immediately following the cloning of all projects by Zuul cloner. Note that for now, however, we will not skip execution when encountering an entry. We are only experimenting to see what the cache hit/miss rates would be.

The entries are stored in a new memcached based cache that can be enabled in Quibble by passing --memcached-server {server}.

The cache is disabled by default and only enabled when both a memcached server is provided and at least one --success-cache-key-data item is given. The caller should determine what data represents the distinct subject under test, e.g. job name, the container image ref/digest used to execute Quibble, and any relevant job parameters.
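The key scheme described above can be sketched roughly like this (assumes git is on PATH; names are illustrative, not Quibble's actual code):

```python
import hashlib
import os
import subprocess


def success_cache_key(src_dir, key_data=()):
    """SHA256 over caller-supplied key data plus the HEAD^{tree} hash of
    every Git repo found under src_dir, visited in sorted path order."""
    h = hashlib.sha256()
    for item in key_data:  # e.g. job name, container image digest
        h.update(item.encode() + b"\n")
    repos = sorted(
        root for root, dirs, _files in os.walk(src_dir) if ".git" in dirs
    )
    for repo in repos:
        tree = subprocess.check_output(
            ["git", "-C", repo, "rev-parse", "HEAD^{tree}"], text=True
        ).strip()
        h.update(tree.encode() + b"\n")
    return h.hexdigest()
```

Since only tree hashes and caller data feed the digest, the same checkouts plus the same key data always reproduce the same key, while any content or parameter change produces a different one.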

Results

Note the feature was only partially implemented at first, simply reporting cache hits/misses to the console. Over the course of two weeks, I ran an ad-hoc Python script to gather metrics so we could verify the cache was working, see what the hit/miss rates were and what kind of real compute time savings it would yield, and by extension whether the feature would be worth the extra complexity.

I looked mostly at cache hit rates by pipeline, anticipating that the most likely place to see a high cache hit rate would be gate-and-submit, the hypothesis being that changes often go through test just prior to gate-and-submit and that there would be a window of time where the branches of all dependencies wouldn't move. This proved to be only partially correct.

Cache hit/miss counts by pipeline:

pipeline                 HIT %   HIT n   MISS %   MISS n
(unlabeled)              6.25%       2   93.75%       30
coverage                 2.82%      33   97.18%     1136
experimental             2.00%       2   98.00%       98
gate-and-submit          6.40%     664   93.60%     9710
gate-and-submit-1_39    38.18%      42   61.82%       68
gate-and-submit-1_42    28.17%      40   71.83%      102
gate-and-submit-1_43    21.76%      47   78.24%      169
gate-and-submit-wmf     28.30%      90   71.70%      228
php                          -       0  100.00%       23
test                     3.01%     346   96.99%    11150
test-1_39                3.10%       4   96.90%      125
test-1_42                6.58%       5   93.42%       71
test-1_43                0.35%       1   99.65%      285
test-wmf                12.10%      30   87.90%      218
Grand Total              5.28%    1306   94.72%    23413

Where we do see high hit rates is in the gate-and-submit pipelines that operate on slower-moving (version) branches, in the ~20-40% range. Those rates are excellent.

The hit rate isn't all that much more impressive in regular ol' gate-and-submit (6.4%) than in test (3.01%). It's very possible that the rate of changes being merged to master branches outpaces the window of time where nothing changes in the overall set of dependencies between a test and gate-and-submit run.

However, looking at total duration of builds where a success cache hit was encountered, seeing this as potential CI time saved, puts things in greater perspective. (Note that the total duration of the builds isn't exactly the time that would be saved with a fully functioning cache, since there will continue to be overhead in the initial Zuul cloning and setup.)

Sum of build duration (minutes) by pipeline and cache result:

pipeline                    HIT         MISS    Grand Total
(unlabeled)                6.43       108.40         114.82
coverage                  68.67      2568.49        2637.16
experimental               5.31        98.84         104.15
gate-and-submit         4984.57     76754.46       81739.03
gate-and-submit-1_39     195.82       526.57         722.39
gate-and-submit-1_42     304.67      1009.34        1314.01
gate-and-submit-1_43     455.37      1592.44        2047.81
gate-and-submit-wmf      893.01      2302.24        3195.25
php                           -       168.55         168.55
test                    2530.73     95582.70       98113.43
test-1_39                 11.68       606.14         617.82
test-1_42                 47.90       496.08         543.98
test-1_43                  2.87       391.49         394.36
test-wmf                 288.01      2365.75        2653.76
Grand Total             9795.04    185571.47      195366.52

Real time saved (hours): 163.25

While cache hit rates in gate-and-submit are not super high (again, 6.4%) the potential CI time saved is substantial, ~ 4985 minutes (83 hours) in a two week period.

See the attached report for the raw data.

Summary

Success caching should save us enough real CI time that it's worth the extra complexity in Quibble. Builds for backport changes to slow-moving version branches are more likely to encounter a success cache hit than builds for changes to master branches.