
Speed up catalyst environment checkouts with worktrees
Closed, Resolved | Public | 2 Estimated Story Points

Description

I realized legacy patchdemo environments were using git worktree add to check out repos, whereas Catalyst is using reference clones.

In testing, the worktree approach is faster: about 1.9s versus about 8.3s for a mediawiki/core checkout.

$ cat worktree-checkout.sh
#!/bin/bash
git -C /srv/git/cache/mediawiki/core fetch origin master
git -C /srv/git/cache/mediawiki/core worktree add --detach "$@" HEAD
$ time ./worktree-checkout.sh /tmp/worktree2
From https://gerrit.wikimedia.org/r/mediawiki/core
 * branch                      master     -> FETCH_HEAD
Preparing worktree (detached HEAD b88d5470b27e)
HEAD is now at b88d5470b27e Merge "SpecialBlock: Scroll to erroneous field rather than just the error"

real	0m1.938s
user	0m1.230s
sys	0m0.385s
$ cat reference-checkout.sh
#!/bin/bash
git -C /srv/git/cache/mediawiki/core fetch origin master
git clone --reference /srv/git/cache/mediawiki/core --depth=1 https://gerrit.wikimedia.org/r/mediawiki/core.git "$@"
$ time ./reference-checkout.sh /tmp/reference2
From https://gerrit.wikimedia.org/r/mediawiki/core
 * branch                      master     -> FETCH_HEAD
Cloning into '/tmp/reference2'...
remote: Total 0 (delta 0), reused 0 (delta 0)

real	0m8.273s
user	0m5.761s
sys	0m1.028s

Details

| Title | Reference | Author | Source Branch | Dest Branch |
| --- | --- | --- | --- | --- |
| do not chown the pool repos to user 33 (`www-data`) | repos/test-platform/catalyst/wiki-repos-pool!8 | jnuche | T384377 | main |
| migrate default repo cache configuration for the `mediawiki` chart | repos/test-platform/catalyst/catalyst-api!87 | jnuche | T384377 | main |
| remove repository pool path from calls to Catalyst API | repos/test-platform/catalyst/patchdemo!101 | jnuche | T384377 | main |
| use `git worktree` to create checkouts of the MW repos | repos/test-platform/catalyst/ci-charts!44 | jnuche | T384377 | main |

Event Timeline

thcipriani edited projects, added Catalyst (Kiwen); removed Catalyst.
thcipriani set the point value for this task to 2. (Feb 3 2025, 5:38 PM)

I've created a draft implementation for this: https://gitlab.wikimedia.org/repos/test-platform/catalyst/ci-charts/-/merge_requests/44. Using it locally, I can see a clear improvement in env provisioning time. For example, for envs created using the Wikipedia preset:

  • With ref clones: 6m 27s
  • With worktrees: 4m 41s
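
To put the two timings above in relative terms, here is a small sketch of the arithmetic. The `to_seconds` helper is hypothetical and only handles durations written as "Xm Ys"; the inputs are the two local measurements listed above.

```shell
#!/bin/bash
# Convert a duration such as "6m 27s" to seconds (hypothetical helper).
to_seconds() {
    local minutes seconds
    minutes=${1%%m*}        # text before the first "m"
    seconds=${1##* }        # text after the last space
    seconds=${seconds%s}    # drop the trailing "s"
    echo $(( minutes * 60 + seconds ))
}

before=$(to_seconds "6m 27s")   # with reference clones
after=$(to_seconds "4m 41s")    # with worktrees
saved=$(( before - after ))
percent=$(( saved * 100 / before ))   # integer division, rounds down
echo "saved ${saved}s of ${before}s (${percent}%)"
```

This is a single pair of local runs, so the percentage is only indicative.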

Unfortunately, the solution above requires mounting the repo pool in write mode, since the metadata about worktrees gets stored in the pool repos. This is less than ideal, as it breaks isolation between the components using the pool. In particular, the ci-charts/mediawiki chart could add modifications to repos in the pool that would affect other Catalyst charts, as well as Patchdemo. This may still be an acceptable solution though, because we control ci-charts/mediawiki.
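
The write requirement can be reproduced with a throwaway repository. This is a minimal sketch, not the chart's actual code: the `pool` and `checkout` paths are temporary stand-ins I made up for illustration, not the real pool location.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-ins for a pool repo and an env checkout.
tmp=$(mktemp -d)
pool="$tmp/pool/core"
checkout="$tmp/env/core"
mkdir -p "$tmp/env"

git init -q "$pool"
git -C "$pool" -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial commit"

# Creating the worktree writes administrative files into the pool repo itself.
git -C "$pool" worktree add --detach "$checkout" HEAD >/dev/null 2>&1

# The pool now carries one metadata directory per worktree.
ls "$pool/.git/worktrees"
git -C "$pool" worktree list
```

With a read-only pool, the `worktree add` step is the one expected to fail, since it has to create the entry under `.git/worktrees` in the pool repo.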

I can think of three immediate options to move forward:

  1. We go with the solution in my draft, i.e. we allow ci-charts/mediawiki to modify the pool:
    • PROS: Simple. Not dangerous as long as we remember what the chart is doing
    • CONS: Flawed design. Introduces a more-or-less-hidden coupling between charts and clients of the pool (e.g. Patchdemo) that might come back to bite us down the road
  2. We create a separate, dedicated repo pool for ci-charts/mediawiki. This new pool could be created and managed by Catalyst and based on the existing pool
    • PROS: Properly isolates all charts used by Catalyst and other clients of the pool
    • CONS: Not very hard to implement, but it's added complexity nonetheless. Envs created with ci-charts/mediawiki can theoretically still affect each other
  3. We do nothing. We ignore the cloning times and focus our efforts on improving other parts of the provisioning process, i.e. npm and composer
    • PROS: Done! :)
    • CONS: Cloning times still sad

After some discussion, we decided to go with option 1: we'll use worktrees.

The idea seemed reasonable to everyone, and nobody was particularly concerned about accidental modifications to the pool in the future.

Deployed to prod.

As a manual step, I chown'd all repos in the pool back to root to reflect the change from https://gitlab.wikimedia.org/repos/test-platform/catalyst/wiki-repos-pool/-/merge_requests/8

I verified both legacy and Catalyst envs can still be created successfully.

Two new Catalyst envs with the Wikimedia preset clocked in at 4m 29s and 4m 58s.

Before applying the new changes, the times for this kind of env (Wikimedia preset) showed a big variance, with the fastest older env (b7e51dec29) taking 4m 49s and the slowest (5648f3da62) taking 7m 16s.

We will need to wait and see more envs created in Catalyst to gauge what the actual improvement is. Testing locally on my machine, I saw a time improvement of around 1/3.

@thcipriani and I briefly talked about this task.

  • A clone by reference creates a .git/objects/info/alternates file that contains the path to the directory holding the objects; it then reads those objects to create the files.
  • A worktree keeps its git directory inside the original git repo (under its .git/worktrees/hotfix, for a worktree named hotfix); the files are created from the objects stored there.
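
The two layouts described above can be inspected side by side with a throwaway repository. This is a sketch under simplifying assumptions: the same temporary repo serves as both the clone source and the reference, whereas the real setup clones from Gerrit and references the pool. The `hotfix` name mirrors the example above.

```shell
#!/bin/bash
set -euo pipefail

tmp=$(mktemp -d)
pool="$tmp/pool"
git init -q "$pool"
git -C "$pool" -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "initial commit"

# Reference clone: a full .git directory whose objects are borrowed
# from the pool through the alternates file.
git clone -q --reference "$pool" "$pool" "$tmp/reference"
cat "$tmp/reference/.git/objects/info/alternates"

# Worktree: .git is a plain file pointing back into the pool repo.
git -C "$pool" worktree add --detach "$tmp/hotfix" HEAD >/dev/null 2>&1
cat "$tmp/hotfix/.git"
```

The first `cat` prints the pool's objects directory; the second prints a `gitdir:` line that ends in `.git/worktrees/hotfix` inside the pool.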

So in principle they should be equally fast: besides the few .git files copied, both do essentially the same work (create directories, create the files, done).

In both cases the cache is updated from Gerrit, but git clone does a second fetch to get the live objects. On my setup that fetch takes a few seconds, so the difference is noticeable. I think that explains the bulk of it.

Another thing I noticed with strace -f -c is that worktree makes fewer system calls: 68774 versus 77142 for a git clone by reference. Most of the extra ones are newfstatat (about 24k for the worktree versus 32k for the clone) and read (210 versus 1975). So I guess if disk I/O is slow (as is the case on WMCS), that adds to the slowness.

Regardless, git-worktree looks nice and is faster in the end. Well done!