
Cache submodules and use --reference to save space
ClosedPublic

Authored by mmodell on Oct 18 2017, 10:33 PM.

Details

Maniphest Tasks
T137124: Scap3 submodule space issues
Reviewers
demon
thcipriani
hashar
dduvall
Group Reviewers
Release-Engineering-Team
Commits
rMSCAccea24641f77: Cache submodules and use --reference to save space
Patch without arc
git checkout -b D826 && curl -L https://phabricator.wikimedia.org/D826?download=true | git apply
Summary

Requires git 2.11, which we should have everywhere.

The new behavior is to cache the submodules in deploy-cache/cache/modules/. Then,
when cloning to revs/$rev/, we use --recurse-submodules and --reference ../cache/,
and git does the magic to make the clone's submodules reuse the cached objects.
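
As a rough illustration of the clone step described above, the sketch below builds the git command line in Python. The helper name `build_rev_clone_cmd`, its parameters, and the choice of the cache checkout as the clone source are assumptions made for illustration, not the code in scap/deploy.py; only the `--recurse-submodules` and `--reference` flags come from this summary.

```python
def build_rev_clone_cmd(cache_dir, rev_dir):
    """Return the argv for cloning cache_dir into rev_dir (hypothetical helper)."""
    return [
        "git", "clone",
        "--recurse-submodules",      # clone the submodules along with the superproject
        "--reference", cache_dir,    # borrow objects from the cache instead of copying them
        cache_dir,                   # assumption: the cache checkout is the clone source
        rev_dir,
    ]


if __name__ == "__main__":
    # Print only; pass the list to subprocess.check_call() to actually run it.
    print(" ".join(build_rev_clone_cmd("../cache", "revs/test")))
```

With git 2.11 or later, the submodule clones should be able to borrow objects from the referenced cache as well, which is what keeps revs/$rev/.git/modules small.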

Disk usage, using rPHDEP as an example.

Cache modules

$ du -hs cache/.git/modules/
121M    cache/.git/modules/

Checkout in revs/

$ du -hs revs/test/.git/modules
2.6M    revs/test/.git/modules
Test Plan

Currently untested. I'd like to merge this and test in beta.

Diff Detail

Repository
rMSCA Scap
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

mmodell created this revision. Oct 18 2017, 10:33 PM
Restricted Application added a reviewer: Release-Engineering-Team. Oct 18 2017, 10:33 PM
Restricted Application added a project: Release-Engineering-Team.
mmodell added inline comments. Oct 18 2017, 10:36 PM
scap/deploy.py
275

Update submodules in the cache instead of the rev

303–305

This will take care of referencing the objects from cache and the rev/.git/modules will be tiny!

mmodell retitled this revision from "WIP: cache submodules and use --reference to save space" to "Cache submodules and use --reference to save space". Oct 19 2017, 6:10 PM
mmodell edited the test plan for this revision.
demon accepted this revision. Oct 19 2017, 6:15 PM

Probably fine, at least for testing in beta. Nitpick about performance inline.

scap/git.py
323

For repos with a sufficiently high number of submodules, we'd benefit from using --jobs.

We do a lot of this "find a sane number of processors to fork to" logic; we should probably have a function in utils for that.
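
A minimal sketch of what such a utility could look like follows. This is not the cpus_for_jobs function from D828, whose implementation is not shown here; the name `cpus_for_jobs_sketch`, the `fraction` and `minimum` parameters, and the "half the available CPUs" default are all assumptions for illustration.

```python
import multiprocessing


def cpus_for_jobs_sketch(fraction=0.5, minimum=1, cpu_count=None):
    """Pick a job count as a fraction of the available CPUs (hypothetical helper).

    cpu_count can be passed explicitly, which is mainly useful for testing.
    """
    if cpu_count is None:
        try:
            cpu_count = multiprocessing.cpu_count()
        except NotImplementedError:
            # cpu_count() may be unavailable on some platforms; fall back to one.
            cpu_count = 1
    # Round down, but never go below the minimum.
    return max(minimum, int(cpu_count * fraction))


if __name__ == "__main__":
    jobs = cpus_for_jobs_sketch()
    # The result could be passed to git as: --jobs <jobs>
    print(["--jobs", str(jobs)])
```

Rounding down and clamping to at least one job keeps the result usable on single-CPU hosts.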

This revision is now accepted and ready to land. Oct 19 2017, 6:15 PM
mmodell updated this revision to Diff 2189. Oct 19 2017, 10:53 PM

Use the cpus_for_jobs function from D828

mmodell updated this revision to Diff 2190. Oct 19 2017, 10:56 PM

one more call to cpus_for_jobs

mmodell marked an inline comment as done. Oct 19 2017, 10:57 PM
This revision was automatically updated to reflect the committed changes.