Scap3 currently makes a clone from the local git cache for each revision that is deployed to a target node. Git is fairly smart about using hardlinks for local clones so they only take space for the checkout and not multiple object copies; however, the way we are cloning submodules to targets is currently making copies of git objects, using an excess amount of space.
We should figure out a smart way to handle submodules so that targets don't keep an excess amount of git objects.
I've run into a weird problem. I'm trying to deploy #ores to sca03 and running into a disk space issue.
Here's the error I get in scap: P4895 "fatal: No space left on device"
**deployment-tin** (ores is 2.5GB):
halfak@deployment-tin:/srv/deployment$ du -hs ores
**sca03** (ores is 11GB):
halfak@deployment-sca03:/srv/deployment$ du -hs *
It looks like all of it is deploy-cache:
halfak@deployment-sca03:/srv/deployment/ores$ du -hs *
Specifically, it's some old versions that remain cached there:
halfak@deployment-sca03:/srv/deployment/ores/deploy-cache/revs$ du -hs *
I need a way to clean up some of these old versions, but I don't have the rights to delete them.
On T157199#3000944 @hashar wrote:
The root cause is that the scap cache clones submodules from the deployment server instead of using the local cache via hardlinks.
Relative to the deployment-sca03 directory `/srv/deployment/ores`...
The repository is cloned from the deployment server under `<repo basename>-cache/cache` as a non-bare repo. `stat` shows that one of the pack files has 6 links to it:
$ stat deploy-cache/cache/.git/objects/pack/pack-75d8fab69e5b2b012f62050dd5d301fe8a87ba17.pack
Size: 91256382 Blocks: 178248 IO Block: 4096 regular file
Device: fe03h/65027d Inode: 792651 Links: 6
So later, when scap deploys, it just does a local clone to e.g.:
deploy -> deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76
That one uses hardlinks for the main repository. The submodules, however, are cloned from the deployment server and apparently do not use `--reference`:
$ git -C deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76/.git/modules/submodules/wheels/ remote -v
origin http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git/modules/submodules/wheels (fetch)
origin http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git/modules/submodules/wheels (push)
Thus each revision in the cache ends up with every submodule fully cloned. Take a pack file from the 'wheels' sub repo, for example `pack-779867fb188e708fc37de9957c907c9495ee4e4a.pack`, and dump the inode and path of each copy:
$ find . -name pack-779867fb188e708fc37de9957c907c9495ee4e4a.pack -printf '%i %h\n';
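As a rough way to quantify the duplication, the sketch below counts how many distinct inodes back a given pack file name under a directory. It assumes GNU `find` (for `-printf`). The function names `pack_copies` and `distinct_inodes` are made up for illustration and are not part of scap.

```shell
# Hypothetical helpers, assuming GNU find.
# $1: directory to search, $2: pack file name.

# List every copy of the pack file: inode, hard-link count, containing directory.
pack_copies() {
    find "$1" -name "$2" -printf '%i %n %h\n' | sort -n
}

# Print the number of distinct inodes behind all copies of the pack file.
distinct_inodes() {
    find "$1" -name "$2" -printf '%i\n' | sort -u | wc -l | tr -d ' '
}
```

If `distinct_inodes . pack-779867fb188e708fc37de9957c907c9495ee4e4a.pack` prints 1, every copy is a hard link to the same data. A larger number means that many separate copies are stored on disk.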
Imho we need to:
* make the cache directories bare repositories, to save the space taken by a working-tree checkout
* pass the `--reference` parameter to `git submodule update --init` so submodules reuse objects from the local cache instead of storing their own copies.
All that should probably be made a sub task for scap.
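A minimal sketch of the second point, assuming a local repository for the submodule is already available on the target. The helper name `update_submodule_with_reference` and its arguments are hypothetical, and which directory scap would use as the reference is an assumption, not something decided here. Note that `--reference` makes the submodule borrow objects through `objects/info/alternates` rather than hard linking them, so the reference repository has to stay in place for as long as the revision is kept.

```shell
# Hypothetical helper, not scap's actual code.
# $1: checkout of one revision (e.g. a directory under deploy-cache/revs)
# $2: submodule path inside that checkout
# $3: local repository to borrow objects from (assumed to exist on the target)
update_submodule_with_reference() {
    git -C "$1" submodule update --init --reference "$3" -- "$2"
}
```

After a call such as `update_submodule_with_reference "$rev_dir" submodules/wheels "$local_wheels_repo"` (both variables are placeholders), the file `objects/info/alternates` in the submodule's git directory should point at the reference repository, and objects already present there are not stored again.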