Scap3 currently makes a clone from the local git cache for each revision that is deployed to a target node. Git is fairly smart about using hardlinks for local clones, so they only take space for the checkout and not for multiple object copies; however, the way we currently clone submodules onto targets makes full copies of the git objects, which uses an excessive amount of space.
We should figure out a smart way to handle submodules so that targets don't keep an excess amount of git objects.
-----------
@Halfak wrote:
I've run into a weird problem. I'm trying to deploy #ores to sca03 and running into a disk space issue.
Here's the error I get in scap: P4895 "fatal: No space left on device"
**deployment-tin** (ores is 2.5GB):
```
halfak@deployment-tin:/srv/deployment$ du -hs ores
2.5G ores
```
**sca03** (ores is 11GB):
```
halfak@deployment-sca03:/srv/deployment$ du -hs *
580M eventstreams
11G ores
```
It looks like all of it is deploy-cache:
```
halfak@deployment-sca03:/srv/deployment/ores$ du -hs *
0 deploy
11G deploy-cache
8.5M venv
```
Specifically, it's some old versions that remain cached there:
```
halfak@deployment-sca03:/srv/deployment/ores/deploy-cache/revs$ du -hs *
1.9G 228b9b4ff925851bcb36bbeafe54359433ea1e92
2.4G 691b3409f0c1ca605bc47dc40692551a1e9b79af
1.5G 7c80636313b088928c8eba5d5bdf0b62b8db7f76
3.1G 9fd75a1109495cbe479893df0cb7ab56846548d9
1.7G c61b9c11a2cad56ecaad10f92e21126b36e673f4
```
I need a way to clean up some of these old versions, but I don't have the rights to delete them.
---------
On T157199#3000944 @hashar wrote:
The root cause is that the scap cache clones submodules from the deployment server instead of using the local cache via hardlinks.
Relative to the `/srv/deployment/ores` directory on deployment-sca03:
The repository is cloned from the deployment server under `<repo basename>-cache/cache` as a non-bare repo. `stat` shows that one of the pack files has 6 links to it:
```
$ stat deploy-cache/cache/.git/objects/pack/pack-75d8fab69e5b2b012f62050dd5d301fe8a87ba17.pack
File: ‘deploy-cache/cache/.git/objects/pack/pack-75d8fab69e5b2b012f62050dd5d301fe8a87ba17.pack’
Size: 91256382 Blocks: 178248 IO Block: 4096 regular file
Device: fe03h/65027d Inode: 792651 Links: 6
^^^
```
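To check this for every pack file rather than one at a time, something like the following could be used. It is only a sketch: the function name `pack_link_counts` is made up for illustration, and it assumes GNU `stat` (the `-c` option), as found on the Linux hosts above.

```shell
# Hypothetical helper: list each pack file in a git directory together with
# its hardlink count. Assumes GNU stat (-c '%h %n' prints "<links> <path>").
pack_link_counts() {
    repo_git_dir=$1
    find "$repo_git_dir/objects/pack" -name '*.pack' -exec stat -c '%h %n' {} +
}

# Example (path taken from the output above):
#   pack_link_counts deploy-cache/cache/.git
```

A link count above 1 means the pack is shared with at least one other path on the same filesystem; a count of 1 means that copy is paying for its own disk space.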
Later, when scap deploys, it just does a local clone, e.g.:
deploy -> deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76
That clone uses hardlinks for the main repository. The submodules, however, are cloned from the deployment server and apparently do not use `--reference`:
```
$ git -C deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76/.git/modules/submodules/wheels/ remote -v
origin http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git/modules/submodules/wheels (fetch)
origin http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git/modules/submodules/wheels (push)
```
Thus each revision in the cache ends up with each submodule fully cloned. For example, taking a pack file from the 'wheels' sub repo, `pack-779867fb188e708fc37de9957c907c9495ee4e4a.pack`, and dumping the inode and path of every copy:
```
$ find . -name pack-779867fb188e708fc37de9957c907c9495ee4e4a.pack -printf '%i %h\n';
797113 ./deploy-cache/revs/691b3409f0c1ca605bc47dc40692551a1e9b79af/.git/modules/submodules/wheels/objects/pack
918553 ./deploy-cache/revs/c61b9c11a2cad56ecaad10f92e21126b36e673f4/.git/modules/submodules/wheels/objects/pack
795634 ./deploy-cache/revs/9fd75a1109495cbe479893df0cb7ab56846548d9/.git/modules/submodules/wheels/objects/pack
794849 ./deploy-cache/revs/228b9b4ff925851bcb36bbeafe54359433ea1e92/.git/modules/submodules/wheels/objects/pack
796556 ./deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76/.git/modules/submodules/wheels/objects/pack
```
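All five paths have different inode numbers, so each one is a separate copy on disk. A rough way to summarise that for any file name is sketched below; `count_copies` is a hypothetical helper and it relies on GNU `find` (`-printf`).

```shell
# Hypothetical helper: report how many paths match a file name under a tree
# and how many distinct inodes back them. Assumes GNU find (-printf '%i').
count_copies() {
    root=$1
    name=$2
    paths=$(find "$root" -name "$name" -printf '%i\n' | wc -l)
    inodes=$(find "$root" -name "$name" -printf '%i\n' | sort -u | wc -l)
    echo "$paths paths, $inodes distinct inodes"
}

# Example (file name taken from the output above):
#   count_copies . pack-779867fb188e708fc37de9957c907c9495ee4e4a.pack
```

If the copies were hardlinked, the number of distinct inodes would be smaller than the number of paths.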
IMHO we need to:
* make the cache directories bare repositories, to save the space taken by a working-tree checkout
* pass the `--reference` parameter to `git submodule update --init` so submodules reuse objects from the local cache instead of storing their own copies.
All that should probably be made a sub task for scap.
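As a rough sketch of the second point: git spells the option `--reference <repository>`, and `git submodule update` accepts it. The helper name and the idea that a local copy of the submodule repository is available at `ref_repo` are assumptions for illustration; this is not how scap is known to lay out its cache.

```shell
# Hypothetical helper: initialise one submodule of a revision checkout while
# borrowing objects from a repository that already exists on the local disk.
#   rev_dir   - the per-revision clone (e.g. deploy-cache/revs/<sha1>)
#   sub_path  - the submodule path inside that clone
#   ref_repo  - a local repository holding the submodule's objects (assumed to exist)
update_submodule_with_reference() {
    rev_dir=$1
    sub_path=$2
    ref_repo=$3
    git -C "$rev_dir" submodule update --init --reference "$ref_repo" -- "$sub_path"
}

# Example (the reference path is a placeholder, not a real scap location):
#   update_submodule_with_reference deploy-cache/revs/<sha1> submodules/wheels /path/to/local/wheels
```

Note that `--reference` shares objects through an `objects/info/alternates` entry rather than through hardlinks, so the submodule clone depends on the reference repository staying in place. The space saving is the same: objects already present in the reference are not stored again.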
---------