
Running out of space when deploying on sca03 (deploy-cache)
Closed, Duplicate (Public)

Description

I've run into a weird problem. I'm trying to deploy ORES to sca03 and running into a disk space issue.

Here's the error I get in scap: P4895 "fatal: No space left on device"

deployment-tin (ores is 2.5GB):

halfak@deployment-tin:/srv/deployment$ du -hs ores
2.5G	ores

sca03 (ores is 11GB):

halfak@deployment-sca03:/srv/deployment$ du -hs * 
580M	eventstreams
11G	ores

It looks like all of it is deploy-cache:

halfak@deployment-sca03:/srv/deployment/ores$ du -hs *
0	deploy
11G	deploy-cache
8.5M	venv

Specifically, it's some old versions that remain cached there:

halfak@deployment-sca03:/srv/deployment/ores/deploy-cache/revs$ du -hs *
1.9G	228b9b4ff925851bcb36bbeafe54359433ea1e92
2.4G	691b3409f0c1ca605bc47dc40692551a1e9b79af
1.5G	7c80636313b088928c8eba5d5bdf0b62b8db7f76
3.1G	9fd75a1109495cbe479893df0cb7ab56846548d9
1.7G	c61b9c11a2cad56ecaad10f92e21126b36e673f4

I need a way to clean up some of these old versions, but I don't have the rights to delete them.

Event Timeline

Halfak updated the task description.

The root cause is that the scap cache clones submodules from the deployment server instead of reusing the local cache via hardlinks.

All paths below are relative to the /srv/deployment/ores directory on deployment-sca03.

The repository is cloned from the deployment server into <repo basename>-cache/cache as a non-bare repo. stat shows that one of its pack files has 6 links to it:

$ stat deploy-cache/cache/.git/objects/pack/pack-75d8fab69e5b2b012f62050dd5d301fe8a87ba17.pack
  File: ‘deploy-cache/cache/.git/objects/pack/pack-75d8fab69e5b2b012f62050dd5d301fe8a87ba17.pack’
  Size: 91256382  	Blocks: 178248     IO Block: 4096   regular file
Device: fe03h/65027d	Inode: 792651      Links: 6
                                                 ^^^
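To check the link count of every pack file at once rather than one stat call at a time, something like the following could be used. This is only a sketch: list_pack_links is a made-up helper name, and it assumes GNU find (for -printf), which the commands in this task already rely on.

```shell
# Hypothetical helper: print "<hard link count> <path>" for every pack file
# below the directory given as $1, lowest link count first.
# Requires GNU find: %n is the number of hard links, %p the file path.
list_pack_links() {
    find "$1" -type f -name 'pack-*.pack' -printf '%n %p\n' | sort -n
}
```

Run as list_pack_links deploy-cache. A link count of 1 means the pack file is not shared with any other clone on that filesystem.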

So later, when scap deploys, it just does a local clone to e.g.:

deploy -> deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76

That clone uses hardlinks for the main repository. The submodules, however, are cloned from the deployment server and apparently do not use --reference:

$ git -C deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76/.git/modules/submodules/wheels/ remote -v
origin	http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git/modules/submodules/wheels (fetch)
origin	http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git/modules/submodules/wheels (push)

Thus each revision in the cache ends up with every submodule fully cloned. Take a pack file from the 'wheels' submodule, pack-779867fb188e708fc37de9957c907c9495ee4e4a.pack, and dump the inode and directory of each copy. The five distinct inodes below show that no copy is hardlinked to another:

$ find . -name pack-779867fb188e708fc37de9957c907c9495ee4e4a.pack -printf '%i %h\n';
797113 ./deploy-cache/revs/691b3409f0c1ca605bc47dc40692551a1e9b79af/.git/modules/submodules/wheels/objects/pack
918553 ./deploy-cache/revs/c61b9c11a2cad56ecaad10f92e21126b36e673f4/.git/modules/submodules/wheels/objects/pack
795634 ./deploy-cache/revs/9fd75a1109495cbe479893df0cb7ab56846548d9/.git/modules/submodules/wheels/objects/pack
794849 ./deploy-cache/revs/228b9b4ff925851bcb36bbeafe54359433ea1e92/.git/modules/submodules/wheels/objects/pack
796556 ./deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76/.git/modules/submodules/wheels/objects/pack
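The same check can be generalised to all pack files, to see which ones are stored several times. The sketch below is not part of scap; count_pack_copies is a hypothetical helper name, and it assumes GNU find and a POSIX awk. It counts, per pack file name, how many distinct inodes hold a file of that name.

```shell
# Hypothetical helper: for each pack file name found below $1, print
# "<number of distinct inodes> <file name>", most duplicated first.
count_pack_copies() {
    # %f = file name without directory, %i = inode number (GNU find).
    # sort -u collapses hardlinked copies, which share name and inode.
    find "$1" -type f -name 'pack-*.pack' -printf '%f %i\n' \
        | sort -u \
        | awk '{ copies[$1]++ } END { for (f in copies) print copies[f], f }' \
        | sort -rn
}
```

Run as count_pack_copies deploy-cache/revs. Any name reported with a count above 1 is stored more than once on disk.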

Imho we need to:

  • make the cache directories bare repositories, to save the space taken by a working-tree checkout
  • pass the --reference parameter to git submodule update --init so that submodules reuse objects from the local cache instead of being cloned in full.
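The second point could look roughly like the sketch below. It is not what scap does today: update_submodule_with_reference is a made-up name, and it assumes that a local repository holding the submodule's objects already exists (for example under the cache clone's .git/modules directory, which this task does not confirm). Note that --reference makes git borrow objects through an objects/info/alternates file rather than hardlinking them, so the reference repository must stay in place for as long as the revision is used.

```shell
# Hypothetical helper, not part of scap.
#   $1 = checkout of one revision (e.g. deploy-cache/revs/<sha1>)
#   $2 = submodule path inside that checkout
#   $3 = absolute path to a local repository that already has the
#        submodule's objects (assumed to exist; falls back if it does not)
update_submodule_with_reference() {
    rev_dir=$1
    sub_path=$2
    ref_dir=$3
    if [ -d "$ref_dir" ]; then
        # Borrow objects from the local reference instead of downloading them.
        git -C "$rev_dir" submodule update --init --reference "$ref_dir" -- "$sub_path"
    else
        # No local reference available: plain clone from the configured URL.
        git -C "$rev_dir" submodule update --init -- "$sub_path"
    fi
}
```

Objects missing from the reference repository are still fetched from the deployment server, so a stale reference only costs some extra transfer.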

All of that should probably become a subtask for scap.

hashar closed this task as a duplicate of T137124: Scap3 submodule space issues.

I made this task a duplicate of the older task T137124 and copied the task description and my comment over to that task.