
Scap3 submodule space issues
Closed, Resolved · Public

Description

Scap3 currently makes a clone from the local git cache for each revision that is deployed to a target node. Git is fairly smart about using hardlinks for local clones so they only take space for the checkout and not multiple object copies; however, the way we are cloning submodules to targets is currently making copies of git objects, using an excess amount of space.
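To illustrate the hardlink behaviour described above, here is a minimal sketch on throwaway repositories. The directory names (`src`, `linked`, `copied`) are made up for the demo and have nothing to do with scap's real layout. It assumes GNU `stat` and a filesystem that supports hardlinks; where git cannot hardlink, it falls back to copying.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
# Throwaway identity so the demo commit works without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# A small source repository with one commit.
git init -q "$work/src"
echo 'hello' > "$work/src/file"
git -C "$work/src" add file
git -C "$work/src" commit -q -m 'initial commit'

# Cloning from a plain local path hardlinks the files under .git/objects.
git clone -q "$work/src" "$work/linked"
# --no-hardlinks forces real copies, which is the space-hungry case.
git clone -q --no-hardlinks "$work/src" "$work/copied"

# Locate the loose object file that stores the commit we just made.
commit=$(git -C "$work/src" rev-parse HEAD)
obj="objects/$(printf '%s' "$commit" | cut -c1-2)/$(printf '%s' "$commit" | cut -c3-)"

src_inode=$(stat -c %i "$work/src/.git/$obj")
linked_inode=$(stat -c %i "$work/linked/.git/$obj")
copied_inode=$(stat -c %i "$work/copied/.git/$obj")
echo "source=$src_inode linked=$linked_inode copied=$copied_inode"

rm -rf "$work"
```

The hardlinked clone should report the same inode as the source, while the copied clone reports a different one.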

We should figure out a smart way to handle submodules so that targets don't keep an excess amount of git objects.


@Halfak wrote:

I've run into a weird problem. I'm trying to deploy ORES to sca03 and running into a disk space issue.

Here's the error I get in scap: P4895 "fatal: No space left on device"

deployment-tin (ores is 2.5GB):

halfak@deployment-tin:/srv/deployment$ du -hs ores
2.5G	ores

sca03 (ores is 11GB):

halfak@deployment-sca03:/srv/deployment$ du -hs * 
580M	eventstreams
11G	ores

It looks like all of it is deploy-cache:

halfak@deployment-sca03:/srv/deployment/ores$ du -hs *
0	deploy
11G	deploy-cache
8.5M	venv

Specifically, it's some old versions that remain cached there:

halfak@deployment-sca03:/srv/deployment/ores/deploy-cache/revs$ du -hs *
1.9G	228b9b4ff925851bcb36bbeafe54359433ea1e92
2.4G	691b3409f0c1ca605bc47dc40692551a1e9b79af
1.5G	7c80636313b088928c8eba5d5bdf0b62b8db7f76
3.1G	9fd75a1109495cbe479893df0cb7ab56846548d9
1.7G	c61b9c11a2cad56ecaad10f92e21126b36e673f4

I need a way to clean up some of these old versions, but I don't have the rights to delete them.


On T157199#3000944 @hashar wrote:

The root cause is that the scap cache clones submodules from the deployment server instead of using the local cache via hardlinks.

Relative to the deployment-sca03 directory /srv/deployment/ores...

The repository is cloned from the deployment server under <repo basename>-cache/cache as a non-bare repo. Stat shows that one of the pack files has 6 links to it:

$ stat deploy-cache/cache/.git/objects/pack/pack-75d8fab69e5b2b012f62050dd5d301fe8a87ba17.pack
  File: ‘deploy-cache/cache/.git/objects/pack/pack-75d8fab69e5b2b012f62050dd5d301fe8a87ba17.pack’
  Size: 91256382  	Blocks: 178248     IO Block: 4096   regular file
Device: fe03h/65027d	Inode: 792651      Links: 6
                                                 ^^^

So later, when scap deploys, it just does a local clone to e.g.:

deploy -> deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76

That one uses hardlinks for the main repository. The submodules, however, are cloned from the deployment server and apparently do not use --reference:

$ git -C deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76/.git/modules/submodules/wheels/ remote -v
origin	http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git/modules/submodules/wheels (fetch)
origin	http://deployment-tin.deployment-prep.eqiad.wmflabs/ores/deploy/.git/modules/submodules/wheels (push)

Thus each revision in the cache ends up with each of its submodules fully cloned. As an example, take a pack file from the 'wheels' sub repo: looking for pack-779867fb188e708fc37de9957c907c9495ee4e4a.pack and dumping the inode and file path shows a different inode for every revision:

$ find . -name pack-779867fb188e708fc37de9957c907c9495ee4e4a.pack -printf '%i %h\n';
797113 ./deploy-cache/revs/691b3409f0c1ca605bc47dc40692551a1e9b79af/.git/modules/submodules/wheels/objects/pack
918553 ./deploy-cache/revs/c61b9c11a2cad56ecaad10f92e21126b36e673f4/.git/modules/submodules/wheels/objects/pack
795634 ./deploy-cache/revs/9fd75a1109495cbe479893df0cb7ab56846548d9/.git/modules/submodules/wheels/objects/pack
794849 ./deploy-cache/revs/228b9b4ff925851bcb36bbeafe54359433ea1e92/.git/modules/submodules/wheels/objects/pack
796556 ./deploy-cache/revs/7c80636313b088928c8eba5d5bdf0b62b8db7f76/.git/modules/submodules/wheels/objects/pack
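The same check can be generalised to every pack file at once. The sketch below is only an illustration: `count_pack_copies` is a hypothetical helper, not part of scap, and it relies on GNU `find` for `-printf`. It prints, for each pack file name, how many distinct inodes (that is, physical copies) exist under a directory, and demonstrates this on a scratch directory instead of a real deploy-cache.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print "<distinct inode count> <file name>" for every
# *.pack file found under the directory given as $1.
count_pack_copies() {
    find "$1" -name '*.pack' -printf '%i %f\n' \
        | sort -u \
        | awk '{ n[$2]++ } END { for (f in n) print n[f], f }'
}

# Scratch layout that imitates three cached revisions.
demo=$(mktemp -d)
mkdir -p "$demo/revs/a" "$demo/revs/b" "$demo/revs/c"
printf 'pack data' > "$demo/revs/a/pack-demo.pack"
ln "$demo/revs/a/pack-demo.pack" "$demo/revs/b/pack-demo.pack"  # hardlink, same inode
cp "$demo/revs/a/pack-demo.pack" "$demo/revs/c/pack-demo.pack"  # real copy, new inode

result=$(count_pack_copies "$demo")
echo "$result"

rm -rf "$demo"
```

A count equal to the number of cached revisions, as in the listing above, means that nothing is being shared between them.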

Imho we need to:

  • make the cache directories bare repositories, to save the space taken by a workspace checkout
  • pass the --reference parameter to git submodule update --init so submodules benefit from the local cache instead of being fully cloned.

All that should probably be made a sub task for scap.
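As a rough sketch of the second point, the script below builds throwaway stand-ins for the deployment server repositories and a local cache, then initialises the submodule with `--reference`. All paths and repository names are invented for the demo, and this is not necessarily how scap ended up implementing it. Note that git implements `--reference` through an `objects/info/alternates` entry that borrows objects from the cache rather than through hardlinks, so the cache has to stay in place for as long as the checkout is used. The `protocol.file.allow=always` setting is only there because recent git versions refuse local-path submodule clones by default.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
# Throwaway identity so the demo commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
# Wrapper that permits local-path submodule clones for this demo only.
g() { git -c protocol.file.allow=always "$@"; }

# Stand-ins for the deployment server: a "wheels" repo and a "deploy" repo
# that includes it as a submodule.
git init -q "$work/wheels"
echo 'wheel data' > "$work/wheels/file"
git -C "$work/wheels" add file
git -C "$work/wheels" commit -q -m 'wheels: initial commit'

git init -q "$work/deploy"
g -C "$work/deploy" submodule --quiet add "$work/wheels" wheels
git -C "$work/deploy" commit -q -m 'deploy: add wheels submodule'

# Stand-in for the local cache on the target: a bare clone of the submodule.
mkdir -p "$work/cache"
g clone -q --bare "$work/wheels" "$work/cache/wheels.git"

# Per-revision checkout: the submodule borrows objects from the local cache.
g clone -q "$work/deploy" "$work/rev"
g -C "$work/rev" submodule --quiet update --init \
    --reference "$work/cache/wheels.git"

# The alternates file records where the borrowed objects live.
alternates=$(cat "$work/rev/.git/modules/wheels/objects/info/alternates")
echo "$alternates"

rm -rf "$work"
```

The printed path should point at the `objects` directory of the cache clone, which confirms that the submodule is reusing the cached objects.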


Event Timeline

Restricted Application added subscribers: Zppix, Aklapper. · Jun 6 2016, 5:42 PM

Adding @akosiaris who had the same concerns.

thcipriani triaged this task as Medium priority. · Sep 16 2016, 5:48 PM
thcipriani moved this task from Needs triage to Debt on the Scap board.

I have the same concern for git-fat repos as well.

thcipriani assigned this task to dduvall. · Oct 2 2016, 5:23 PM
thcipriani removed dduvall as the assignee of this task. · Jun 12 2017, 4:43 PM
thcipriani added a subscriber: dduvall.
mmodell claimed this task. · Sep 6 2017, 4:55 PM
mmodell moved this task from To Triage to Next: Feature on the Deployments board. · Sep 6 2017, 4:56 PM
mmodell edited projects, added Scap (Tech Debt Sprint FY201718-Q2); removed Scap.

Planning to resolve this during the upcoming quarter.

greg added a subscriber: greg. · Sep 29 2017, 10:46 PM

Adding our Release-Engineering-Team (Kanban) project as we would like to work on this in the coming quarter or two (no promises though, this is not a "goal" only "other hoped for work").

So I've done some experimenting with various ways to handle submodules now that we have a relatively modern version of git in production.

  • I had thought, initially, that git workdir would be the solution, however:
    • It turns out that git workdir does not have good support for submodules.
    • This is being worked on in git but it's not ready for prime time yet
  • The straightforward solution would be to simply nuke the .git metadata for checked out submodule revisions in /srv/deployment
    • This only fixes the wasted disk space and does not address the issue of caching the submodules' git objects
  • The deployment-cache/cache repo should really be a bare .git repo but it currently is not
  • The deployment-cache/cache repo needs to contain submodules. Currently only the promoted revisions actually contain submodules.

So we need to move the submodules into the deployment cache repo and then nuke them from the checked out revisions....

Awesome to see some progress. I'd like to note T171758: Support git-lfs files in gerrit, since it's related and would easily address this issue and many others for us. Still, seeing a solution to this task would have a great impact.

@Halfak: That one is also on my radar and it's related to this work.

OK, this should be live on deployment-tin.

Assuming that T179013: Scap failing to rewrite submodule urls in beta is resolved, we should be able to bring D826: Cache submodules and use --reference to save space and friends to production, which will resolve this task at last.

No update, this should go out to production with the next scap build.

mmodell closed this task as Resolved. · Dec 21 2017, 5:58 PM