
Nodepool images need Gerrit mirror for git-clone performance
Closed, Resolved, Public

Description

To reset the workspace properly (T76304), we need to either preserve the repositories outside of the workspace or be able to clone them quickly.

This becomes even more important once we start using disposable VMs (T47499), since there will be no persistent workspace in which to keep old clones.


Event Timeline

Krinkle raised the priority of this task to High.
Krinkle updated the task description.
Krinkle added subscribers: Krinkle, hashar.
Krinkle renamed this task from "Jenkins slaves in labs need an equivalent of Gerrit replication to speed up git-clone operations" to "Jenkins labs slaves (and later nodepool images) need Gerrit replication for git-clone performance". Apr 3 2015, 12:14 AM
Krinkle set Security to None.

Relevant script used by OpenStack in their Nodepool setup to create a local git cache:

https://github.com/openstack-infra/project-config/blob/59e04c19850a067e57b6304ee398b20438b295a1/nodepool/scripts/cache_git_repos.py

Krinkle renamed this task from "Jenkins labs slaves (and later nodepool images) need Gerrit replication for git-clone performance" to "Nodepool images need Gerrit replication for git-clone performance". Apr 3 2015, 12:16 AM

I created a diskimage-builder element, wikimedia-puppet, which clones operations/puppet.git as a bare repository on the build host. The clone is then copied into the image under /srv/git so that instances can use it as a local mirror.

The patch is https://gerrit.wikimedia.org/r/#/c/234975/

Relevant code that keeps a cache on the build host and injects it into the image filesystem: https://gerrit.wikimedia.org/r/#/c/234975/10/dib/elements/wikimedia-puppet/root.d/01-populate-puppet-repo,unified . In pseudo code:

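# Bare clone into the cache on the build host
git clone --bare "https://gerrit.wikimedia.org/r/p/operations/puppet.git" /srv/dib/cache/operations/puppet
# Create only the parent: if puppet.git already existed, cp -a would copy into it as puppet.git/puppet
mkdir -p "$IMAGE_ROOT_FS/srv/git/operations"
cp -a /srv/dib/cache/operations/puppet "$IMAGE_ROOT_FS/srv/git/operations/puppet.git"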

Example usage, from the top of https://gerrit.wikimedia.org/r/#/c/234975/10/dib/elements/wikimedia-puppet/install.d/05-puppet,unified :

git clone /srv/git/operations/puppet.git /puppet/
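A clone like the one above leaves origin pointing at the local mirror. The following is a minimal sketch of the "clone from the mirror, then git pull from Gerrit" flow mentioned at the end of this task, assuming the /srv/git/<name>.git layout and the Gerrit URL prefix used in this task; MIRROR_DIR, GERRIT_URL and clone_from_mirror are hypothetical names, not part of the actual patches.

```shell
#!/bin/bash
# Sketch only: clone from the local /srv/git mirror (fast, no network), then
# repoint origin at Gerrit and pull whatever landed after the image was built.
# MIRROR_DIR, GERRIT_URL and clone_from_mirror are hypothetical names.
set -eu

MIRROR_DIR="${MIRROR_DIR:-/srv/git}"
GERRIT_URL="${GERRIT_URL:-https://gerrit.wikimedia.org/r/p}"

# clone_from_mirror NAME DEST, e.g. clone_from_mirror operations/puppet /puppet
clone_from_mirror() {
    local name="$1" dest="$2"
    git clone --quiet "$MIRROR_DIR/$name.git" "$dest"
    # Later fetches and pulls should go to Gerrit rather than the mirror.
    git -C "$dest" remote set-url origin "$GERRIT_URL/$name.git"
    # Fast-forward the checked-out branch to Gerrit's current state.
    git -C "$dest" pull --quiet --ff-only
}
```

For example, `clone_from_mirror operations/puppet /puppet` would transfer most objects locally and fetch only the recent changes from Gerrit.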

The root.d code could be made more generic so that it injects additional repositories.
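A hypothetical generalization of the root.d hook to a list of repositories could look like the following. It is a sketch only: inject_repos, GERRIT_URL and CACHE_DIR are illustrative names, and the cache layout follows the /srv/dib/cache pseudo code above.

```shell
#!/bin/bash
# Sketch only: keep a bare clone of each listed repository in a build-host
# cache and copy it into the image as /srv/git/<name>.git.
# inject_repos, GERRIT_URL and CACHE_DIR are hypothetical names.
set -eu

GERRIT_URL="${GERRIT_URL:-https://gerrit.wikimedia.org/r/p}"
CACHE_DIR="${CACHE_DIR:-/srv/dib/cache}"

# inject_repos ROOT NAME...: mirror each NAME and copy it under ROOT/srv/git.
inject_repos() {
    local root="$1" name cache
    shift
    for name in "$@"; do
        cache="$CACHE_DIR/$name"
        # Clone only on the first build; later builds reuse the cache.
        if [ ! -d "$cache" ]; then
            mkdir -p "$(dirname "$cache")"
            git clone --bare --quiet "$GERRIT_URL/$name.git" "$cache"
        fi
        # Create only the parent so cp -a names the copy <name>.git instead
        # of nesting it inside an existing directory.
        mkdir -p "$root/srv/git/$(dirname "$name")"
        cp -a "$cache" "$root/srv/git/$name.git"
    done
}
```

Usage would be along the lines of `inject_repos "$IMAGE_ROOT_FS" operations/puppet mediawiki/core integration/config`. The sketch assumes a fresh image root on every build; keeping the cache up to date is a separate step.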

We could potentially use the upstream diskimage-builder element source-repositories (upstream doc). I haven't looked at its code, though, and I'm not sure it can be bent to fit our use case. If it can, we would only have to provide a YAML file listing what we want copied.

Upstream code: https://github.com/openstack/diskimage-builder/tree/master/elements/source-repositories or:

git clone https://github.com/openstack/diskimage-builder
cd diskimage-builder/elements/source-repositories

I could not get source-repositories to work properly, and it does a non-bare clone anyway.

There is some code in https://gerrit.wikimedia.org/r/#/c/239365/ . Given a list of repositories, it clones each one as a bare repository on the build host, then copies the result into the image filesystem under /srv/git/. We should be able to point zuul-cloner at these mirrors.

I still have to make the script maintain the cached repositories, namely by running:

git pack-refs --all
git repack -A -d
git gc --prune=all
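A sketch of how the maintenance step could loop over the cache and run those three commands on each repository. It assumes the two-level <namespace>/<name> layout under /srv/dib/cache used earlier; maintain_cache is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: run the housekeeping commands on every bare repository in the
# build-host cache. Assumes a <namespace>/<name> layout under CACHE_DIR;
# maintain_cache and CACHE_DIR are hypothetical names.
set -eu

CACHE_DIR="${CACHE_DIR:-/srv/dib/cache}"

maintain_cache() {
    local repo
    for repo in "$CACHE_DIR"/*/*; do
        # Skip anything that is not a bare git repository (including the
        # unexpanded glob when the cache is empty).
        [ "$(git --git-dir="$repo" rev-parse --is-bare-repository 2>/dev/null)" = true ] || continue
        echo "Maintaining $repo"
        git --git-dir="$repo" pack-refs --all
        git --git-dir="$repo" repack -A -d --quiet
        git --git-dir="$repo" gc --prune=all --quiet
    done
}
```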
hashar lowered the priority of this task from High to Medium. Sep 19 2015, 10:32 AM
hashar moved this task from Backlog to In-progress on the Continuous-Integration-Scaling board.

This is more or less in progress. I think I have the base code to handle the mirroring; I still need to refactor the DIB manifest a bit.

hashar renamed this task from "Nodepool images need Gerrit replication for git-clone performance" to "Nodepool images need Gerrit mirror for git-clone performance". Sep 21 2015, 7:09 PM

Change 239909 had a related patch set uploaded (by Hashar):
nodepool: cleanup git mirror cache

https://gerrit.wikimedia.org/r/239909

Change 239909 merged by jenkins-bot:
nodepool: cleanup git mirror cache

https://gerrit.wikimedia.org/r/239909

The cache is generated when the image is built, via dib/elements/wikimedia/root.d/01-mirror-gerrit-repos. For now, it only mirrors:

mediawiki/core.git
operations/puppet.git
integration/config.git

dib/elements/wikimedia/install.d/51-git-mirror-ownership makes them owned by jenkins (the user defined by the devuser element and used to ssh into the instance).

The script that prepares snapshot images should probably refresh the mirrors; this should be done via nodepool/scripts/setup_node.sh. Currently, when refreshing the snapshot, we clone from the mirror and then git pull from Gerrit.
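One possible shape for that refresh, as a sketch for setup_node.sh: refresh_mirrors and MIRROR_DIR are hypothetical names, and the code assumes each mirror under /srv/git/<namespace>/<name>.git is a bare clone whose origin still points at Gerrit. Because `git clone --bare` does not configure a fetch refspec, the refspecs are passed explicitly.

```shell
#!/bin/bash
# Sketch only: update every bare mirror under MIRROR_DIR from its origin.
# refresh_mirrors and MIRROR_DIR are hypothetical names.
set -eu

MIRROR_DIR="${MIRROR_DIR:-/srv/git}"

refresh_mirrors() {
    local repo
    for repo in "$MIRROR_DIR"/*/*.git; do
        [ -d "$repo" ] || continue
        # `git clone --bare` records origin's URL but no fetch refspec, so a
        # plain `git fetch origin` would not update the local branches. Map
        # branches and tags explicitly and drop branches deleted upstream.
        git --git-dir="$repo" fetch --quiet --prune origin \
            '+refs/heads/*:refs/heads/*' '+refs/tags/*:refs/tags/*'
    done
}
```

If setup_node.sh runs this as root, the jenkins ownership applied by 51-git-mirror-ownership may need to be re-applied afterwards.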