
Zuul repositories have too many refs causing slow updates
Closed, Resolved, Public

Description

Zuul merge operations are slow because fetches from Gerrit are painfully slow for some repositories.

For example, as zuul@gallium in /srv/ssd/zuul/git/:

mediawiki/core$ time git fetch --dry-run

real	0m18.353s
user	0m17.781s
sys	0m0.236s

The operation takes so long because git sends all of its references to the remote, and there are a lot of them:

$ git show-ref|fgrep -c refs/zuul
51185
$
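To see which namespaces dominate, instead of counting a single one, a minimal Python sketch can group the `git show-ref` output (lines of the form `<sha1> <refname>`) by the first path components of each refname. The helper names `show_ref` and `count_by_namespace` are hypothetical, not part of Zuul.

```python
import subprocess
from collections import Counter


def show_ref(git_dir: str) -> str:
    """Return the raw `git show-ref` output for one repository."""
    # show-ref can exit non-zero when no ref matches, so the exit status is not checked.
    result = subprocess.run(
        ["git", "--git-dir", git_dir, "show-ref"],
        capture_output=True, text=True,
    )
    return result.stdout


def count_by_namespace(show_ref_output: str, depth: int = 2) -> Counter:
    """Count refs per namespace prefix, e.g. 'refs/zuul' or 'refs/heads' for depth=2."""
    counts: Counter = Counter()
    for line in show_ref_output.splitlines():
        if not line.strip():
            continue
        # Each line is '<sha1> <refname>'.
        _sha1, refname = line.split(" ", 1)
        counts["/".join(refname.split("/")[:depth])] += 1
    return counts
```

For instance `count_by_namespace(show_ref("/srv/ssd/zuul/git/mediawiki/core/.git")).most_common(5)` lists the five largest namespaces; on the repository above one would expect refs/zuul to be at the top.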

We need a script that lists all references matching refs/zuul/*, inspects the commit date of each, and deletes the reference if it is older than X days (for example 30 days). That will speed up git fetch and thus Zuul merge operations.
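A minimal sketch of the selection step in Python, assuming the references are listed with `git for-each-ref --format='%(committerdate:unix) %(refname)' refs/zuul/` (one `<unix timestamp> <refname>` pair per line; the `unix` date format needs a reasonably recent Git). The function names are hypothetical and this is not the upstream script.

```python
import subprocess

SECONDS_PER_DAY = 86400


def parse_for_each_ref(output: str) -> list[tuple[int, str]]:
    """Parse '<unix timestamp> <refname>' lines into (timestamp, refname) pairs."""
    refs = []
    for line in output.splitlines():
        if not line.strip():
            continue
        stamp, refname = line.split(" ", 1)
        if not stamp:
            # No committer date (e.g. the ref points at a tag object): skip it.
            continue
        refs.append((int(stamp), refname))
    return refs


def stale_refs(refs: list[tuple[int, str]], until_days: int, now: float) -> list[str]:
    """Return the refnames whose commit is older than `until_days` days before `now`."""
    cutoff = now - until_days * SECONDS_PER_DAY
    return [refname for stamp, refname in refs if stamp < cutoff]


def list_zuul_refs(git_dir: str) -> list[tuple[int, str]]:
    """List refs/zuul/* together with the committer date of the commit they point to."""
    output = subprocess.run(
        ["git", "--git-dir", git_dir, "for-each-ref",
         "--format=%(committerdate:unix) %(refname)", "refs/zuul/"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_for_each_ref(output)
```

Usage would be along the lines of `stale_refs(list_zuul_refs("/srv/ssd/zuul/git/mediawiki/core/.git"), until_days=30, now=time.time())`. Taking `now` as a parameter keeps the date comparison easy to check without a real repository.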


To run the job manually until it is puppetized/packaged:

find /srv/ssd/zuul/git/ -name .git -type d -print -exec /home/hashar/zuul-clear-refs.py --until 30 {} \;

Details

Reference
bz68481

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:28 AM
bzimport set Reference to bz68481.

Example of a merge operation that took about 1m20s:

2014-07-23 21:59:22,755 DEBUG zuul.Repo: Resetting repository /srv/ssd/zuul/git/mediawiki/core
2014-07-23 21:59:22,755 DEBUG zuul.Repo: Updating repository /srv/ssd/zuul/git/mediawiki/core
2014-07-23 22:00:04,755 DEBUG zuul.Repo: Checking out 6466a598a9579db0789055b73001e39a6d7840a5
2014-07-23 22:00:45,412 DEBUG zuul.Repo: Merging refs/changes/46/148846/2 with args ['-s', 'resolve', 'FETCH_HEAD']
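To see where the time goes, the timestamps of consecutive log lines can be subtracted. A small Python sketch, assuming only the `YYYY-MM-DD HH:MM:SS,mmm` prefix and the `logger: message` layout shown above; it does not attempt to parse anything else of Zuul's log format.

```python
from datetime import datetime

LOG = """\
2014-07-23 21:59:22,755 DEBUG zuul.Repo: Resetting repository /srv/ssd/zuul/git/mediawiki/core
2014-07-23 21:59:22,755 DEBUG zuul.Repo: Updating repository /srv/ssd/zuul/git/mediawiki/core
2014-07-23 22:00:04,755 DEBUG zuul.Repo: Checking out 6466a598a9579db0789055b73001e39a6d7840a5
2014-07-23 22:00:45,412 DEBUG zuul.Repo: Merging refs/changes/46/148846/2 with args ['-s', 'resolve', 'FETCH_HEAD']
"""


def step_durations(log: str) -> list[tuple[str, float]]:
    """Return (message, seconds until the next log line) for every line but the last."""
    entries = []
    for line in log.splitlines():
        # The first 23 characters hold the timestamp, e.g. '2014-07-23 21:59:22,755'.
        stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        message = line.split(": ", 1)[1]
        entries.append((stamp, message))
    return [
        (message, (next_stamp - stamp).total_seconds())
        for (stamp, message), (next_stamp, _) in zip(entries, entries[1:])
    ]


for message, seconds in step_durations(LOG):
    print(f"{seconds:6.1f}s  {message}")
```

On this sample, the "Updating repository" step (where the fetch from Gerrit happens) accounts for 42 seconds and the checkout for about 41, which adds up to the roughly 1m20s total.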

That causes a bunch of issues. I will write a script to clean up obsolete references.

I wrote a quick script that inspects the commit pointed to by each Zuul reference and deletes the reference whenever that commit is older than a given number of days (default 360).
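Deleting tens of thousands of references with one `git update-ref -d` call each spawns one process per reference. A possible alternative, sketched here in Python and not necessarily what the upstream zuul-clear-refs.py does, is to feed all the stale refnames (already selected by comparing each commit date against the cutoff) to a single `git update-ref --stdin`, which reads `delete <ref>` commands. The helper names and the refs/zuul/ safety guard are assumptions of this sketch.

```python
import subprocess


def delete_commands(refnames) -> str:
    """Build input for `git update-ref --stdin`: one 'delete <ref>' line per ref."""
    return "".join(f"delete {name}\n" for name in refnames)


def delete_refs(git_dir: str, refnames, dry_run: bool = True, verbose: bool = False) -> int:
    """Delete refnames with a single git process and return how many were targeted.

    Defaults to a dry run that only prints what would be deleted.
    """
    refnames = list(refnames)
    # Safety guard (an assumption of this sketch): never touch refs outside refs/zuul/.
    outside = [name for name in refnames if not name.startswith("refs/zuul/")]
    if outside:
        raise ValueError(f"refusing to delete non-Zuul refs: {outside[:3]}")
    if dry_run or verbose:
        for name in refnames:
            print(("would delete " if dry_run else "deleting ") + name)
    if refnames and not dry_run:
        subprocess.run(
            ["git", "--git-dir", git_dir, "update-ref", "--stdin"],
            input=delete_commands(refnames), text=True, check=True,
        )
    return len(refnames)
```

Defaulting to a dry run mirrors the `--dry-run` / `--verbose` style of invocation used for zuul-clear-refs.py, so a first run on gallium only lists what would go away.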

Proposed upstream as https://review.openstack.org/109276

Will run it on gallium.

zuul@gallium:/srv/ssd/zuul/git/mediawiki/core$ git show-ref|fgrep -c refs/zuul/
51287

Then I ran /home/hashar/zuul_clear_refs.py --until 360 .

And that dropped roughly 21k references:

$ git show-ref |fgrep -c refs/zuul
29639
$

Will process operations/puppet as well.

I have cleaned up a few more repositories.

For reference, one can find the top 10 offenders by running:

cd /srv/ssd/zuul/git
find . -type d -name .git -exec bash -c 'printf "%s:" "$1"; git --git-dir "$1" show-ref | grep -c refs/zuul' _ {} \; | sort -nr -k2 -t: | head -n10
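The same ranking can be sketched in Python, assuming the layout above where each repository under /srv/ssd/zuul/git has a `.git` directory. It counts references with `git for-each-ref` restricted to refs/zuul/ rather than grepping `git show-ref`; the function names are hypothetical.

```python
import os
import subprocess


def find_git_dirs(root: str):
    """Yield every '.git' directory below root, like `find root -type d -name .git`."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".git" in dirnames:
            # Prune the walk so the object store itself is not traversed.
            dirnames.remove(".git")
            yield os.path.join(dirpath, ".git")


def count_zuul_refs(git_dir: str) -> int:
    """Number of references under refs/zuul/ in one repository."""
    output = subprocess.run(
        ["git", "--git-dir", git_dir, "for-each-ref", "--format=%(refname)", "refs/zuul/"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(output.splitlines())


def top_offenders(counts: dict[str, int], n: int = 10) -> list[tuple[str, int]]:
    """Return the n (git_dir, count) pairs with the most refs, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]
```

`top_offenders({d: count_zuul_refs(d) for d in find_git_dirs("/srv/ssd/zuul/git")})` then returns the ten largest repositories in the same order as the shell pipeline.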

Lowering priority since the references have been dealt with. Zuul still has to be fixed to garbage collect old references automatically.

demon removed a subscriber: demon. Dec 16 2014, 6:05 PM
hashar updated the task description. Mar 5 2015, 9:29 AM
hashar set Security to None.

Someone brought the topic up on the openstack-infra mailing list, so I followed up on the pending reviews at https://review.openstack.org/#/c/109276/ and wrote some basic documentation. That should help get it merged, I guess :-]

The task should be kept open until zuul-merger learns to garbage collect old references automatically.

zuul-clear-refs.py --verbose --dry-run --until 90 /srv/zuul/git/project

hashar updated the task description. Jun 23 2015, 2:34 PM
hashar removed hashar as the assignee of this task. Jun 25 2015, 11:09 AM

I am not actively working on this. See the list of blockers for making the cleanup automatic.

The task description has the long command to run on gallium as the zuul user:

sudo -u zuul find /srv/ssd/zuul/git/ -name .git -type d -print -exec /home/hashar/zuul-clear-refs.py --until 30 {} \;

The patch I have proposed upstream has been approved :-} https://review.openstack.org/#/c/109276/

We would want to include the utility in the Zuul Debian package, then add some Puppet cruft to run it from cron on a weekly(?) basis.

Our .deb package is up to date and includes the zuul-clear-refs.py utility. It has a race condition, though, which I have detailed in T103528.

Krinkle removed a subscriber: Krinkle. Sep 15 2016, 7:46 PM
hashar closed this task as Resolved. Apr 10 2019, 1:16 PM
hashar claimed this task.

Something has improved and it is much faster nowadays, whether due to the slow disk on gallium, the network, a better Gerrit, git optimizations, or something else.

$ git ls-remote .|grep -c refs/zuul
95826

$ git remote -v
origin	ssh://jenkins-bot@gerrit.wikimedia.org:29418/mediawiki/core (fetch)
origin	ssh://jenkins-bot@gerrit.wikimedia.org:29418/mediawiki/core (push)

$ time git fetch --dry-run

real	0m2.808s
user	0m2.352s
sys	0m0.376s