Page MenuHomePhabricator

Speed up git clone operation in wmf-quibble job
Closed, ResolvedPublic

Description

Checking a build of wmf-quibble-core-vendor-mysql-php74, cloning and checking out the repos took 157 seconds or 2 minutes 37 seconds. That is a bit long.

I copied the Jenkins console output with the elapsed time and went to process them with some python:

#!/usr/bin/env python

import datetime
import re


def parse(logs):
    times = {}
    total = 0
    for line in logs.splitlines():
        match = re.match(
            '(?P<timing>.+) INFO:zuul.Cloner.(?P<repo>.+):(Creating|Prepared)',
            line
        )
        if match:
            (hours, minutes, seconds, milliseconds) = re.split(
                '[\\.:]', match['timing'])
            delta = datetime.timedelta(
                hours=int(hours),
                minutes=int(minutes),
                seconds=int(seconds),
                milliseconds=int(milliseconds),
            )

            if match['repo'] in times:
                duration = delta - times[match['repo']]
                times[match['repo']] = duration
            else:
                times[match['repo']] = delta

    for repo in reversed(sorted(times, key=times.get)):
        duration = times[repo].total_seconds()
        total = total + duration
        print("%s %s" % (repo, duration))
    print("TOTAL: %s" % total)

parse("""
<LOG HERE>
""")

Which gives:

mediawiki/extensions/GrowthExperiments 24.505
mediawiki/extensions/Translate 22.374
mediawiki/extensions/MobileFrontend 22.021
mediawiki/extensions/CirrusSearch 20.855
mediawiki/extensions/ContentTranslation 20.646
mediawiki/extensions/InputBox 20.413
mediawiki/extensions/GuidedTour 19.909
mediawiki/extensions/Interwiki 19.711
mediawiki/extensions/ConfirmEdit 19.399
mediawiki/extensions/ImageMap 19.205
mediawiki/extensions/ProofreadPage 19.131
mediawiki/extensions/Kartographer 19.046
mediawiki/extensions/GlobalPreferences 18.968
mediawiki/extensions/Babel 18.693
mediawiki/extensions/UniversalLanguageSelector 18.612
mediawiki/extensions/Graph 18.571
mediawiki/extensions/JsonConfig 18.464
mediawiki/extensions/MobileApp 18.39
mediawiki/extensions/Elastica 18.371
mediawiki/extensions/FileImporter 18.058
mediawiki/extensions/Poem 17.87
mediawiki/extensions/PdfHandler 17.818
mediawiki/extensions/EventBus 17.694
mediawiki/extensions/PageImages 17.465
mediawiki/extensions/SpamBlacklist 17.376
mediawiki/extensions/GeoData 17.373
mediawiki/extensions/SandboxLink 16.735
mediawiki/extensions/Disambiguator 16.731
mediawiki/extensions/TimedMediaHandler 16.286
mediawiki/extensions/EventStreamConfig 16.053
mediawiki/extensions/SiteMatrix 15.782
mediawiki/extensions/Thanks 15.562
mediawiki/extensions/WikimediaMessages 15.456
mediawiki/extensions/CiteThisPage 15.192
mediawiki/extensions/MediaModeration 14.591
mediawiki/extensions/WikiEditor 14.407
mediawiki/extensions/IPInfo 14.226
mediawiki/extensions/AntiSpoof 13.773
mediawiki/extensions/WikiLove 13.699
mediawiki/extensions/CheckUser 13.034
mediawiki/extensions/AbuseFilter 12.911
mediawiki/extensions/WikibaseMediaInfo 12.286
mediawiki/extensions/EventLogging 12.246
mediawiki/extensions/Math 12.085
mediawiki/extensions/PageTriage 11.916
mediawiki/extensions/Wikibase 11.675
mediawiki/extensions/PageViewInfo 11.412
mediawiki/extensions/WikibaseCirrusSearch 11.406
mediawiki/extensions/Scribunto 10.858
mediawiki/extensions/cldr 10.443
mediawiki/extensions/TemplateData 10.215
mediawiki/skins/MinervaNeue 9.873
mediawiki/extensions/Gadgets 9.757
mediawiki/extensions/ParserFunctions 9.713
mediawiki/extensions/Echo 9.664
mediawiki/extensions/GlobalCssJs 9.473
mediawiki/extensions/VisualEditor 9.34
mediawiki/extensions/CodeEditor 8.944
mediawiki/vendor 8.903
mediawiki/skins/Vector 8.818
mediawiki/extensions/BetaFeatures 8.6
mediawiki/extensions/NavigationTiming 7.646
mediawiki/extensions/Cite 5.806
mediawiki/extensions/CommunityConfiguration 3.48

The sum is around 15 minutes but we run up to 8 clones in parallel.

My guess is something is off in the infra. Maybe the mirrors are not working or the Zuul merger have too many refs. The data are from cloning from contint1002.wikimedia.org.

Details

Event Timeline

hashar changed the task status from Open to Stalled.Sep 13 2024, 2:02 PM

A single entry:

00:01:16.034 INFO:zuul.Cloner.mediawiki/extensions/GrowthExperiments:Creating repo mediawiki/extensions/GrowthExperiments from cache /srv/git/mediawiki/extensions/GrowthExperiments.git
00:01:18.997 INFO:zuul.Cloner.mediawiki/extensions/GrowthExperiments:Updating origin remote in repo mediawiki/extensions/GrowthExperiments to https://gerrit.wikimedia.org/r/mediawiki/extensions/GrowthExperiments
00:01:40.066 INFO:zuul.Cloner.mediawiki/extensions/GrowthExperiments:upstream repo has branch master
00:01:40.232 INFO:zuul.Cloner.mediawiki/extensions/GrowthExperiments:Falling back to branch master
00:01:40.539 INFO:zuul.Cloner.mediawiki/extensions/GrowthExperiments:Prepared mediawiki/extensions/GrowthExperiments repo with branch master at commit cf7267723acb91ca69910e79af251b21b4d55d0b

I am pretty sure its the old issue from the Zuul cloner code and the code processing every single branches and we have too many:

$ git -C ~/projects/mediawiki/extensions/GrowthExperiments branch -r|wc -l
256

Which boils down to having to solve T303828: Delete wmf branches from Gerrit repositories

That is essentially the same as T220606 since Quibble borrowed the cloning code from Zuul :)

A single entry:

00:01:16.034 INFO:zuul.Cloner.mediawiki/extensions/GrowthExperiments:Creating repo mediawiki/extensions/GrowthExperiments from cache /srv/git/mediawiki/extensions/GrowthExperiments.git
00:01:18.997 INFO:zuul.Cloner.mediawiki/extensions/GrowthExperiments:Updating origin remote in repo mediawiki/extensions/GrowthExperiments to https://gerrit.wikimedia.org/r/mediawiki/extensions/GrowthExperiments
00:01:40.066 INFO:zuul.Cloner.mediawiki/extensions/GrowthExperiments:upstream repo has branch master
00:01:40.232 INFO:zuul.Cloner.mediawiki/extensions/GrowthExperiments:Falling back to branch master
00:01:40.539 INFO:zuul.Cloner.mediawiki/extensions/GrowthExperiments:Prepared mediawiki/extensions/GrowthExperiments repo with branch master at commit cf7267723acb91ca69910e79af251b21b4d55d0b

This is puzzling to me. What could be taking 20 seconds?

  1. Creating repo: git clone /srv/git/mediawiki/extensions/GrowthExperiments.git ./extensions/GrowthExperiments
  2. Updating origin remote: git remote add origin https://gerrit.wikimedia.org/r/mediawiki/extensions/GrowthExperiments
  3. upstream repo has branch: Something here is taking 20 seconds

in (3) it:

  1. Runs git remote update
  2. Runs git remote prune
  3. GitPython walks .git/refs/remotes/origin and reads .git/packed-refs which should be fast.

I tried this on an integration agent:

thcipriani@integration-agent-docker-1052:/tmp$ time git clone /srv/git/mediawiki/extensions/GrowthExperiments.git ge
Cloning into 'ge'...
done.

real    0m0.629s
user    0m0.143s
sys     0m0.397s
thcipriani@integration-agent-docker-1052:/tmp/ge$ time git remote set-url origin https://gerrit.wikimedia.org/r/mediawiki/extensions/GrowthExperiments.git

real    0m0.007s
user    0m0.001s
sys     0m0.007s
thcipriani@integration-agent-docker-1052:/tmp/ge$ time git remote update -p 
...
real    0m1.748s
user    0m1.412s
sys     0m0.265s
thcipriani@integration-agent-docker-1052:/tmp/ge$ time find .git/refs/remotes/origin -type f
real    0m0.008s
user    0m0.002s
sys     0m0.007s
thcipriani@integration-agent-docker-1052:/tmp/ge$ time grep master .git/packed-refs
real    0m0.004s
user    0m0.000s
sys     0m0.004s

Less than 2.39 seconds real time. I played with GitPython (in a container since it's python2, too) and found roughly the same thing. There must be something else going on here making it slow. We should clean up branches. I'm unsure if it's causing 20 seconds of delay.

@ArthurTaylor and I had noticed this and were discussing. In addition to pruning branches, is it possible that using clone depth (to e.g. 50) would be reasonable? It doesn't seem like we should need the full git history of each project in order to run the CI builds.

This is puzzling to me. What could be taking 20 seconds?

Sorry I should have given a summary of T220606 which made me block this on T303828: Delete wmf branches from Gerrit repositories.

The issue is not related to Git and GitPython themselves, but in how Zuul reset a git repository on the Zuul mergers, you can follow the logic in T220606 task description, but in short Zuul iterates each of the remote branches, creates it locally. At the time I took the measure, there were 150 branches and that took roughly 90 seconds.

So essentially we gotta convert unused branches to tags (I would love old releases branches to be converted as well which will make it easier to find out which ones are still supported, but that is another topic).

The resetting slowness has been fixed in newer Zuul versions.

@ArthurTaylor and I had noticed this and were discussing. In addition to pruning branches, is it possible that using clone depth (to e.g. 50) would be reasonable? It doesn't seem like we should need the full git history of each project in order to run the CI builds.

A shallow clone is not going to solve it. From several years ago, the fastest way I found to set up the environment was to clone from a mirror then bring it up to date by fetching from Zuul repos/Gerrit.


Essentially there are two fixes:

  1. A short term one which is to get rid of old branches T303828 . Then when I gave it a try some years ago I have hit a wall with the branch script which is not a very fun code base to deal with.
  1. Eventually Quibble code in zuul/merger/merger.py should be updated to not iterate through each of the branches or to be made smarter when resetting the repository. There are some changes that were made by upstream, but most probably can't be ported as is:
  2. Clarify merger updates and resets (that is some refactoring?)
  3. Handle annotated and signed tags when packing refs (pretty sure that depends on other things)
  4. Speed up merger git resets

And bunch of other changes.

@ArthurTaylor and I had noticed this and were discussing. In addition to pruning branches, is it possible that using clone depth (to e.g. 50) would be reasonable? It doesn't seem like we should need the full git history of each project in order to run the CI builds.

A shallow clone is not going to solve it. From several years ago, the fastest way I found to set up the environment was to clone from a mirror then bring it up to date by fetching from Zuul repos/Gerrit.

I don't understand why not? With clone depth of 1, cloning MW core takes 14 seconds, with clone depth 50, cloning MW core takes 38 seconds, while with all history it's 2 minutes and 22 seconds. Wouldn't shallow cloning help?

I don't understand why not? With clone depth of 1, cloning MW core takes 14 seconds, with clone depth 50, cloning MW core takes 38 seconds, while with all history it's 2 minutes and 22 seconds. Wouldn't shallow cloning help?

We do a git clone from a local mirror of the repository, which is basically copying the .git directory from a partition to another one and is thus fast:

$ time git clone /srv/git/mediawiki/core.git
Cloning into 'core'...
done.
Updating files: 100% (10249/10249), done.

real	0m2.916s
user	0m1.094s
sys	0m1.718s

It is then updated from Gerrit/Zuul which is fast as well since most references are up to date or only missing a few objects:

$ time git fetch https://gerrit.wikimedia.org/r/mediawiki/core 
From https://gerrit.wikimedia.org/r/mediawiki/core
 * branch                    HEAD       -> FETCH_HEAD

real	0m2.655s
user	0m0.694s
sys	0m1.456s

Because most deltas and files are already present, the network/disk overhead is rather small.

In all fairness while timing the above commands, I found out the git mirrors on the instances were no more updating the branches since roughly January 30. The I did took 14 seconds (same as cloning in your example) I guess due to the network transfer and all the delta resolutions. That was a regression, I filed T376981 and got a fix for it. So that will at least speed up a bit the fetch once all mirrors have been updated.

...

$ git -C ~/projects/mediawiki/extensions/GrowthExperiments branch -r|wc -l
256

Which boils down to having to solve T303828: Delete wmf branches from Gerrit repositories

After I have removed the old wmf branches, we went from 256 to 17 branches:

git -C ~/projects/mediawiki/extensions/GrowthExperiments branch -r|wc -l
17

I parsed the last successful run of wmf-quibble-core-vendor-mysql-php74 for comparison:

mediawiki/extensions/WikimediaMessages 13.395
mediawiki/extensions/CirrusSearch 8.59
mediawiki/extensions/WikiEditor 8.203
mediawiki/extensions/ContentTranslation 5.676
mediawiki/extensions/CheckUser 5.413
mediawiki/extensions/Babel 4.783
mediawiki/extensions/TimedMediaHandler 4.129
mediawiki/vendor 3.796
mediawiki/extensions/ConfirmEdit 3.564
mediawiki/extensions/Math 3.548
mediawiki/extensions/Wikibase 3.54
mediawiki/extensions/PageTriage 3.416
mediawiki/extensions/CodeEditor 3.33
mediawiki/extensions/ProofreadPage 3.142
mediawiki/extensions/Gadgets 2.576
mediawiki/extensions/UniversalLanguageSelector 2.482
mediawiki/extensions/Translate 2.477
mediawiki/extensions/WikiLove 2.421
mediawiki/extensions/CiteThisPage 2.348
mediawiki/extensions/FileImporter 2.295
mediawiki/extensions/VisualEditor 2.242
mediawiki/extensions/MobileFrontend 2.122
mediawiki/extensions/GrowthExperiments 2.101
mediawiki/extensions/Kartographer 1.984
mediawiki/extensions/Interwiki 1.927
mediawiki/extensions/BetaFeatures 1.884
mediawiki/extensions/AntiSpoof 1.872
mediawiki/extensions/GlobalPreferences 1.806
mediawiki/extensions/AbuseFilter 1.725
mediawiki/extensions/Disambiguator 1.703
mediawiki/extensions/Cite 1.703
mediawiki/extensions/Echo 1.699
mediawiki/extensions/WikibaseMediaInfo 1.687
mediawiki/extensions/CommunityConfiguration 1.635
mediawiki/extensions/Graph 1.592
mediawiki/extensions/Thanks 1.536
mediawiki/extensions/GlobalCssJs 1.508
mediawiki/extensions/ParserFunctions 1.479
mediawiki/extensions/InputBox 1.437
mediawiki/extensions/JsonConfig 1.413
mediawiki/extensions/GuidedTour 1.392
mediawiki/extensions/ImageMap 1.384
mediawiki/extensions/SiteMatrix 1.372
mediawiki/extensions/SpamBlacklist 1.336
mediawiki/extensions/IPInfo 1.256
mediawiki/skins/Vector 1.248
mediawiki/skins/MinervaNeue 1.212
mediawiki/extensions/MobileApp 1.211
mediawiki/extensions/EventBus 1.188
mediawiki/extensions/WikibaseCirrusSearch 1.158
mediawiki/extensions/Elastica 1.152
mediawiki/extensions/GeoData 1.134
mediawiki/extensions/PdfHandler 1.131
mediawiki/extensions/Poem 1.099
mediawiki/extensions/PageImages 1.098
mediawiki/extensions/EventStreamConfig 1.05
mediawiki/extensions/PageViewInfo 1.018
mediawiki/extensions/cldr 1.011
mediawiki/extensions/SandboxLink 0.973
mediawiki/extensions/TemplateData 0.95
mediawiki/extensions/CommunityConfigurationExample 0.938
mediawiki/extensions/NavigationTiming 0.893
mediawiki/extensions/MediaModeration 0.883
mediawiki/extensions/EventLogging 0.869
mediawiki/extensions/Scribunto 0.841

It went from 157 seconds down to 37 seconds, saving 120 seconds or two minutes and the cumulated time went from ~ 15 minutes down to 152 seconds. Ie if we ran the clone serially it would be as fast/slow as it used to be in parallel.

Some comparisons:

RepoBeforeAfter
mediawiki/extensions/GrowthExperiments24.5052.101
mediawiki/extensions/Translate22.3742.477
mediawiki/extensions/MobileFrontend22.0212.122

Those are the repositories that are mirrored on the instances.

I will investigate the few slow ones:

mediawiki/extensions/WikimediaMessages 13.395
mediawiki/extensions/CirrusSearch 8.59
mediawiki/extensions/WikiEditor 8.203
mediawiki/extensions/ContentTranslation 5.676
mediawiki/extensions/CheckUser 5.413
mediawiki/extensions/Babel 4.783
mediawiki/extensions/TimedMediaHandler 4.129
hashar changed the task status from Stalled to Open.Nov 26 2024, 8:38 AM

For release branches, see T380841

For mediawiki/extensions/WikimediaMessages taking 13.395, that is when resolving the deltas. It is ~ 50MB, I will add it to the list of mirrored repositories.

Change #1099657 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] ci: add WikimediaMessages to git cache

https://gerrit.wikimedia.org/r/1099657

After having cherry picked the Puppet patch on the integration project, I gave it a try on integration-agent-docker-1040.integration.eqiad1.wikimedia.cloud:

$ sudo docker run -v /srv/git:/srv/git -v "$(pwd)"/src:/src \
    --rm -it docker-registry.wikimedia.org/releng/quibble-buster-php74:1.11.0 \
    -c /bin/true mediawiki/extensions/WikimediaMessages \
    --git-parallel=1 2>&1|ts -s '%H:%M:%.s'
..
00:00:10.559435 INFO:zuul.Cloner:Creating repo mediawiki/extensions/WikimediaMessages from cache /srv/git/mediawiki/extensions/WikimediaMessages.git
00:00:11.003473 INFO:zuul.Cloner:Updating origin remote in repo mediawiki/extensions/WikimediaMessages to https://gerrit.wikimedia.org/r/mediawiki/extensions/WikimediaMessages
00:00:11.785612 INFO:zuul.Cloner:Falling back to branch master
00:00:11.853725 INFO:zuul.Cloner:Prepared mediawiki/extensions/WikimediaMessages repo with branch master at commit 57a8067e12484e5ba12d109362f3c33a1e5ee0da

Cloning WikimediaMessages went from ~ 13 seconds to half a second :-]

After having cherry picked the Puppet patch on the integration project, I gave it a try on integration-agent-docker-1040.integration.eqiad1.wikimedia.cloud:

$ sudo docker run -v /srv/git:/srv/git -v "$(pwd)"/src:/src \
    --rm -it docker-registry.wikimedia.org/releng/quibble-buster-php74:1.11.0 \
    -c /bin/true mediawiki/extensions/WikimediaMessages \
    --git-parallel=1 2>&1|ts -s '%H:%M:%.s'
..
00:00:10.559435 INFO:zuul.Cloner:Creating repo mediawiki/extensions/WikimediaMessages from cache /srv/git/mediawiki/extensions/WikimediaMessages.git
00:00:11.003473 INFO:zuul.Cloner:Updating origin remote in repo mediawiki/extensions/WikimediaMessages to https://gerrit.wikimedia.org/r/mediawiki/extensions/WikimediaMessages
00:00:11.785612 INFO:zuul.Cloner:Falling back to branch master
00:00:11.853725 INFO:zuul.Cloner:Prepared mediawiki/extensions/WikimediaMessages repo with branch master at commit 57a8067e12484e5ba12d109362f3c33a1e5ee0da

Cloning WikimediaMessages went from ~ 13 seconds to half a second :-]

nice!

Change #1099657 merged by Dzahn:

[operations/puppet@production] ci: add WikimediaMessages to git cache

https://gerrit.wikimedia.org/r/1099657

A build of wmf-quibble-core-vendor-mysql-php74 from this morning show the zuul clone took 34.393 seconds

TOTAL: 142.69199999999995
mediawiki/extensions/WikiEditor 7.649
mediawiki/extensions/CirrusSearch 7.645
mediawiki/extensions/CheckUser 7.156
mediawiki/extensions/Translate 6.735
mediawiki/extensions/Babel 5.763
mediawiki/extensions/CodeEditor 5.474
mediawiki/extensions/ConfirmEdit 5.366
mediawiki/extensions/ContentTranslation 5.072
mediawiki/extensions/Disambiguator 4.565
mediawiki/extensions/Echo 4.258
mediawiki/vendor 3.683
mediawiki/extensions/Wikibase 3.384
mediawiki/extensions/TimedMediaHandler 3.353
mediawiki/extensions/Math 3.236
mediawiki/extensions/PageTriage 2.989
mediawiki/extensions/ProofreadPage 2.579
mediawiki/extensions/CiteThisPage 2.417
mediawiki/extensions/VisualEditor 2.322
mediawiki/extensions/UniversalLanguageSelector 2.238
mediawiki/extensions/WikiLove 2.087
mediawiki/extensions/GrowthExperiments 1.925
mediawiki/extensions/WikimediaMessages 1.807
mediawiki/extensions/Gadgets 1.713
mediawiki/extensions/Interwiki 1.699
mediawiki/extensions/BetaFeatures 1.646
mediawiki/extensions/WikibaseMediaInfo 1.644
mediawiki/extensions/Kartographer 1.633
mediawiki/extensions/AntiSpoof 1.633
mediawiki/skins/MinervaNeue 1.565
mediawiki/extensions/Elastica 1.55
mediawiki/extensions/AbuseFilter 1.52
mediawiki/extensions/FileImporter 1.507
mediawiki/extensions/Thanks 1.502
mediawiki/extensions/MobileFrontend 1.407
mediawiki/extensions/Cite 1.398
mediawiki/extensions/ParserFunctions 1.34
mediawiki/extensions/IPInfo 1.309
mediawiki/skins/Vector 1.275
mediawiki/extensions/Graph 1.223
mediawiki/extensions/InputBox 1.168
mediawiki/extensions/CommunityConfiguration 1.138
mediawiki/extensions/SpamBlacklist 1.127
mediawiki/extensions/JsonConfig 1.123
mediawiki/extensions/WikibaseCirrusSearch 1.116
mediawiki/extensions/GlobalPreferences 1.114
mediawiki/extensions/SiteMatrix 1.113
mediawiki/extensions/GuidedTour 1.106
mediawiki/extensions/ImageMap 1.093
mediawiki/extensions/MobileApp 1.049
mediawiki/extensions/EventBus 1.032
mediawiki/extensions/GeoData 1.009
mediawiki/extensions/EventStreamConfig 0.982
mediawiki/extensions/PdfHandler 0.969
mediawiki/extensions/cldr 0.952
mediawiki/extensions/PageViewInfo 0.944
mediawiki/extensions/MediaModeration 0.918
mediawiki/extensions/SandboxLink 0.911
mediawiki/extensions/GlobalCssJs 0.898
mediawiki/extensions/Scribunto 0.892
mediawiki/extensions/Poem 0.881
mediawiki/extensions/TemplateData 0.865
mediawiki/extensions/PageImages 0.861
mediawiki/extensions/NavigationTiming 0.797
mediawiki/extensions/EventLogging 0.695
mediawiki/extensions/CommunityConfigurationExample 0.672

Ideally we would have all gated extensions to be mirrored on the instances, but that above reports show it is probably good enough as is.

There is still the subtask T303828 that remains to be fully fixed, but at least the old wmf/release branches have been discard already. I am claiming victory after having reclaimed two minutes ? from the builds.

See also a lengthy analyzes I did previously on T351466#9353861