
Gerrit replication after a restart takes roughly 5 hours
Closed, Resolved (Public)

Description

After restarting Gerrit, all repositories are scheduled for replication. It apparently takes five hours to go through all of them. We should speed it up, maybe by adding more worker threads to each remote?

Replication plugin documentation: https://gerrit.wikimedia.org/r/plugins/replication/Documentation/config.md
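
For illustration, the setting in question is the per-remote `threads` option described in that documentation (one thread per remote unless configured otherwise). Below is a minimal sketch of what a remote with four worker threads might look like in `replication.config`. The remote name `gerrit2001` is hypothetical, and the `url` is only inferred from the push destinations shown by `show-queue`; the actual Puppet-managed configuration may differ.

```
# Hypothetical remote name; url inferred from show-queue output, not copied from the real config
[remote "gerrit2001"]
  url = gerrit2@gerrit2001.wikimedia.org:/srv/gerrit/git/${name}.git
  threads = 4
```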

View after a restart at 8:15 today from https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream

gerrit_replication_after_restart.png (855×917 px, 97 KB)

Event Timeline

Change 789810 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: replicate to codfw with 4 threads

https://gerrit.wikimedia.org/r/789810

Change 789810 merged by Dzahn:

[operations/puppet@production] gerrit: replicate to codfw with 4 threads

https://gerrit.wikimedia.org/r/789810

Mentioned in SAL (#wikimedia-releng) [2022-05-12T19:53:09Z] <hashar> gerrit: triggering full replication to gerrit2001 to test T307137

19:57:12 <hashar> !log Restarting Gerrit

I first tried a manual replication to gerrit2001, but only one thread was processing. I guess the plugin had to be reloaded somehow, so I chose to restart Gerrit instead. And:

$ gerrit show-queue -w|grep -v waiting
+ ssh -p 29418 hashar@gerrit.wikimedia.org gerrit show-queue -w
Task     State        StartTime         Command
------------------------------------------------------------------------------
e8eb46b9              19:58:37.654      [a80eae9a] push gerrit2@gerrit2001.wikimedia.org:/srv/gerrit/git/pywikibot/pycolorname.git [..all..]
481132bd              19:58:37.653      [08001a65] push gerrit2@gerrit2001.wikimedia.org:/srv/gerrit/git/operations/software/bernard.git [..all..]
28fb9e88              19:58:37.652      [e8002665] push gerrit2@gerrit2001.wikimedia.org:/srv/gerrit/git/blubber-doc/example/calculator-service.git [..all..]
88340a4d              19:58:37.652      [482a92e5] push gerrit2@gerrit2001.wikimedia.org:/srv/gerrit/git/operations/debs/wikimedia-search-qa.git [..all..]
3dd0eb0f              19:59:07.419      [1dd3a71b] push git@github.com:wikimedia/analytics-log2udp2 [..all..]
...

So we now replicate over 4 threads :]
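
To check the thread count without eyeballing the queue, the running pushes can be tallied per destination host. This is only a sketch: `count_active_pushes` is a hypothetical helper, not part of Gerrit, and it assumes the `show-queue -w` layout shown above, where the token after `push` is `user@host:path`.

```shell
# Hypothetical helper: count running (non-waiting) replication pushes per
# destination host. Reads `gerrit show-queue -w` output on stdin.
count_active_pushes() {
  grep -v waiting \
    | awk '{
        for (i = 1; i < NF; i++) {
          if ($i == "push") {
            dest = $(i + 1)
            sub(/:.*/, "", dest)    # drop the repository path
            sub(/^.*@/, "", dest)   # drop the SSH user
            count[dest]++
          }
        }
      }
      END { for (h in count) print count[h], h }' \
    | sort -k2
}
```

Usage would be along the lines of `ssh -p 29418 gerrit.wikimedia.org gerrit show-queue -w | count_active_pushes`, which prints one `count host` line per destination.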

After the deployment of 4 threads for replication to codfw, we can see it is faster:

gerrit_replica_1_vs_4_threads.png (829×915 px, 104 KB)

The latency is at ~100 ms, which is good. I don't think there is any further improvement to be made.