Gerrit replication after a restart takes roughly 5 hours
Closed, Resolved, Public


After restarting Gerrit, all repositories are scheduled for replication. It apparently takes five hours to go through all of them. We should speed it up, maybe by adding more worker threads to each remote?

Replication plugin documentation:

View after a restart at 8:15 today from

gerrit_replication_after_restart.png (855×917 px, 97 KB)

Event Timeline

Change 789810 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: replicate to codfw with 4 threads

Change 789810 merged by Dzahn:

[operations/puppet@production] gerrit: replicate to codfw with 4 threads
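
For context, the Gerrit replication plugin reads the number of worker threads per remote from the `remote.<name>.threads` key in `replication.config`. The Puppet change itself is not reproduced here, so the following is only a rough sketch of setting that key by hand. The helper name, the file path, and the remote name `codfw` are assumptions for illustration.

```shell
# Sketch only: set the worker thread count for one replication remote.
# replication.config uses git-config syntax, so `git config --file` can edit it.
# Arguments (all hypothetical values supplied by the caller):
#   $1  path to replication.config
#   $2  remote name as used in the [remote "<name>"] section
#   $3  number of worker threads
set_replication_threads() {
    git config --file "$1" "remote.$2.threads" "$3"
}

# Example (hypothetical path and remote name):
#   set_replication_threads /path/to/etc/replication.config codfw 4
```

As noted in the comment further down, the running plugin did not seem to pick up the new value until Gerrit was restarted.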

Mentioned in SAL (#wikimedia-releng) [2022-05-12T19:53:09Z] <hashar> gerrit: triggering full replication to gerrit2001 to test T307137

19:57:12 <hashar> !log Restarting Gerrit

I first tried a manual replication to gerrit2001, but only one thread was processing. I guess the plugin had to be reloaded somehow, so I chose to restart Gerrit instead. And:

$ gerrit show-queue -w|grep -v waiting
+ ssh -p 29418 gerrit show-queue -w
Task     State        StartTime         Command
e8eb46b9              19:58:37.654      [a80eae9a] push [..all..]
481132bd              19:58:37.653      [08001a65] push [..all..]
28fb9e88              19:58:37.652      [e8002665] push [..all..]
88340a4d              19:58:37.652      [482a92e5] push [..all..]
3dd0eb0f              19:59:07.419      [1dd3a71b] push [..all..]

So we now replicate over 4 threads :]
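
To repeat this check without counting lines by eye, a small filter along these lines could be used. It is a sketch that assumes queued tasks carry the word `waiting` in the State column, as in the `grep -v waiting` call above. The function name is made up for illustration.

```shell
# Sketch only: count replication push tasks that are not in the "waiting" state.
# Reads `gerrit show-queue -w` output on stdin and prints a single number.
count_running_pushes() {
    # Drop queued tasks, then count the remaining lines that mention a push.
    # The header line contains neither word in a way that matches "push".
    grep -v 'waiting' | grep -c 'push'
}

# Example, reusing the ssh invocation shown in the trace above:
#   ssh -p 29418 gerrit show-queue -w | count_running_pushes
```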

After the deployment of 4 threads for replication to codfw, we can see it is faster:

gerrit_replica_1_vs_4_threads.png (829×915 px, 104 KB)

The latency is around 100 ms, which is good. I don't think there is any further improvement to be made.