
Create mirror of Gerrit repositories for consumption by various tools
Open, Normal, Public

Description

When looking at Gerrit issues during spring 2019 (T221026) we noticed a lot of git operations emanating from various tools on WMCS.

They are mostly reads, and there is no firm indication that the batch requests are overloading Gerrit. But given the load they impose, it might be wise to shift the load to a mirror. They notably do not need to be 100% up to date with the master and could suffer the slight delay incurred by replication.

Relevant extracts from T221026:

@mmodell wrote:

From looking at http requests per minute in javamelody, over 1 year, I see that traffic has increased a lot:

https://gerrit.wikimedia.org/r/monitoring?part=graph&graph=httpHitsRate (HTTP hits per minute)

@thcipriani pointed out that the mean stays identical, but the max grew in March 2019 from roughly 4k/minute to 6k/minute.

@hashar proposed: Would it make sense to set a readonly replica such as git.wikimedia.org to offload Gerrit? The bots/scripts running on WMCS could be easily made to point to that mirror. And listed:

Out of 623k https requests in April 17th access logs:

Requests  IP                         DNS PTR
84110     172.16.1.221               codesearch4.codesearch.eqiad.wmflabs.
69921     2620:0:861:102:10:64:16:8  phab1001.eqiad.wmnet.
51736     172.16.1.85                extdist-02.extdist.eqiad.wmflabs.
51736     172.16.1.84                extdist-01.extdist.eqiad.wmflabs.
51676     172.16.1.86                extdist-03.extdist.eqiad.wmflabs.
16465     172.16.5.187               integration-slave-docker-1051
16116     172.16.5.162               integration-slave-docker-1048
14709     172.16.5.181               integration-slave-docker-1050
13660     172.16.1.36                integration-slave-docker-1041
13579     172.16.0.26                integration-slave-docker-1054
12990     172.16.6.184               integration-slave-docker-1043
12909     172.16.3.86                integration-slave-docker-1040
11672     172.16.7.168               integration-slave-docker-1034
10847     172.16.5.190               integration-slave-docker-1052
9705      xxxxx                      some public internet IP
8786      172.16.3.87                integration-slave-docker-1037

Probably codesearch ( https://codesearch.wmflabs.org/ ), Phabricator and extdist ( https://www.mediawiki.org/wiki/Extension:ExtensionDistributor ) could be moved to use a mirror.

The CI slaves do hammer Gerrit :-/

Note that this is for any HTTP request, not just git-upload-pack. But the result is similar when filtering for upload-pack.

Not taken into account: the git fetches from the zuul-mergers, which are done over ssh with the jenkins-bot user.
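
The per-client counts in the table above could be reproduced with something along these lines. This is only a sketch: it assumes an access log whose first whitespace-separated field is the client address (as in the common Apache log format), which the task does not confirm. The function name `top_clients` and its parameters are hypothetical, and the sample lines use made-up documentation addresses rather than real log data.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def top_clients(
    lines: Iterable[str],
    needle: Optional[str] = None,
    limit: int = 15,
) -> List[Tuple[str, int]]:
    """Count requests per client address, busiest first.

    Assumes the client address is the first whitespace-separated field of
    each log line. When `needle` is given (for example "git-upload-pack"),
    only lines containing it are counted.
    """
    counts: Counter = Counter()
    for line in lines:
        if needle is not None and needle not in line:
            continue
        fields = line.split(None, 1)
        if fields:  # skip empty lines
            counts[fields[0]] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    # Made-up sample lines, not taken from the real access logs.
    sample = [
        '192.0.2.10 - - [17/Apr/2019:00:00:01 +0000] "POST /r/operations/puppet/git-upload-pack HTTP/1.1" 200 512',
        '192.0.2.10 - - [17/Apr/2019:00:00:02 +0000] "GET /r/changes/ HTTP/1.1" 200 128',
        '192.0.2.20 - - [17/Apr/2019:00:00:03 +0000] "POST /r/operations/puppet/git-upload-pack HTTP/1.1" 200 512',
    ]
    for address, hits in top_clients(sample):
        print(hits, address)
```

Passing `needle="git-upload-pack"` restricts the count to fetch traffic, which mirrors the filtering mentioned above.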

Event Timeline

hashar created this task. Jun 21 2019, 9:18 AM
Restricted Application added a subscriber: Aklapper. Jun 21 2019, 9:18 AM

Candidates to be switched to a new Git mirror would be:

  • Phabricator. Or maybe we can have Gerrit replicate to the Phabricator host and let Phabricator consume from the local replica.
  • CI, which is hammering Gerrit on purpose but could potentially be made to use a replica (unsure)

The git repositories under /srv/gerrit take 25 GB. Some of them should NOT be replicated, though: at least All-Users.git, and probably the private repositories as well (we have a few). But I guess the Gerrit replication mechanism is already smart enough to not replicate private repositories.

I would guess we can use some Ganeti VM then pick up a DNS entry (twist: git.wikimedia.org is already used as a redirect to Diffusion). But maybe we can do some evil path routing in Varnish to redirect git requests to the instance hosting the mirrors.

The CI slaves do hammer Gerrit :-/

Couldn’t they use a mirror as well? If they fetch a specific ref (refs/changes/ef/abcdef/xy), and we don’t expect the target of that ref to ever change (the next patch set instead increments the xy counter), then they can try to fetch from a mirror and sleep-and-retry until the ref is found, no? (I’m assuming that the delay would usually not be more than a few seconds – if it’s more like a few minutes, then this would probably delay CI too much.)

(edit: I hadn’t seen @hashar’s comment above yet when I wrote this)
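
The sleep-and-retry idea could look roughly like the sketch below. The names `fetch_with_retry` and `git_fetch_ref`, the attempt count and the delay are all hypothetical choices for illustration. Nothing here is taken from the actual CI configuration.

```python
import subprocess
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[], bool],
    attempts: int = 10,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `fetch` until it reports success or `attempts` is exhausted.

    Sleeps `delay` seconds between attempts, but not after the last one.
    """
    for attempt in range(attempts):
        if fetch():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


def git_fetch_ref(remote_url: str, ref: str, cwd: str = ".") -> bool:
    """Try to fetch one ref; `cwd` must be an existing git repository.

    `git fetch` exits non-zero when the remote does not have the ref yet,
    but also on network errors, so a failure here is not proof of lag.
    """
    result = subprocess.run(
        ["git", "fetch", remote_url, ref],
        cwd=cwd,
        capture_output=True,
    )
    return result.returncode == 0
```

A caller would wrap the two together, for example `fetch_with_retry(lambda: git_fetch_ref(mirror_url, change_ref))`, where `mirror_url` and `change_ref` are whatever the job already knows. This only makes sense for refs whose target never changes once created.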

The delay would be just a few seconds for sure. The devil is that the CI jobs fetch branches (refs/heads/*), which change constantly. At the time a change gets merged, there is a race condition between the change being replicated and CI jobs that run assuming the head has already been updated.

@hashar: we already have a read-only replica of most repositories on phabricator. Is that not satisfactory? It's kept up to date, with a lag of just a few minutes in the worst case.

hashar triaged this task as Normal priority. Jun 24 2019, 1:46 PM

@hashar: we already have a read-only replica of most repositories on phabricator. Is that not satisfactory? It's kept up to date, with a lag of just a few minutes in the worst case.

My previous experience was that Phabricator replicas were extremely slow to update (on the scale of hours) for less frequently updated repositories because it was polling. Is that no longer the case?

greg added a subscriber: greg. Jun 24 2019, 7:05 PM

(gah, sorry, fighting my own herald rule)

We can use the Gerrit slave feature for this (i.e. a read-only instance), with gerrit2001 being the slave.

See https://gerrit.googlesource.com/homepage/+/md-pages/docs/Scaling.md

@hashar: we already have a read-only replica of most repositories on phabricator. Is that not satisfactory? It's kept up to date, with a lag of just a few minutes in the worst case.

My previous experience was that Phabricator replicas were extremely slow to update (on the scale of hours) for less frequently updated repositories because it was polling. Is that no longer the case?

From the table above, phab1001.eqiad.wmnet is showing as one of the top users. So I guess that is the polling of all repositories to update Diffusion / the read-only mirror. Potentially we could have Phabricator use a git mirror instead of hitting the master Gerrit.

Of course, I do not have many metrics/info to explain the rise in https requests per minute, nor whether it is actually a problem (but it might be).

hashar merged a task: Restricted Task. Jul 16 2019, 9:08 AM
hashar added a subscriber: MoritzMuehlenhoff.

OK, I set up a Gerrit mirror today. Clone URLs are https://ggmirror.wmflabs.org/git/<gerrit name>.git. And https://ggmirror.wmflabs.org/cgit/ as a web view/debugger. It will git fetch every 60 minutes.

codesearch is now pointing at the mirror, so all of that traffic should disappear (and the mirror should use less traffic, I hope). If the mirror can handle that amount of traffic well, I'll start switching some more services over to it (extdist, libup).
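
For a tool that only knows Gerrit project names, the clone URL pattern given above could be built like this. It is a small sketch: `mirror_clone_url` is a hypothetical helper, and `operations/puppet` is used purely as an example name, with no claim about which projects the mirror actually carries.

```python
MIRROR_GIT_BASE = "https://ggmirror.wmflabs.org/git/"


def mirror_clone_url(gerrit_name: str) -> str:
    """Return the mirror clone URL for a Gerrit project name.

    Follows the stated pattern <base>/<gerrit name>.git and tolerates a
    name that already carries a ".git" suffix or stray slashes.
    """
    name = gerrit_name.strip("/")
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return f"{MIRROR_GIT_BASE}{name}.git"


if __name__ == "__main__":
    print(mirror_clone_url("operations/puppet"))
```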

greg added a comment. Jul 17 2019, 3:56 PM

Thanks for setting that up for your tools, @Legoktm !

I think we probably still want a mirror in production that can be used for other (production) things, e.g. Phabricator. And also to have a production-grade host for this so it's not impacted by any unforeseen stability issues in WMCS.

I think we probably still want a mirror in production that can be used for other (production) things, e.g. Phabricator. And also to have a production-grade host for this so it's not impacted by any unforeseen stability issues in WMCS.

Agreed 100%.

OK, I set up a Gerrit mirror today. Clone URLs are https://ggmirror.wmflabs.org/git/<gerrit name>.git. And https://ggmirror.wmflabs.org/cgit/ as a web view/debugger. It will git fetch every 60 minutes.
codesearch is now pointing at the mirror, so all of that traffic should disappear (and the mirror should use less traffic, I hope). If the mirror can handle that amount of traffic well, I'll start switching some more services over to it (extdist, libup).

This made a huge difference in traffic volume! Thanks for doing this @Legoktm :)

This might have broken Diffusion mirrors: T229756.

Dzahn added a subscriber: Dzahn. Aug 5 2019, 7:34 PM

There is now a replica of gerrit in codfw that can be used to clone from:

example:

git clone https://gerrit-replica.wikimedia.org/r/operations/puppet

Dzahn assigned this task to hashar. Aug 5 2019, 7:49 PM

@hashar Does this resolve the ticket or is it part of it as well to switch a lot of tools to using it?

How up to date is gerrit-replica? Is it updated immediately after gerrit (master)? Thanks.

I think possibly a few mins delay. (It runs on one thread).
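
One way to see how far the replica is behind at a given moment would be to compare the output of `git ls-remote` run against both hosts. The sketch below only does the comparison. The function names are hypothetical, and it assumes the usual `git ls-remote` output of one `<object id><TAB><ref name>` pair per line.

```python
from typing import Dict, List


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map ref name -> object id from `git ls-remote` output."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        parts = line.split(None, 1)  # "<object id>\t<ref name>"
        if len(parts) == 2:
            object_id, ref = parts
            refs[ref.strip()] = object_id
    return refs


def stale_refs(master: Dict[str, str], replica: Dict[str, str]) -> List[str]:
    """Refs that are missing on the replica or point at a different object."""
    return sorted(ref for ref, oid in master.items() if replica.get(ref) != oid)


if __name__ == "__main__":
    # Made-up object ids, for illustration only.
    master = parse_ls_remote("aaa111\trefs/heads/master\nbbb222\trefs/heads/production\n")
    replica = parse_ls_remote("aaa111\trefs/heads/master\nccc333\trefs/heads/production\n")
    print(stale_refs(master, replica))
```

Running the comparison repeatedly until the list is empty would give a rough idea of the replication delay for one repository.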

Dzahn added a comment. Aug 7 2019, 10:08 PM

replication should work again now. there was a syntax issue in the config that has been fixed.

@hashar Does this resolve the ticket or is it part of it as well to switch a lot of tools to using it?

This task asked to create a read-only mirror of Gerrit repositories and that part is indeed fulfilled now (via https://gerrit-replica.wikimedia.org/r/ ).

Left todo:

  • migrate the few tools that have been identified in this task: codesearch, extdist, wikifarm.pluggableauth.eqiad.wmflabs, phabricator etc.
  • Phase out the transient https://ggmirror.wmflabs.org/git/ which was/is pulling every repo every hour
  • Update doc, or at least advertise the new replica

And I guess we are set :]

Paladox added a comment. Edited Aug 19 2019, 2:18 PM

@hashar phabricator has been migrated to use the replica :)

Codesearch and extdist also use the replica (done by @Legoktm)

Apparently extdist is still reaching out to gerrit.wikimedia.org over https from at least:

  • extdist-04.extdist.eqiad.wmflabs.
  • extdist-05.extdist.eqiad.wmflabs.
  • extdist-01.extdist.eqiad.wmflabs.

@Legoktm can you revisit extdist configuration on labs and have it point to gerrit-replica.wikimedia.org instead?
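
For tools that store full clone URLs, pointing them at the replica may come down to swapping the host name. A minimal sketch, assuming the repository path is identical on both hosts (the `/r/...` clone example earlier in this task suggests so, but it is not stated outright). `to_replica` is a hypothetical helper and is not how extdist is actually configured.

```python
from urllib.parse import urlsplit, urlunsplit

MASTER_HOST = "gerrit.wikimedia.org"
REPLICA_HOST = "gerrit-replica.wikimedia.org"


def to_replica(url: str) -> str:
    """Rewrite a Gerrit master URL to the replica; leave other URLs alone.

    Assumes the host is written in lowercase in the URL.
    """
    parts = urlsplit(url)
    if parts.hostname != MASTER_HOST:
        return url
    netloc = parts.netloc.replace(MASTER_HOST, REPLICA_HOST, 1)
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    print(to_replica("https://gerrit.wikimedia.org/r/operations/puppet"))
```

Since the replica is read-only, a rewrite like this only suits clone and fetch URLs. Git's own `url.<base>.insteadOf` configuration is another way to get the same effect without touching the tool.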

And there is still wikifarm.pluggableauth.eqiad.wmflabs. but I have no idea what that one is :-\

(note to self: gotta verify whether those https hits are actually git requests, they might be regular API traffic)
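
Telling git requests apart from other traffic could be done on the request path. The sketch below recognises only the git smart HTTP endpoints (`info/refs?service=...` discovery and the `git-upload-pack` / `git-receive-pack` POST targets). `is_git_request` is a hypothetical name, and the paths in the example are illustrative rather than copied from the logs.

```python
from urllib.parse import parse_qs, urlsplit

GIT_SERVICES = ("git-upload-pack", "git-receive-pack")


def is_git_request(request_target: str) -> bool:
    """Return True when the request path looks like git smart HTTP traffic."""
    parts = urlsplit(request_target)
    path = parts.path
    if path.endswith("/info/refs"):
        # Ref discovery: the service is named in the query string.
        services = parse_qs(parts.query).get("service", [])
        return any(service in GIT_SERVICES for service in services)
    # Pack negotiation: the service is the last path component.
    return any(path.endswith("/" + service) for service in GIT_SERVICES)


if __name__ == "__main__":
    for target in (
        "/r/operations/puppet/info/refs?service=git-upload-pack",
        "/r/operations/puppet/git-upload-pack",
        "/r/changes/?q=status:open",
    ):
        print(is_git_request(target), target)
```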

@hashar phabricator has been migrated to use the replica :)

For projects on phabricator that are "observing" gerrit, I moved all of them. Some of them do mirroring; I did not update those. For posterity's sake, the script I used to update phab URIs is P8857