
Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster)
Open, Normal, Public

Description

Gerrit seems to be approaching the limit of its current hardware.

CPU usage seems OK: it rarely goes over 50%, and when it does, I think it's usually because the JVM is garbage collecting rapidly while also handling queued requests.

Disk space seems fine as well: there is plenty of room, and I rarely see any IO wait.

Memory is where the problem is for that machine.

At some point, the size of our repos exceeded the size of the heap we're able to allocate. We currently have roughly 32GB of git repos on disk, and the machine has 32GB of RAM (20GB heap).

Ideally, I think we'd be able to fit all of the caches, the indexes, and a good portion of our git repos into memory alongside all the other Gerrit objects. The caches don't seem to consume much memory (<1GB in memory; 1.1GB persisted to disk), 4GB is set aside exclusively for packfiles (it would be nice to raise this if we had room), and the indexes are 2GB.

At the 95th percentile we have 18GB of RAM in use. The G1GC collector we're using keeps 10% headspace before triggering garbage collection (10% of the 20GB heap, which explains the 18GB). We do, however, still occasionally hit 20GB of RAM in use, in spite of running a weekly git gc.
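The memory figures above can be sanity-checked with a little arithmetic. The Python sketch below only restates numbers quoted in this task; the constant and function names are hypothetical, rounding the in-memory caches up to 1GB is an assumption, and treating the 10% G1 headspace as a fixed reserve follows the task's own explanation of the 18GB figure rather than a measured G1 setting. It also assumes that the packfile cache and the in-memory caches both live on the 20GB heap.

```python
"""Back-of-the-envelope memory budget for cobalt, using only figures quoted in this task."""

# Figures from the task description, in GB (names are hypothetical).
TOTAL_RAM_GB = 32
HEAP_GB = 20
REPOS_ON_DISK_GB = 32
CACHES_GB = 1  # assumption: "<1 GB" in memory, rounded up as an upper bound
PACKFILE_CACHE_GB = 4  # reserved for packfiles; assumed to live on the heap
INDEXES_GB = 2

# Assumption: G1 keeps ~10% of the heap as headspace before collecting,
# which is how the task explains the 18GB 95th-percentile figure.
G1_HEADSPACE_FRACTION = 0.10


def gc_trigger_gb(heap_gb: float, headspace_fraction: float) -> float:
    """Heap usage at which G1 is expected to start collecting."""
    return heap_gb * (1 - headspace_fraction)


def heap_left_for_other_objects_gb() -> float:
    """Heap remaining after the packfile cache and in-memory caches (both assumed on-heap)."""
    return HEAP_GB - PACKFILE_CACHE_GB - CACHES_GB


def upper_bound_working_set_gb() -> float:
    """Caches + indexes + every repo on disk: a pessimistic 'keep it all in memory' target."""
    return CACHES_GB + INDEXES_GB + REPOS_ON_DISK_GB


if __name__ == "__main__":
    print(f"Repos on disk exceed heap: {REPOS_ON_DISK_GB > HEAP_GB}")
    print(f"Expected GC trigger: {gc_trigger_gb(HEAP_GB, G1_HEADSPACE_FRACTION):.1f} GB")
    print(f"Heap left for other objects: {heap_left_for_other_objects_gb()} GB")
    working_set = upper_bound_working_set_gb()
    # Compare the pessimistic working set against current RAM and the proposed doubling.
    for ram_gb in (TOTAL_RAM_GB, 2 * TOTAL_RAM_GB):
        print(f"{ram_gb} GB RAM vs {working_set} GB working set -> fits: {working_set <= ram_gb}")
```

By this crude measure, keeping caches, indexes, and all repos in memory would need about 35GB, which does not fit in 32GB of RAM but would fit in 64GB, before accounting for the OS and other Gerrit objects.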

I think it would be good to double the amount of RAM for Gerrit to 64GB.

Event Timeline

Krenair added a subscriber: Krenair. May 2 2019, 6:59 PM

Wanted to note that gerrit2001 has 64GB of RAM, so this increase would match it and give us the same RAM specs in both data centres.

Dzahn added a comment. May 2 2019, 7:53 PM

So, cobalt is already on the list of machines that will be over 5 years old during FY19-20 (T217764; see T217764#5005267), which was compiled to determine how many (misc) replacement servers are needed.

The procurement ticket was T120248.

Paladox is right that gerrit2001 has 64GB and that the two should match. Also, for consistent naming, cobalt should either be reinstalled as gerrit1001 or replaced by gerrit1001.

This probably means we should replace the entire server; I don't think we do RAM upgrades. Adding dcops.

Dzahn claimed this task. May 3 2019, 8:26 PM
CDanis added a subscriber: mark. May 6 2019, 5:51 PM

cc @mark, who I know is about to start looking at hardware requests for the coming FY

Dzahn added a comment. May 6 2019, 5:52 PM

I expect this to be a topic in our (DP - SRE) meeting this Wednesday.

Dzahn mentioned this in Unknown Object (Task). May 10 2019, 9:50 PM

Created the S4 procurement ticket for this at T222984.

Dzahn changed the task status from Open to Stalled. May 22 2019, 8:38 PM
jijiki triaged this task as Low priority. Jun 21 2019, 8:34 AM
Dzahn changed the task status from Stalled to Open. Wed, Sep 11, 11:18 PM
Dzahn raised the priority of this task from Low to Normal.

Change 535962 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] acme_chief: add gerrit1001 as authorized host for gerrit certs

https://gerrit.wikimedia.org/r/535962

Change 535964 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: allow connections from gerrit1001

https://gerrit.wikimedia.org/r/535964

Change 535965 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] ci: allow ssh to new gerrit server gerrit1001

https://gerrit.wikimedia.org/r/535965

Change 535966 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mariadb::ferm_misc: allow connections from gerrit1001

https://gerrit.wikimedia.org/r/535966

Change 535969 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] smokeping: replace cobalt with gerrit1001

https://gerrit.wikimedia.org/r/535969

Change 535971 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: add gerrit1001 to SSH known_hosts file

https://gerrit.wikimedia.org/r/535971

Change 535962 merged by Dzahn:
[operations/puppet@production] acme_chief: add gerrit1001 as authorized host for gerrit certs

https://gerrit.wikimedia.org/r/535962

Change 535964 merged by Dzahn:
[operations/puppet@production] gerrit: allow connections from gerrit1001

https://gerrit.wikimedia.org/r/535964

Change 535965 merged by Dzahn:
[operations/puppet@production] ci: allow ssh from new gerrit server gerrit1001 in ferm

https://gerrit.wikimedia.org/r/535965

Change 535971 merged by Dzahn:
[operations/puppet@production] gerrit: add gerrit1001 to SSH known_hosts file

https://gerrit.wikimedia.org/r/535971

Change 536357 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: add role on gerrit1001 and remove spare

https://gerrit.wikimedia.org/r/536357

Change 535969 merged by Dzahn:
[operations/puppet@production] smokeping: replace cobalt with gerrit1001

https://gerrit.wikimedia.org/r/535969

Dzahn renamed this task from Gerrit Hardware Upgrade to Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster). Fri, Sep 13, 3:05 AM

Change 536357 merged by Dzahn:
[operations/puppet@production] gerrit: add role on gerrit1001 and remove spare

https://gerrit.wikimedia.org/r/536357

Change 538127 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add gerrit-new.wikimedia.org for migration

https://gerrit.wikimedia.org/r/538127

Change 538128 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: set gerrit-new as name/IP for new gerrit server

https://gerrit.wikimedia.org/r/538128

Change 538127 merged by Dzahn:
[operations/dns@master] add gerrit-new.wikimedia.org for migration

https://gerrit.wikimedia.org/r/538127

Change 538128 merged by Dzahn:
[operations/puppet@production] gerrit: set gerrit-new as name/IP for new gerrit server

https://gerrit.wikimedia.org/r/538128