Gerrit seems to be approaching the limit of its current hardware.
CPU usage seems to be OK: it is rarely over 50%, and when it is, I think Gerrit is usually trying to garbage collect memory rapidly while handling queued requests.
Disk seems fine as well: plenty of room and I rarely see any IO wait.
Memory is where the problem is for that machine.
At some point, the size of our repos exceeded the size of the heap we're able to allocate. We're currently at roughly 32GB of git repos on disk, and we have 32GB of RAM in the machine (20GB heap).
I think ideally we'd be able to fit all of the caches + indexes + a good portion of our git repos into memory at the same time as all the other gerrit objects. Caches don't seem to consume a large amount of memory (<1 GB; 1.1GB persisted to disk); we have 4GB set aside for packfiles exclusively (would be nice to up this if we had space); indexes are 2GB.
We run at a 95th percentile of 18GB of RAM in use. The G1GC we're using keeps 10% headroom before triggering garbage collection (10% of the 20GB heap is 2GB, which explains the 18GB). We do, however, still manage to hit 20GB of RAM in use occasionally. This is in spite of doing a weekly `git gc`.
I think it would be good to double the amount of RAM for Gerrit to 64GB.
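For anyone who wants to sanity-check the numbers, here is a rough sketch of the arithmetic using only the figures quoted above. The function names are made up for illustration, and treating caches, packfile cache and indexes as all coming out of the same 20GB heap is a simplifying assumption, not a measurement.

```python
# Rough memory budget using the figures quoted in this task.
# Assumption: caches, packfile cache and indexes are all counted against
# the same heap, which is a simplification for illustration only.

HEAP_GB = 20            # heap size quoted above
GC_RESERVE_PERCENT = 10  # headroom G1GC keeps, as described above
REPOS_ON_DISK_GB = 32    # approximate size of git repos on disk

# Upper-bound figures from the description (caches are "<1 GB").
RESERVED_GB = {"caches": 1, "packfiles": 4, "indexes": 2}


def gc_threshold_gb(heap_gb, reserve_percent):
    """Heap usage at which collection kicks in, given the reserve."""
    return heap_gb * (100 - reserve_percent) / 100


def remaining_heap_gb(heap_gb, reserved):
    """Heap left over once the listed consumers are subtracted."""
    return heap_gb - sum(reserved.values())


if __name__ == "__main__":
    threshold = gc_threshold_gb(HEAP_GB, GC_RESERVE_PERCENT)
    left = remaining_heap_gb(HEAP_GB, RESERVED_GB)
    print(f"GC threshold: {threshold} GB")
    print(f"Heap left for everything else: {left} GB "
          f"(repos on disk: {REPOS_ON_DISK_GB} GB)")
```

Under those assumptions the leftover heap is well below the size of the repos on disk, which is the gap the RAM increase is meant to narrow.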
**On the day:**
[] Stop gerrit on cobalt && disable puppet.
[] Rsync repos again to gerrit1001 to ensure everything is up to date; also rsync the indexes (/var/lib/gerrit2/review_site/indexes) and the LFS objects (/srv/gerrit/plugins/lfs) again.
[] Merge [[ https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/535966/ | mariadb::ferm_misc: allow connections from gerrit1001 ]]
[] Merge [[ https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/541110/ | Gerrit: Switch master from cobalt to gerrit1001 ]]
[] Run puppet.
[] Merge [[ https://gerrit.wikimedia.org/r/#/c/operations/dns/+/541111/ | Switch gerrit.wikimedia.org backend to gerrit1001 ]]
[] Start gerrit.
[] Run the online reindexer.
---
Topic branch (open and merged) https://gerrit.wikimedia.org/r/q/topic:%22gerrit1001%22+(status:open%20OR%20status:merged)
---
Migration date is October 21st 2019.
https://lists.wikimedia.org/pipermail/wikitech-l/2019-October/092664.html