Gerrit seems to be approaching the limit of its current hardware.
CPU usage seems OK: it is rarely over 50%, and I think when it does go higher Gerrit is usually trying to garbage collect memory rapidly while handling queued requests.
Disk seems fine as well: there is plenty of room and I rarely see any IO wait.
Memory is where the problem is for that machine.
At some point, the size of our repos exceeded the size of the heap we're able to allocate. We currently have roughly 32GB of git repos on disk and 32GB of RAM in the machine (20GB heap).
I think ideally we'd be able to fit all of the caches + indexes + a good portion of our git repos into memory at the same time as all the other Gerrit objects. Caches don't seem to consume a large amount of memory (<1 GB; 1.1GB persisted to disk); we have 4GB set aside for packfiles exclusively (it would be nice to increase this if we had space); indexes are 2GB.
We run at a 95th percentile of 18GB of RAM in use. The G1GC we're using keeps 10% headroom before triggering garbage collection (which explains the 18GB). We do, however, still manage to hit 20GB of RAM in use occasionally. This is in spite of doing a weekly git gc.
I think it would be good to double the amount of RAM for Gerrit to 64GB.
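As a rough sanity check of the numbers above, here is a small Python sketch of the heap budget. It assumes the cache, packfile and index figures all come out of the 20GB heap, and it rounds the cache figure up to 1GB. The variable and function names are made up for illustration.

```python
# Figures quoted in this task, in GB.
HEAP_GB = 20
GC_HEADROOM_PERCENT = 10      # headroom attributed to G1GC above
REPOS_ON_DISK_GB = 32

# Assumption: all of these are carved out of the JVM heap.
reserved_gb = {
    "caches": 1,      # quoted as "<1 GB", rounded up here
    "packfiles": 4,
    "indexes": 2,
}


def gc_threshold_gb(heap_gb, headroom_percent):
    """Heap usage at which GC would kick in, given the headroom."""
    return heap_gb * (100 - headroom_percent) / 100


def remaining_gb(heap_gb, reserved):
    """Heap left for everything else once the reserved parts are taken."""
    return heap_gb - sum(reserved.values())


threshold = gc_threshold_gb(HEAP_GB, GC_HEADROOM_PERCENT)
remaining = remaining_gb(HEAP_GB, reserved_gb)
packfile_share = reserved_gb["packfiles"] / REPOS_ON_DISK_GB

print(f"GC threshold: {threshold} GB of {HEAP_GB} GB heap")
print(f"Heap left after caches/packfiles/indexes: {remaining} GB")
print(f"Share of on-disk repos the packfile budget can hold: {packfile_share:.1%}")
```

Under these assumptions the GC threshold matches the 18GB 95th percentile, and the 4GB packfile budget covers only an eighth of the repos on disk.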
On the day:
- Rsync /srv/gerrit/git/, /srv/gerrit/plugins and /var/lib/gerrit2/review_site/ from cobalt to gerrit1001
- Stop gerrit && disable puppet on gerrit1001
- Merge mariadb::ferm_misc: allow connections from gerrit1001
- Stop puppet on cobalt + gerrit2001
- Merge Gerrit: Switch master from cobalt to gerrit1001
- Merge Switch gerrit.wikimedia.org backend to gerrit1001
- Stop gerrit on cobalt
- Repeat the rsync commands above (rsync /var/lib/gerrit2/review_site to gerrit1001; also rsync the LFS objects again)
- Rename /var/lib/gerrit2/review_site/data/javamelody/r_cobalt to /var/lib/gerrit2/review_site/data/javamelody/r_gerrit1001 on gerrit1001.
- Run puppet on gerrit1001 + cobalt
- Start gerrit on gerrit1001
- Hack DNS authdns-update to clone from gerrit-replica temporarily, deploy DNS change
- Manually copy apache2 site config for gerrit.wm.org with scp from cobalt to gerrit1001, restart apache
- Manually run command from list_mediawiki_extensions cron to create /var/www/mediawiki-extensions.txt
- Run the online reindexer
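
The rsync steps in the list above could be scripted roughly as below. This is only a sketch: it assumes the commands are run on cobalt and push to gerrit1001 over SSH, that each path maps to the same location on the target, and that `-a` is an acceptable flag set (the flags actually used are not recorded in this task). It only prints dry-run commands and does not execute anything.

```python
import shlex

# Hostname and paths as written in the checklist above.
TARGET_HOST = "gerrit1001"
PATHS = [
    "/srv/gerrit/git/",
    "/srv/gerrit/plugins",
    "/var/lib/gerrit2/review_site/",
]


def rsync_command(path, target_host, dry_run=True):
    """Build (but do not run) an rsync command for one directory.

    A trailing slash is forced on both sides so rsync copies the
    directory contents rather than nesting the directory inside itself.
    """
    src = path.rstrip("/") + "/"
    cmd = ["rsync", "-a"]          # assumption: archive mode is sufficient
    if dry_run:
        cmd.append("--dry-run")    # show what would be transferred
    cmd += [src, f"{target_host}:{src}"]
    return cmd


commands = [rsync_command(p, TARGET_HOST) for p in PATHS]
for cmd in commands:
    print(shlex.join(cmd))
```

Dropping `dry_run=True` and passing each list to `subprocess.run` would perform the copy. The second pass after stopping Gerrit on cobalt would reuse the same commands.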
Topic branch (open and merged): https://gerrit.wikimedia.org/r/q/topic:%22gerrit1001%22+(status:open%20OR%20status:merged)
Migration date is October 21st 2019.