As discussed, we want some kind of repository mirror for production GitLab that lags less than the replica does (currently 24h). I see multiple options to implement this.
Restore replica more often
Currently the GitLab replica is synced with production GitLab every 24h. We could increase the backup and restore frequency to something like once every 2h. I don't see any technical constraints here. The resource usage on production GitLab looks fine while doing backups.
We could also think about introducing partial backup and restore. We would keep doing a full backup and restore cycle on the replica every 24h. For the partial backups in between, we would skip certain components like the database and only sync the repositories.
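A minimal sketch of how the backup job could choose between a full and a repositories-only run. It assumes the Omnibus `gitlab-backup create` command and its `SKIP` variable. The component list below is an assumption and has to be checked against the backup documentation of our GitLab version, and `backup_command` is a hypothetical helper name.

```python
import shlex

# Components to leave out of a partial backup. This list is an assumption
# for illustration; the valid SKIP values depend on the GitLab version.
SKIPPED_IN_PARTIAL = ("db", "uploads", "builds", "artifacts", "lfs", "registry", "pages")


def backup_command(partial: bool) -> list[str]:
    """Return the argv for a full or a repositories-only backup run."""
    cmd = ["gitlab-backup", "create"]
    if partial:
        # Skip everything except the repositories.
        cmd.append("SKIP=" + ",".join(SKIPPED_IN_PARTIAL))
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(shlex.join(backup_command(partial=True)))
```

The 2h timer would call this with `partial=True` and the 24h timer with `partial=False`. The restore side on the replica would need a matching change.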
Sync git files to replica/mirror machine
We could also try to just sync the repositories using rsync. We could set up a timer to sync /var/opt/gitlab/git-data/repositories/ to the replica or another mirror machine.
However, I don't know what happens to the Gitaly server if we update the repositories on disk underneath it. The documentation discourages this.
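A rough sketch of what the timer could run, assuming rsync over SSH is available between the two machines. The target host name is a placeholder and `rsync_command` and `sync` are hypothetical helper names. The open question about Gitaly seeing changed files on disk still applies.

```python
import subprocess

REPO_DIR = "/var/opt/gitlab/git-data/repositories/"


def rsync_command(target_host: str) -> list[str]:
    """Build the rsync call that copies the repositories to target_host."""
    # -a preserves permissions, ownership and timestamps.
    # --delete removes repositories on the target that are gone on the source.
    # The trailing slash on REPO_DIR copies the directory contents.
    return ["rsync", "-a", "--delete", REPO_DIR, f"{target_host}:{REPO_DIR}"]


def sync(target_host: str) -> None:
    """Run the sync and raise if rsync exits with an error."""
    subprocess.run(rsync_command(target_host), check=True)


if __name__ == "__main__":
    # "replica.example.org" is a placeholder, not a real host of ours.
    print(" ".join(rsync_command("replica.example.org")))
```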
Mirror repositories using repository mirroring
Repository mirroring is GitLab's built-in feature to create mirrors. We could mirror repositories to:
- GitLab.com
- replica(s)
- GitHub.com
- a dedicated mirror machine
Mirrors have to be configured per project, so we would need some kind of script or hook that sets up the mirror. Mirrors can be updated via the API and also support push updates, so the mirror would stay in sync with production GitLab.
I assume that to implement mirroring we have to write some code that adds a mirror to every project. If we want to mirror to GitLab.com, we have to decide which projects to mirror (only "our" official projects, or user projects as well?). If we want to mirror to the replica, we have to make sure mirroring is disabled when we switch or fall back to the replica.
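A minimal sketch of the per-project setup, assuming the remote mirrors endpoints of the GitLab REST API (`/projects/:id/remote_mirrors`) and a token that is allowed to manage the projects. The function names, base URL, and mirror URL are placeholders. The functions only build the requests, and sending them with `urllib.request.urlopen` is left to the caller.

```python
import json
import urllib.request


def _request(method: str, url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON request against the GitLab API."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method=method,
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )


def create_mirror_request(base_url: str, token: str, project_id: int,
                          mirror_url: str) -> urllib.request.Request:
    """Request that adds an enabled push mirror to one project."""
    url = f"{base_url}/api/v4/projects/{project_id}/remote_mirrors"
    return _request("POST", url, token, {"url": mirror_url, "enabled": True})


def disable_mirror_request(base_url: str, token: str, project_id: int,
                           mirror_id: int) -> urllib.request.Request:
    """Request that disables an existing mirror, e.g. before a failover."""
    url = f"{base_url}/api/v4/projects/{project_id}/remote_mirrors/{mirror_id}"
    return _request("PUT", url, token, {"enabled": False})
```

A script would loop over all project IDs and send one create request per project. The disable request covers the case where we switch to the replica.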
Use Gitaly in cluster mode
Another option would be to use Gitaly's cluster feature. It is possible to run multiple instances of the Git server and replicate the data between them. However, this is quite an advanced feature. We would have to run a dedicated Gitaly server and would start to separate some components out of the Omnibus package.
Conclusion
For the current architecture and usage I would like to use the first solution. Most of the code is in place and we just have to increase the backup and restore frequency. We also wouldn't need an additional mirror machine or have to set up additional hosts.