Current backup issues
GitLab backups were configured in T274463. After some tweaks they now work reliably. However, during the implementation it became clear that we will reach scaling limits with the current solution. With more adoption and usage of GitLab, the key problems are:
- backups take quite a lot of space ("more" in comparison to Gerrit)
- backups need even more space during creation
- backup creation takes quite a lot of time
- backups are only done every 24h
So a long-term plan is needed for how we want to do backups for GitLab. This design should cover how the backups are created, backup frequency, storage location, rotation and, if possible, monitoring of backups. Furthermore, we need estimates of disk usage and of backup and restore times for future backup sizes.
Potential solutions
Some ideas discussed in T274463 and other tasks which could influence the design:
- order hosts with bigger disks/get a second pair of disks (and tweak disk layout)
- set up partial/incremental backups for GitLab (T316935 / T324506)
- use alternative backup strategies, for example bacula, rsync, pgbouncer/dump
  - Introduce a lot of complexity and are not officially supported by GitLab
- use alternative backup storage locations like a dedicated s3 bucket/minio/cloud or a locally mounted share
  - Doesn't solve the problem of local disk space needed to create the backup files
Final backup design
Some experimentation and discussion happened to evaluate the different options. See T330172#8657895 and related changes. We agreed to add additional disks to the GitLab hosts to mitigate the disk space issue (for creation and storage of a single backup).
Backup storage
Two additional 1.7TB disks were added to all GitLab hosts (except gitlab2002, which will be done after switchover). These disks are configured as RAID1 and mounted at the default GitLab backup location /srv/gitlab-backup.
The additional disk space gives us plenty of room for future growth and migration to GitLab. Furthermore the root partition also has more space now. According to estimates in T330172#8657895 the space should last for at least two more years.
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/gitlab1004--vg-root   813G   79G  693G  11% /
/dev/md1                          1.8T   65G  1.6T   4% /srv/gitlab-backup
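How long the new volume lasts depends on how fast backups grow. The following is a minimal sketch of that arithmetic, assuming linear growth. The function name, the growth rate and the reserve value are hypothetical placeholders and are not taken from T330172; only the 1.6T of available space comes from the output above.

```python
def months_until_full(avail_gb: float, growth_gb_per_month: float,
                      reserve_gb: float = 0.0) -> float:
    """Estimate how many months the backup volume lasts at linear growth.

    reserve_gb is space kept free, for example for the temporary files
    that are written while a backup is being created.
    """
    if growth_gb_per_month <= 0:
        raise ValueError("growth_gb_per_month must be positive")
    usable_gb = max(avail_gb - reserve_gb, 0.0)
    return usable_gb / growth_gb_per_month


if __name__ == "__main__":
    # 1.6T available on /srv/gitlab-backup (from the df output).
    avail_gb = 1.6 * 1024
    # Hypothetical values, replace with measured numbers.
    growth = 50.0    # GB of additional backup size per month
    reserve = 200.0  # GB kept free for backup creation
    print(f"{months_until_full(avail_gb, growth, reserve):.1f} months")
```

Replacing the placeholder growth rate with a measured one (for example from the size of successive backup archives) gives a rough check of the two-year estimate.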
Backup frequency
Additional disks don't solve the issue of backup duration and frequency. The plan here is to implement incremental backups (T324506) and partial backups (T316935). These backups will contain repository data and database changes but no CI artifacts (builds, packages). The latter account for most of the space and backup runtime.
We already have a partial backup in the backup script (T316935). Furthermore, some research is happening in T324506 around incremental backups. These efforts should be consolidated. The goal is to keep doing full backups every 24 hours, with incremental backups in between (maybe every 4 hours in the beginning).
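The intended schedule can be sketched as a small decision function. This is an illustration only, not the actual backup script: the function names, the hour of the full backup and the exact SKIP list are assumptions. The SKIP variable of gitlab-backup is used here to produce a partial backup. A true incremental backup would need different options, which should be checked against the upstream documentation for the installed GitLab version.

```python
from typing import List, Optional

FULL_BACKUP_HOUR = 0            # assumption: full backup at 00:00
PARTIAL_INTERVAL_HOURS = 4      # "maybe every 4 hours in the beginning"
# Assumption: components left out of the in-between backups.
SKIP_FOR_PARTIAL = ["artifacts", "builds", "packages"]


def backup_kind(hour: int) -> Optional[str]:
    """Return "full", "partial" or None for a given hour of the day."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour == FULL_BACKUP_HOUR:
        return "full"
    if (hour - FULL_BACKUP_HOUR) % PARTIAL_INTERVAL_HOURS == 0:
        return "partial"
    return None


def backup_command(kind: str) -> List[str]:
    """Build the gitlab-backup command line for a backup kind."""
    if kind not in ("full", "partial"):
        raise ValueError(f"unknown backup kind: {kind}")
    cmd = ["gitlab-backup", "create"]
    if kind == "partial":
        # Skip the large CI data so the backup is smaller and faster.
        cmd.append("SKIP=" + ",".join(SKIP_FOR_PARTIAL))
    return cmd


if __name__ == "__main__":
    for hour in range(24):
        kind = backup_kind(hour)
        if kind is not None:
            print(f"{hour:02d}:00 {kind:8s} {' '.join(backup_command(kind))}")
```

With these placeholder values the day contains one full backup and five smaller backups in between, so the worst-case data loss window shrinks from 24 hours to 4 hours for repository and database data.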