
Setup partial backups for GitLab
Closed, Resolved, Public

Description

Currently only full backups and config backups are taken, every 24h. GitLab can also back up only certain data and skip other data during backup creation. The data sources that can be selected or skipped are:

  • db (database)
  • uploads (attachments)
  • builds (CI job output logs)
  • artifacts (CI job artifacts)
  • lfs (LFS objects)
  • terraform_state (Terraform states)
  • registry (Container Registry images)
  • pages (Pages content)
  • repositories (Git repositories data)
  • packages (Packages)
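GitLab only offers a SKIP option, so "back up only X" has to be expressed as "skip everything except X". A minimal Python sketch of that inversion, assuming the Omnibus `gitlab-backup create SKIP=...` invocation from the linked documentation (the gitlab-backup.sh wrapper may pass it differently); the helper names `skip_value_for` and `backup_command` are hypothetical:

```python
# Data sources that GitLab's backup task can select or skip (from the list above).
DATA_SOURCES = (
    "db", "uploads", "builds", "artifacts", "lfs",
    "terraform_state", "registry", "pages", "repositories", "packages",
)


def skip_value_for(include):
    """Return the SKIP= value that backs up only the data sources in `include`.

    Hypothetical helper: every known data source not in `include` is skipped.
    """
    unknown = set(include) - set(DATA_SOURCES)
    if unknown:
        raise ValueError(f"unknown data sources: {sorted(unknown)}")
    # Keep the canonical order so the generated command line is stable.
    return ",".join(s for s in DATA_SOURCES if s not in include)


def backup_command(include):
    """Build (but do not run) the argument list for a partial backup."""
    return ["gitlab-backup", "create", f"SKIP={skip_value_for(include)}"]


if __name__ == "__main__":
    # Back up only the database and the Git repositories.
    print(" ".join(backup_command({"db", "repositories"})))
    # gitlab-backup create SKIP=uploads,builds,artifacts,lfs,terraform_state,registry,pages,packages
```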

See also: https://docs.gitlab.com/ee/raketasks/backup_gitlab.html#excluding-specific-directories-from-the-backup

We should evaluate whether to use this to also take partial backups of the data sources that change more frequently. Partial backups could reduce disk usage and shorten the time the replicas lag behind production.
The gitlab-backup.sh script already offers a partial backup option, which currently sets SKIP=uploads,builds,artifacts,lfs,registry,pages. We should evaluate whether that selection makes sense for our use case.
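When evaluating the existing partial option, it helps to invert its SKIP list and see what it still backs up. A small Python sketch (the function name `included_sources` is illustrative and not part of gitlab-backup.sh):

```python
# Everything GitLab can select or skip during backup creation (see the list above).
ALL_SOURCES = {
    "db", "uploads", "builds", "artifacts", "lfs",
    "terraform_state", "registry", "pages", "repositories", "packages",
}

# SKIP value used by the partial option of gitlab-backup.sh, as quoted above.
PARTIAL_SKIP = "uploads,builds,artifacts,lfs,registry,pages"


def included_sources(skip_value, all_sources=ALL_SOURCES):
    """Return the data sources a backup with SKIP=<skip_value> still contains."""
    skipped = {s for s in skip_value.split(",") if s}
    return sorted(all_sources - skipped)


if __name__ == "__main__":
    print(included_sources(PARTIAL_SKIP))
    # ['db', 'packages', 'repositories', 'terraform_state']
```

Inverted this way, the current partial option still includes both packages and repositories, so whether it is actually cheaper depends on how large those two are.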

Event Timeline

Restricted Application added a subscriber: Aklapper.
Jelto triaged this task as Medium priority. (Sep 2 2022, 1:35 PM)
Jelto updated the task description.

I benchmarked backing up each data source separately on gitlab1003:

data source                             duration     size
empty                                   0m15.656s    10 KB
db (database)                           0m24.962s    95 MB
uploads (attachments)                   0m16.710s    27 MB
builds (CI job output logs)             0m15.448s    10 KB (?)
artifacts (CI job artifacts)            0m15.473s    500 MB
lfs (LFS objects)                       0m45.351s    500 MB
terraform_state (Terraform states)      0m16.376s    20 MB
registry (Container Registry images)    0m15.931s    10 KB (nothing stored here)
pages (Pages content)                   0m15.458s    10 KB (nothing stored here)
repositories (Git repositories data)    2m8.296s     8 GB
packages (Packages)                     37m44.770s   60 GB
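To quantify how lopsided these numbers are, the table can be turned into shares of total time and size. A Python sketch, assuming 1 GB = 1024 MB and counting the "10 KB" rows as 0.01 MB; note that every row also includes the roughly 15.7 s fixed overhead seen in the empty run, so small sources are overstated:

```python
import re

# Per-data-source benchmark from gitlab1003: (duration as printed by `time`, size in MB).
# Assumptions: 1 GB = 1024 MB, and the "10 KB" entries are counted as 0.01 MB.
BENCHMARK = {
    "db": ("0m24.962s", 95),
    "uploads": ("0m16.710s", 27),
    "builds": ("0m15.448s", 0.01),
    "artifacts": ("0m15.473s", 500),
    "lfs": ("0m45.351s", 500),
    "terraform_state": ("0m16.376s", 20),
    "registry": ("0m15.931s", 0.01),
    "pages": ("0m15.458s", 0.01),
    "repositories": ("2m8.296s", 8 * 1024),
    "packages": ("37m44.770s", 60 * 1024),
}


def to_seconds(duration):
    """Convert a `time`-style duration such as '2m8.296s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def shares(benchmark):
    """Return {source: (share of summed time, share of summed size)}."""
    total_time = sum(to_seconds(d) for d, _ in benchmark.values())
    total_size = sum(size for _, size in benchmark.values())
    return {
        name: (to_seconds(d) / total_time, size / total_size)
        for name, (d, size) in benchmark.items()
    }


if __name__ == "__main__":
    ranked = sorted(shares(BENCHMARK).items(), key=lambda kv: kv[1][0], reverse=True)
    for name, (time_share, size_share) in ranked:
        print(f"{name:16} {time_share:6.1%} of time  {size_share:6.1%} of size")
```

With these numbers, packages accounts for roughly 88% of the summed time and 87% of the summed size.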

So most of the time and space goes into backing up "packages" (everything in the package registry).

I also ran a test excluding only packages:

data source                             duration     size
all except packages                     2m58.019s    8.5 GB

So a backup of everything except packages takes about three minutes and needs 8.5 GB. That is cheap enough to run more often than every 24 hours.
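A rough daily cost estimate makes the trade-off concrete. This Python sketch assumes a full backup costs about the partial run plus the packages run (this double-counts the fixed overhead, so it slightly overestimates); `daily_backup_minutes` is a hypothetical helper:

```python
# Durations measured on gitlab1003, in seconds.
PARTIAL_S = 2 * 60 + 58.019    # everything except packages
PACKAGES_S = 37 * 60 + 44.770  # packages alone

# Assumption: full backup ~= partial + packages. Both runs include the ~15 s
# fixed overhead, so this slightly overestimates a real full backup.
FULL_S = PARTIAL_S + PACKAGES_S


def daily_backup_minutes(partials_per_day, fulls_per_day=1):
    """Estimated total backup runtime per day, in minutes."""
    return (partials_per_day * PARTIAL_S + fulls_per_day * FULL_S) / 60


if __name__ == "__main__":
    for partials in (0, 1, 3):
        print(f"1 full + {partials} partial(s): ~{daily_backup_minutes(partials):.0f} min/day")
```

Under these assumptions, each additional partial backup adds only about three minutes per day on top of the roughly 40 minute full backup.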

Change 912791 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: enable and run partial backups daily

https://gerrit.wikimedia.org/r/912791

Change 912791 merged by Jelto:

[operations/puppet@production] gitlab: enable and run partial backups daily

https://gerrit.wikimedia.org/r/912791

Change 918427 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: run backup sync and restore twice daily

https://gerrit.wikimedia.org/r/918427

Change 918427 merged by Jelto:

[operations/puppet@production] gitlab: run backup sync and restore twice daily

https://gerrit.wikimedia.org/r/918427

With the changes above, production GitLab now takes two backups per day: one full backup and one partial backup excluding "packages".

Backups are now transferred to the replicas twice a day, so the replica lag should drop from 24h to 12h.

I'll look into how to better coordinate the three jobs (backup, sync, and restore) so that they can run at any frequency without problems.
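One way to keep the restore job from picking up a backup that is still being synced is a completion marker. A Python sketch under a hypothetical convention (the sync job creates an empty `<tarball>.complete` file after the copy finishes); it relies on GitLab backup file names ending in `_gitlab_backup.tar` and starting with a Unix timestamp:

```python
import tempfile
from pathlib import Path


def latest_complete_backup(backup_dir):
    """Return the newest fully synced backup tarball in backup_dir, or None.

    Hypothetical convention: the sync job creates an empty "<tarball>.complete"
    marker only after the copy has finished, so the restore job never picks up
    a half-transferred file.
    """
    candidates = [
        tar
        for tar in Path(backup_dir).glob("*_gitlab_backup.tar")
        if tar.with_name(tar.name + ".complete").exists()
    ]
    if not candidates:
        return None
    # GitLab backup names start with a Unix timestamp, e.g.
    # 1493107454_2018_04_25_10.6.4-ce_gitlab_backup.tar, so compare that prefix.
    return max(candidates, key=lambda p: int(p.name.split("_", 1)[0]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        done = Path(d, "1000_2023_06_01_16.0.0_gitlab_backup.tar")
        in_flight = Path(d, "2000_2023_06_01_16.0.0_gitlab_backup.tar")
        for path in (done, in_flight, done.with_name(done.name + ".complete")):
            path.touch()
        # The newer tarball has no marker yet, so the older one is chosen.
        print(latest_complete_backup(d).name)
```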

Change 927139 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: run four backups per day

https://gerrit.wikimedia.org/r/927139

Change 927139 merged by Jelto:

[operations/puppet@production] gitlab: run four backups per day

https://gerrit.wikimedia.org/r/927139

We are doing backups every 6h and restores every 12h now.

We noticed that backups and restores interfere with maintenance, so we need some kind of locking; this is tracked in T338332. Until that is solved, we will stay on the 6h/12h schedule or may go back to backups every 12h.
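A common form of such locking is an exclusive advisory file lock that every job takes before it starts. This Python sketch uses `fcntl.flock` (Unix only) and is not necessarily what T338332 implements; the lock path `/var/lock/gitlab-maintenance.lock` and the `JobLock` class are hypothetical:

```python
import fcntl
import os

# Hypothetical lock file shared by the backup, restore, and maintenance jobs.
LOCK_PATH = "/var/lock/gitlab-maintenance.lock"


class JobLock:
    """Exclusive, non-blocking advisory lock (flock) held for one job run."""

    def __init__(self, path=LOCK_PATH):
        self.path = path
        self.fd = None

    def __enter__(self):
        self.fd = os.open(self.path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            # LOCK_NB: fail immediately instead of queueing behind a running job.
            fcntl.flock(self.fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(self.fd)
            self.fd = None
            raise RuntimeError(f"another job holds {self.path}; skipping this run")
        return self

    def __exit__(self, exc_type, exc, tb):
        fcntl.flock(self.fd, fcntl.LOCK_UN)
        os.close(self.fd)
        self.fd = None
        return False  # never swallow exceptions from the job itself


# Usage (hypothetical): wrap each job so that overlapping runs skip instead of colliding.
# with JobLock():
#     run_backup()
```

Failing fast rather than waiting keeps a slow full backup from causing a queue of delayed restores; the skipped run simply happens at the next scheduled slot.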

The 6h/12h schedule for backups and restores is the most reasonable approach for now. Increasing the frequency further would hit limits with the storage of packages and artifacts. Object storage (T336234) could significantly reduce the time each backup and restore takes. So for now we will keep the following schedule:

  • full backup every 24 hours
  • partial backup (excluding packages) every 6 hours
  • restore every 12 hours
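How stale a replica can get under this schedule can be reasoned about with a small model. This Python sketch assumes instantaneous jobs, a restore that always uses the newest backup taken at or before it, and hypothetical hour offsets (backups at 00/06/12/18, restores at 02/14):

```python
def worst_case_staleness(backup_hours, restore_hours, period=24):
    """Return the maximum age (hours) of replica data over one schedule period.

    Simplified model (assumption): jobs are instantaneous, each restore picks
    the newest backup taken at or before it, and the schedule repeats every
    `period` hours.
    """
    backups = sorted(backup_hours)
    restores = sorted(restore_hours)
    worst = 0
    for i, r in enumerate(restores):
        # Newest backup at or before this restore, wrapping to the previous period.
        taken = max((b for b in backups if b <= r), default=backups[-1] - period)
        # The replica serves this data until the next restore replaces it.
        if i + 1 < len(restores):
            next_restore = restores[i + 1]
        else:
            next_restore = restores[0] + period
        worst = max(worst, next_restore - taken)
    return worst


if __name__ == "__main__":
    print(worst_case_staleness([0, 6, 12, 18], [2, 14]))  # current 6h/12h schedule
    print(worst_case_staleness([0], [2]))                 # old once-a-day schedule
```

In this model the restore interval dominates: the 6h/12h schedule gives a worst case of 14 hours (12h between restores plus the 2h offset), versus 26 hours for a single daily backup and restore.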

I'll close the task.