
Backups for GitLab
Open, MediumPublic

Description

Where can centralized backups for GitLab repositories be stored and what network access is required?

Details

Project | Branch | Lines +/-
operations/puppet | production | +1 -1
operations/puppet | production | +49 -28
operations/puppet | production | +177 -0
operations/puppet | production | +1 -1
operations/puppet | production | +1 -3
operations/gitlab-ansible | master | +4 -7
operations/puppet | production | +98 -28
operations/gitlab-ansible | master | +3 -0
operations/gitlab-ansible | master | +1 -1
operations/gitlab-ansible | master | +2 -2
operations/gitlab-ansible | master | +2 -2
operations/puppet | production | +8 -0
operations/puppet | production | +3 -0
operations/puppet | production | +16 -0
operations/puppet | production | +15 -1
operations/gitlab-ansible | master | +18 -2
operations/puppet | production | +5 -0
operations/puppet | production | +23 -0
operations/gitlab-ansible | master | +3 -1
operations/puppet | production | +1 -0
operations/puppet | production | +14 -3

Event Timeline


I think that is something that Jelto (our new SRE - starts June 7) can handle, i.e. add a second disk of the right size.

In the meantime, can we use the existing disk and keep only 2 backups at a time so we don't fill the root disk? My understanding is that Bacula runs daily and copies new backups. Is that correct?

Yes, this is reasonable. These two variables need to be updated accordingly (and Ansible redeployed):

gitlab_backup_keep_time: "604800"
gitlab_backup_config_keep_num: "6"
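
For reference, gitlab_backup_keep_time is expressed in seconds; a quick sanity check of the value above (a throwaway shell snippet, not part of the Ansible config):

echo $(( 604800 / 86400 ))   # -> 7, i.e. the current setting keeps 7 days of local backups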

My understanding is that Bacula runs daily and copies new backups.

Bacula has not been set up at all. You should have a "latest" directory where you move things instantly when a backup is done (or generate a tarball, as that won't have consistency issues when being copied), or let a Bacula pre-hook run the dumps before sending them for storage (the first option is preferred).
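
A minimal sketch of that "latest" directory approach (paths and file naming follow the listings later in this task; the actual implementation landed later in the gitlab-ansible repo, so treat this as illustrative only):

#!/bin/bash
# Illustrative only: copy the newest GitLab backup tarball into a stable
# "latest" directory so Bacula always picks up one consistent file.
set -eu

backup_dir=/srv/gitlab-backup
latest_dir="${backup_dir}/latest"
mkdir -p "${latest_dir}"

# Newest application backup tarball written by the GitLab backup job.
newest=$(ls -t "${backup_dir}"/*_gitlab_backup.tar 2>/dev/null | head -n 1)
if [ -n "${newest}" ]; then
    cp -a "${newest}" "${latest_dir}/latest.tar"
fi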

Change 697844 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gitlab: add Bacula backup::host class to role

https://gerrit.wikimedia.org/r/697844

There are 2 things needed to make Bacula backups work on a given host.

The first is to add the "backup::host" Puppet class to the host (not directly on a hostname; we do this via the role class). This adds a Bacula agent which can talk to the Bacula server. I uploaded a change above doing that. This is a needed requirement either way and can be merged anytime.

The second is to add a "backup::set" defining what to actually back up. That is a collection of file paths. It can be an existing set that Bacula already knows about (standard paths already used for backups on other hosts), in which case we simply use it. Or it can be a new set of files/paths created specifically for the gitlab role. That is not a problem either; it just means one more small change in Puppet on the Bacula server.

As an example, the role for APT repo servers uses backup::set { 'srv-wikimedia': }, and on the Bacula server there is a definition which says this simply means /srv/wikimedia (it could also be multiple directories):

bacula::director::fileset { 'srv-wikimedia':
     includes => [ '/srv/wikimedia' ]
 }

So yes, we need to agree on the filesystem path to use; then we can merge a second change and Bacula backups should start working.

gitlab is configured to store backups in /srv/gitlab-backup

grep backup_path /etc/gitlab/gitlab.rb
 
# The directory where Gitlab backups will be stored
gitlab_rails['backup_path'] = "/srv/gitlab-backup"

Change 697844 merged by Dzahn:

[operations/puppet@production] gitlab: add Bacula backup::host class to role

https://gerrit.wikimedia.org/r/697844

We now have the Bacula client (fd for file daemon) running on gitlab1001.

[gitlab1001:~] $ ps aux | grep bacula
root     11735  0.0  0.0  26140  7684 ?        Ssl  19:00   0:00 /usr/sbin/bacula-fd -fP -c /etc/bacula/bacula-fd.conf

It also added the needed certificates for communicating with the Bacula server and opened a firewall hole:

Bacula::Client/File[/etc/bacula/bacula-fd.conf]

[gitlab1001:~] $ sudo iptables -L | grep bacula
ACCEPT     tcp  --  backup1001.eqiad.wmnet  anywhere             tcp dpt:bacula-fd
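
As a quick sanity check one can also confirm the file daemon is listening on its standard port (9102, as shown in the job definition further down); this is generic tooling, not output from the task:

sudo ss -tlnp | grep bacula-fd    # bacula-fd should be listening on 9102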

Change 697850 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] bacula/gitlab: add a backup::set for gitlab and use it

https://gerrit.wikimedia.org/r/697850

All interested, please take a look at the latest Gerrit link above and feel free to review/comment there.

Or just let me know here. That patch will add /srv/gitlab-backup/ to Bacula backups. I can confirm there is a /srv/gitlab-backup/db/database.sql.gz, but for now it has size 0.

Is that the only path needed? And please let us know once it has actual data in it.

On a different note, I'd like to remind you about a separate backup set for the GitLab configuration backup, which we discussed earlier (if memory doesn't fail me): /etc/gitlab/config_backup

$ sudo ls -al /etc/gitlab/config_backup
total 72
drwx------ 2 root root  4096 Jun  3 15:26 .
drwxrwxr-x 4 root root  4096 Jun  3 17:53 ..
-rw------- 1 root root 30720 May 27 00:00 gitlab_config_1622073603_2021_05_27.tar
-rw------- 1 root root 30720 Jun  3 15:26 gitlab_config_1622733992_2021_06_03.tar

Alright, thanks! That's not even on a different note but exactly the kind of feedback needed. So we'll have to add /etc/gitlab/config_backup to our new backup::set. It wouldn't be separate though. In which way are they supposed to be separate?

I added /etc/gitlab/config_backup to the suggested backup::set, so it would consist of /srv/gitlab-backup and /etc/gitlab/config_backup as part of a single set called "gitlab".

It may contain sensitive data, particularly /etc/gitlab/gitlab-secrets.json. If you guys believe a single backup set is safe enough for both general backup and configuration/secrets, then it's fine with us.

I think all backups are treated as if they contain secrets; it's not like they are made public. I am not aware of two levels of security when it comes to that.

But now I am wondering: is the general backup not going to contain anything private? Could it theoretically be made public for anyone to download?

But now I am wondering: is the general backup not going to contain anything private? Could it theoretically be made public for anyone to download?

Between database dumps and the potential for private repos, I don't think so.

Alright, so if we can't make the dumps public, then I don't see a difference between the two things; both contain secrets. Right?

But it would also be nice to have public dumps for https://dumps.wikimedia.org/ without the private repos in them.

Are private repos something that is planned for the MVP stage though?

Are private repos something that is planned for the MVP stage though?

I don't personally have anything specific in mind right now, but:

  1. You can create private repos as an individual user, by default. (Thinking about it, I guess we probably only want that - if we want it - for trusted contributors.)
  2. Security are on the list of interested early adopters; they might have a private repo workflow in mind.

  1. You can create private repos as an individual user, by default. (Thinking about it, I guess we probably only want that - if we want it - for trusted contributors.)

Ah! Yea, I agree, we should probably first talk more on a list or wiki about whether we want that for all users by default and, separately, keep it disabled until we have implemented trusted contributors.

@Dzahn and I checked the backups on gitlab1001.

Backups are enabled again and there is one new backup for today. Currently the backup for one day has a size of 813MB.

ls -l --block-size=MB /srv/gitlab-backup/
total 813MB
-rw------- 1 git git   1MB Jun  3 15:26 1622733991_2021_06_03_13.11.3_gitlab_backup.tar
-rw------- 1 git git 813MB Jun  9 00:05 1623197124_2021_06_09_13.11.5_gitlab_backup.tar

The backup for the GitLab config has been created as well; the file size is around 1MB:

ls -l --block-size=MB /etc/gitlab/config_backup
total 1MB
-rw------- 1 root root 1MB Jun  9 00:00 gitlab_config_1623196802_2021_06_09.tar

The retention/keep time for backups is configured as gitlab_rails['backup_keep_time'] = 604800, which is equal to 7 days.
So in total I would expect additional disk usage on GitLab and Bacula of around 5.5GB (increasing with broader adoption later).

@jcrespo is the estimated additional file system usage of around 5.5GB ok for the current Bacula setup? I will check in 8 days whether GitLab handles the cleanup and rotation of backups, to make sure we have no unwanted disk fill-ups.
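
For context, the rough arithmetic behind that ~5.5GB estimate, using the sizes from the listings above and assuming they stay roughly stable for now:

# ~813MB application backup + ~1MB config backup per day, kept for 7 days:
echo "$(( 7 * (813 + 1) )) MB"   # -> 5698 MB, i.e. roughly 5.5-5.7GB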

Furthermore, I know that there were some discussions regarding the retention/keep time for GitLab backups. Currently it's set to 7 days. The file system usage could be reduced by setting the retention period to 3 days. In my opinion 3 days is quite short, for example when an incident is only discovered after a weekend. So if the estimated file system usage is feasible for Bacula, I would suggest staying with the 7-day retention period and rolling out change 697850.

A bit of context: the plan was to hold 7 days' worth of backups locally, but in order to test local retention and its interaction with Bacula, this number was to be reduced to 3 days for the duration of the early deployment stage. @wkandek correct me if I'm wrong, please.

3 days duration confirmed as sufficient.

Change 699464 had a related patch set uploaded (by Brennen Bearnes; author: Brennen Bearnes):

[operations/gitlab-ansible@master] gitlab_backup_keep_time to 3 days

https://gerrit.wikimedia.org/r/699464

Change 699464 merged by Brennen Bearnes:

[operations/gitlab-ansible@master] gitlab_backup_keep_time to 3 days

https://gerrit.wikimedia.org/r/699464

Let's call it Done once we have confirmed on the Bacula side that there is a full backup and it's restorable.

[backup1001:~] $ sudo bconsole
Connecting to Director backup1001.eqiad.wmnet:9101
1000 OK: 103 backup1001.eqiad.wmnet Version: 9.4.2 (04 February 2019)
Enter a period to cancel a command.
*restore
..
     5: Select the most recent backup for a client
..
    96: gitlab1001.wikimedia.org-fd
..
Select the Client (1-228): 96
The defined FileSet resources are:
Selection list for "FileSet" is empty!
No FileSet found for client "gitlab1001.wikimedia.org-fd".

^ not really available yet

cc: @Jelto @brennen @wkandek

@Dzahn I think https://gerrit.wikimedia.org/r/c/operations/puppet/+/697850 is not merged and deployed, so the fileset for GitLab doesn't exist.

I just added a +2 to 697850.

However, there is an already existing backup configuration in modules/gitlab/manifests/backup.pp. The class contains some open TODOs. From my understanding, all the backup logic for GitLab is configured in Ansible, so we should keep in mind to clean that up at some point when consolidating the Ansible and Puppet code.

Change 700084 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/gitlab-ansible@master] copy latest backup to dedicated folder

https://gerrit.wikimedia.org/r/700084

However, there is an already existing backup configuration in modules/gitlab/manifests/backup.pp. The class contains some open TODOs.

I can explain this: I have created a gitlab module which is not used in production but is running in a cloud project (ping me if you want access). The module is a first pass at converting the Ansible code into Puppet code; I think it's about 80-90% complete. I'm not sure what the future plans are here regarding whether we are integrating GitLab further into Puppet or moving it out of Puppet and into a k8s cluster, so for now I have mostly stalled on the production aspects of the code (for my needs I just need a basic instance up to test APIs). However, I'm happy to revive things or go through the code if it's useful.

Change 700183 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] bacula: Add new jobdefaults/schedule for Github, full backups every day

https://gerrit.wikimedia.org/r/700183

Change 700183 merged by Jcrespo:

[operations/puppet@production] bacula: Add new jobdefaults/schedule for Gitlab, full backups every day

https://gerrit.wikimedia.org/r/700183

Change 700351 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] bacula: Fix schedule and monitoring as a followup to 67ee5c0

https://gerrit.wikimedia.org/r/700351

Change 700351 merged by Jcrespo:

[operations/puppet@production] bacula: Fix schedule and monitoring as a followup to 67ee5c0

https://gerrit.wikimedia.org/r/700351

Change 700595 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gitlab::backup: create backup paths with wmflib::dir::mkdir_p

https://gerrit.wikimedia.org/r/700595

Change 700601 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gitlab: ensure backup dirs exist, add parameter for config backup

https://gerrit.wikimedia.org/r/700601

Change 700622 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] profile::gitlab: ensure backup dirs exist in production

https://gerrit.wikimedia.org/r/700622

Change 700084 merged by Brennen Bearnes:

[operations/gitlab-ansible@master] copy latest backup to dedicated folder

https://gerrit.wikimedia.org/r/700084

Mentioned in SAL (#wikimedia-operations) [2021-06-21T18:26:29Z] <brennen> gitlab1001: running ansible for copying latest backup to dedicated folder (T274463)

Change 700851 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/gitlab-ansible@master] fix GitLab backup cronjob

https://gerrit.wikimedia.org/r/700851

I can explain this: I have created a gitlab module which is not used in production but is running in a cloud project (ping me if you want access). The module is a first pass at converting the Ansible code into Puppet code; I think it's about 80-90% complete. I'm not sure what the future plans are here regarding whether we are integrating GitLab further into Puppet or moving it out of Puppet and into a k8s cluster, so for now I have mostly stalled on the production aspects of the code (for my needs I just need a basic instance up to test APIs). However, I'm happy to revive things or go through the code if it's useful.

@jbond thanks for the clarification! So currently the source of truth is Ansible.
However for migrating to puppet (https://phabricator.wikimedia.org/T283076) the code will be helpful. So I'm sure we will come back to your work there :)

Regarding the overall backup of GitLab, we decided to do full daily backups with Bacula. Some minor changes in the Ansible code and in Bacula were needed and are in review. This solution better reflects the actual application backup strategy and optimizes disk usage on Bacula. I documented the decision in the Decision_Log.

When the backup configuration is in place in GitLab and Bacula, we can do a restore test.

Change 700622 merged by Dzahn:

[operations/puppet@production] profile::gitlab: ensure backup dirs exist in production

https://gerrit.wikimedia.org/r/700622

Change 700595 merged by Dzahn:

[operations/puppet@production] gitlab::backup: create backup paths with wmflib::dir::mkdir_p

https://gerrit.wikimedia.org/r/700595

Change 697850 merged by Dzahn:

[operations/puppet@production] bacula/gitlab: add a backup::set for gitlab and use it

https://gerrit.wikimedia.org/r/697850

On backup1001, we can now see our fileset in bconsole:

(sudo bconsole -> restore -> 5 -> 96)

Select the Client (1-228): 96
Automatically selected FileSet: gitlab
No Full backup before 2021-06-22 14:05:58 found.
[backup1001:~] $ echo "show job" | sudo bconsole | grep gitlab
Job: name=gitlab1001.wikimedia.org-Daily-production-gitlab JobType=66 level=Incremental Priority=10 Enabled=1
  --> Client: Name=gitlab1001.wikimedia.org-fd Enabled=1 Address=gitlab1001.wikimedia.org FDport=9102 MaxJobs=1 NumJobs=0
  --> FileSet: name=gitlab IgnoreFileSetChanges=0
      I /srv/gitlab-backup/latest
      I /etc/gitlab/config_backup/latest
  --> WriteBootstrap=/var/lib/bacula/gitlab1001.wikimedia.org-Daily-production-gitlab.bsr

All of the above is correct, but this is easier to type, I think:

# check_bacula.py gitlab1001.wikimedia.org-Daily-production-gitlab
2021-06-22 14:06:14: type: F, status: T, bytes: 452912

(F for "full backup level" and T for "successfuly terminated", it is strange but it is bacula terminology)

I can explain this: I have created a gitlab module which is not used in production but is running in a cloud project (ping me if you want access). The module is a first pass at converting the Ansible code into Puppet code; I think it's about 80-90% complete. I'm not sure what the future plans are here regarding whether we are integrating GitLab further into Puppet or moving it out of Puppet and into a k8s cluster, so for now I have mostly stalled on the production aspects of the code (for my needs I just need a basic instance up to test APIs). However, I'm happy to revive things or go through the code if it's useful.

@jbond thanks for the clarification! So currently the source of truth is Ansible.

Yes, mostly. Puppet still runs on the gitlab server using profile::gitlab; however, it shouldn't be managing things that are already managed by Ansible.

However for migrating to puppet (https://phabricator.wikimedia.org/T283076) the code will be helpful. So I'm sure we will come back to your work there :)

Yes, sure, ping me whenever. Currently the Ansible playbook does three high-level things and I have tried to map these to Puppet modules.

This should hopefully make it smoother to migrate away from Ansible, as we can do it piecemeal.

Regarding the overall backup of GitLab, we decided to do full daily backups with Bacula. Some minor changes in the Ansible code and in Bacula were needed and are in review. This solution better reflects the actual application backup strategy and optimizes disk usage on Bacula. I documented the decision in the Decision_Log.

When the backup configuration is in place in GitLab and Bacula, we can do a restore test.

Sounds good

When the backup configuration is in place in GitLab and Bacula, we can do a restore test.

BTW this is ready, at least the Bacula -> client part; the rest would be the import, and that should be tested too :-). If I am not around, Dzahn or anyone else with root can also help with that.

I was able to test restore for the config backup:

root@gitlab1001:/var/tmp/bacula-restores/etc/gitlab/config_backup/latest# file latest.tar 
latest.tar: POSIX tar archive (GNU)
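
A possible follow-up check on that restored archive (plain tar commands, not output from the task; the extract step would only be run during an actual restore, and the destination depends on how paths are stored inside the archive):

# Inspect the restored config tarball before touching /etc/gitlab:
tar -tvf /var/tmp/bacula-restores/etc/gitlab/config_backup/latest/latest.tar | head
# During a real restore it would then be unpacked, for example:
# sudo tar -xvf /var/tmp/bacula-restores/etc/gitlab/config_backup/latest/latest.tar -C /etc/gitlab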

Change 700851 merged by Brennen Bearnes:

[operations/gitlab-ansible@master] fix GitLab backup cronjob

https://gerrit.wikimedia.org/r/700851

Change 701068 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/gitlab-ansible@master] fix cleanup of config backups

https://gerrit.wikimedia.org/r/701068

Change 701068 merged by Brennen Bearnes:

[operations/gitlab-ansible@master] fix cleanup of config backups, make script more robust

https://gerrit.wikimedia.org/r/701068

Following up on a previous discussion, noting some concerns that the current implementation (copying the latest backup over to a separate latest file) may cause. It's not a major problem, but definitely something that can be optimized.

  • there's information loss, the backup timestamp in particular
  • added resource waste from duplicating backup files on disk, and extra moving parts that can break
  • backup rotation depends on the archives' age, not on the number of previous backups; if the external backup doesn't happen for a few days for any reason, older archives will be deleted

The second issue can be alleviated by creating a hard link to the latest backup file, or even a symlink, though the latter needs to be verified with Bacula for support of dereferencing symlinks at backup time.
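
A minimal sketch of the hard-link variant (same paths as used elsewhere in this task; this only works if the latest directory lives on the same filesystem as the backups):

# Link instead of copy, so the multi-GB tarball is not duplicated on disk.
newest=$(ls -t /srv/gitlab-backup/*_gitlab_backup.tar | head -n 1)
ln -f "${newest}" /srv/gitlab-backup/latest/latest.tar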

Change 710529 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/gitlab-ansible@master] remove backup warning for config backups

https://gerrit.wikimedia.org/r/710529

Change 710529 merged by Brennen Bearnes:

[operations/gitlab-ansible@master] remove backup warning for config backups

https://gerrit.wikimedia.org/r/710529

Change 710676 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/gitlab-ansible@master] fix shell for backup cronjob

https://gerrit.wikimedia.org/r/710676

Change 710676 merged by Brennen Bearnes:

[operations/gitlab-ansible@master] fix shell for backup cronjob

https://gerrit.wikimedia.org/r/710676

Change 712322 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab::backup move backup cronjobs to puppet

https://gerrit.wikimedia.org/r/712322

Excuse my jumping in on this ticket, but I have a query.
Have you considered using the built-in mechanism for uploading the backup tar file to object storage, immediately after it has been created?

https://docs.gitlab.com/ce/raketasks/backup_restore.html#uploading-backups-to-a-remote-cloud-storage

It seems to me that it might be suitable to use a private swift container for this, then set gitlab_rails['backup_upload_connection'] and gitlab_rails['backup_upload_remote_directory'] to use that swift container.

Then after every backup it would push the tar to swift immediately and keep a smaller number of backups on the local disk.
I don't know what retention policy we can set on the swift container to stop the amount of data increasing.
It's just a thought. Maybe you've already discounted it for a good reason.

Have you considered using the built-in mechanism for uploading the backup tar file to object storage, immediately after it has been created?

Maybe you've already discounted it for a good reason.

Services should adapt to the organization's existing workflows, not the other way around 0:-) - otherwise we will end up with 20 different workflows for backup and recovery, and no one maintaining them. Swift by itself is an unsuitable recovery solution, as it is focused on performant object storage but lacks basic backup and recovery workflow features such as strong client-side encryption, retention management, geographic redundancy, reliability, backup-specific monitoring, a low storage footprint, emergency recovery tools and automatic recovery procedures - all of which would have to be implemented and handled as a special case on top of Swift. Also, the current WMF Swift installation is designed entirely around MediaWiki's needs, and lacks proper workflows for non-MW storage, physical isolation and other musts (T279621).

Bacula, while not perfect - and probably not the best fit for this specific kind of backup - is already used successfully for all current production backup workflows, including an automated and documented recovery workflow. The only exceptions are databases (which also end up in Bacula) and Swift media backups (in MinIO, so we use a separate technology on purpose), which have a separate workflow mostly because of their volume. That keeps maintenance and recovery simple and uniform, rather than having multiple workflows that are barely maintained.

Please read about the context of the organization to learn how our backups work and what the Data Persistence team's role is:

While adding additional recovery workflows when needed is not out of the question, it shouldn't be the first option ever. I would be happy to answer additional questions if you have them :-).

Thanks @jcrespo - That's a really considered and informative reply.

brennen edited projects, added GitLab; removed GitLab (Initialization).

Change 719041 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/gitlab-ansible@master] remove backup crontab managed by Ansible

https://gerrit.wikimedia.org/r/719041

Change 712322 merged by Jelto:

[operations/puppet@production] gitlab::backup move backup cronjobs to puppet

https://gerrit.wikimedia.org/r/712322

Change 719041 merged by Jelto:

[operations/gitlab-ansible@master] remove backup crontab managed by Ansible

https://gerrit.wikimedia.org/r/719041

Change 719930 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab::backup remove deprication warning and deletion of config backup

https://gerrit.wikimedia.org/r/719930

Change 719930 merged by Jelto:

[operations/puppet@production] gitlab::backup remove deprication warning and deletion of config backup

https://gerrit.wikimedia.org/r/719930

Change 720316 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab::backup make config backup less verbose

https://gerrit.wikimedia.org/r/720316

Change 720316 abandoned by Jelto:

[operations/puppet@production] gitlab::backup make config backup less verbose

Reason:

change not needed

https://gerrit.wikimedia.org/r/720316

We used the installation of the restore script as a practical example in a Gerrit/code review session.

We went through Puppet compilation, confirmed it is only installed in codfw and not in eqiad, and covered how to amend patch sets on existing Gerrit changes and get them merged.

So the restore script in its current form is now installed on the codfw host. But of course nothing runs it automatically yet; it still needs to be run manually and tested.

Mentioned in SAL (#wikimedia-operations) [2021-10-28T20:37:53Z] <mutante> ensured gitlab restore timer is running only on passive server and re-enabled it - https://gerrit.wikimedia.org/r/c/operations/puppet/+/735437 T274463

We have spread the updates related to gitlab-backup-restore / sync across multiple tickets; also see T283076#7466960.

So now we have a systemd unit/timer on the passive gitlab host that restores the data synced over from the active host, and it does that continuously to keep the two hosts in sync. Congrats to @Arnoldokoth for getting the restore-from-backup script to work from within systemd and puppetized.
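
For anyone checking on this later, a couple of standard systemd commands to inspect the timer on the passive host (the exact unit name isn't stated here, so the grep pattern and placeholder are guesses):

systemctl list-timers | grep -i gitlab        # find the restore timer and its next run
journalctl -u <restore-unit> --since today    # inspect the last restore run (replace <restore-unit>)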

Thanks for the implementation of the restore script and the timer!

When updating the documentation I had some additional thoughts about the restore script. Currently the restore script is only for the replica. It would be helpful to be able to use the restore script on the production machine as well. So in case of an urgent restore in production we can use the script and don't have to do all of the steps manually.

For that it would be nice to have some additional parameters. To restore the replica I could imagine something like:

/srv/gitlab-backup/gitlab-restore.sh --replica latest

--replica makes sure to keep the config file (so it doesn't get replaced by the production config file). It should be quite easy to implement. latest is the name of the backup we want to restore.

For production the restore could be something like:

/srv/gitlab-backup/gitlab-restore.sh 1635379585_2021_10_28_14.3.2

1635379585_2021_10_28_14.3.2 is the name of the backup we want to restore. Of course we could also use latest here, but I'm just thinking about the use case where we may want to restore to an earlier date.

Having the name of the backup as a parameter needs some additional logic and refactoring, especially when using the latest backup. So I'm not sure if we want to add this much complexity to the script.

@Arnoldokoth and @Dzahn what do you think about this feature?

@Jelto This makes sense to me. We talked about it a bit during the 1:1 we just had. I would say let's split it into multiple parts. As the very first step, we change the parameter to _install_ the script (not run it) on the production machine. I mentioned this to Arnold as the lowest-hanging fruit to change next. Then we add a parameter to choose whether it keeps or replaces the config file. Not sure I would call it --replica; that is not that clear to me. More like "replace_config true/false"?

After this, we can finally add the option to restore something older than latest and take a timestamp. Remind me, how many versions do we keep before we rotate?

Makes sense to split this into two different changes. I think the name replace_config instead of --replica makes more sense.

Remind me how many versions we keep before we rotate?

Currently we keep backups for 72h on GitLab.

Change 737064 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] gitlab: accept backup file argument

https://gerrit.wikimedia.org/r/737064

For production the restore could be something like:

/srv/gitlab-backup/gitlab-restore.sh 1635379585_2021_10_28_14.3.2

1635379585_2021_10_28_14.3.2 is the name of the backup we want to restore. Of course we could also use latest here, but I'm just thinking about the use case where we may want to restore to an earlier date.

Having the name of the backup as a parameter needs some additional logic and refactoring, especially when using the latest backup. So I'm not sure if we want to add this much complexity to the script.

@Jelto https://gerrit.wikimedia.org/r/c/operations/puppet/+/737064

Currently working on this. The script accepts the backup file using the -f flag, like so: ./gitlab-restore-v2.sh -f 1635379585_2021_10_28_14.3.2. The commit doesn't yet include all the changes; I'm still working on checking that the file passed actually exists on the server, on installing the script, and on the second flag.
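
A rough sketch of the argument handling described above (the script name, -f flag and backup naming follow the comment; the existence check and everything else are assumptions about work in progress, not the actual script):

#!/bin/bash
# Hypothetical outline of gitlab-restore-v2.sh argument handling.
set -eu

backup_dir=/srv/gitlab-backup
backup_file=""

while getopts "f:" opt; do
    case "${opt}" in
        f) backup_file="${OPTARG}" ;;   # e.g. 1635379585_2021_10_28_14.3.2
        *) echo "usage: $0 -f <backup-name>" >&2; exit 1 ;;
    esac
done

# Refuse to continue if the requested backup is not present locally.
if [ -z "${backup_file}" ] || [ ! -e "${backup_dir}/${backup_file}_gitlab_backup.tar" ]; then
    echo "backup '${backup_file}' not found in ${backup_dir}" >&2
    exit 1
fi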

Change 737064 merged by AOkoth:

[operations/puppet@production] gitlab: accept backup file argument

https://gerrit.wikimedia.org/r/737064

Change 741675 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] gitlab: restore script keep_config options

https://gerrit.wikimedia.org/r/741675

Change 741675 merged by AOkoth:

[operations/puppet@production] gitlab: restore script keep_config options

https://gerrit.wikimedia.org/r/741675

Change 753680 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab::restore: restore after rsync of backup

https://gerrit.wikimedia.org/r/753680