GitLab replica in codfw
Open, Needs Triage (Public)

Description

A second replica of GitLab in codfw would help validate the setup of the production GitLab host, gitlab1001 in eqiad. This replica should be passive, and we want to use it to test backup and restore of gitlab1001 backups.

In the future this instance can also be used for failover and automated restore scenarios.

Event Timeline

https://gerrit.wikimedia.org/r/c/operations/puppet/+/702126 prevents backups from being created twice. Bacula will only back up the active server, not the replica.

Change 707350 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/gitlab-ansible@master] add gitlab2001 to host_vars and variables

https://gerrit.wikimedia.org/r/707350

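I modified the install script and ran it against gitlab2001 with Ansible's --check (dry-run) flag. It looks good. The only two errors (creating the sshd moduli file and enabling the ssh-gitlab service) are artifacts of check mode: the directory and service they depend on are only simulated in a dry run, not actually created.

I would like to roll out the Ansible playbook on gitlab2001 without the --check flag to set up the machine completely.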

Complete change: https://gerrit.wikimedia.org/r/c/operations/gitlab-ansible/+/707350

Output:

./install-gitlab-server.sh 

PLAY [Install Gitlab Server] ***************************************************************************************************************************************************************************

TASK [Gathering Facts] *********************************************************************************************************************************************************************************
ok: [gitlab-server-replica]

TASK [gitlab_server : Include OS-specific variables] ***************************************************************************************************************************************************
ok: [gitlab-server-replica]

TASK [gitlab_server : Check if GitLab configuration file already exists] *******************************************************************************************************************************
ok: [gitlab-server-replica]

TASK [gitlab_server : Check if GitLab is already installed] ********************************************************************************************************************************************
ok: [gitlab-server-replica]

TASK [gitlab_server : Gather package facts] ************************************************************************************************************************************************************
ok: [gitlab-server-replica]

TASK [gitlab_server : Install GitLab dependencies] *****************************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Install GitLab dependencies (Debian)] ********************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Check GitLab dependencies] *******************************************************************************************************************************************************
skipping: [gitlab-server-replica] => (item=openssh-server) 
skipping: [gitlab-server-replica] => (item=curl) 
skipping: [gitlab-server-replica] => (item=openssl) 
skipping: [gitlab-server-replica] => (item=tzdata) 

TASK [gitlab_server : Check GitLab dependencies (Debian)] **********************************************************************************************************************************************
skipping: [gitlab-server-replica] => (item=gnupg) 

TASK [gitlab_server : Download GitLab repository installation script] **********************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Install GitLab repository] *******************************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Define the Gitlab package name] **************************************************************************************************************************************************
ok: [gitlab-server-replica]

TASK [gitlab_server : Generate random password] ********************************************************************************************************************************************************
ok: [gitlab-server-replica]

TASK [gitlab_server : Install GitLab] ******************************************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Reconfigure GitLab (first run)] **************************************************************************************************************************************************
changed: [gitlab-server-replica]

TASK [gitlab_server : Create GitLab SSL configuration folder] ******************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Create self-signed certificate] **************************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Copy GitLab configuration file] **************************************************************************************************************************************************
changed: [gitlab-server-replica]

TASK [gitlab_server : Create GitLab crontab] ***********************************************************************************************************************************************************
changed: [gitlab-server-replica]

TASK [gitlab_server : Setup GitLab user] ***************************************************************************************************************************************************************
changed: [gitlab-server-replica]

TASK [gitlab_server : Install GitLab sshd - create config directory] ***********************************************************************************************************************************
changed: [gitlab-server-replica]

TASK [gitlab_server : Install GitLab sshd - create config file] ****************************************************************************************************************************************
changed: [gitlab-server-replica]

TASK [gitlab_server : Install GitLab sshd - create moduli file] ****************************************************************************************************************************************
fatal: [gitlab-server-replica]: FAILED! => {"changed": false, "msg": "Destination directory /etc/ssh-gitlab does not exist"}
...ignoring

TASK [gitlab_server : Install GitLab sshd - create host keys] ******************************************************************************************************************************************
changed: [gitlab-server-replica] => (item=/etc/ssh-gitlab/ssh_host_rsa_key)
changed: [gitlab-server-replica] => (item=/etc/ssh-gitlab/ssh_host_ecdsa_key)
changed: [gitlab-server-replica] => (item=/etc/ssh-gitlab/ssh_host_ed25519_key)

TASK [gitlab_server : Install GitLab sshd - create service] ********************************************************************************************************************************************
changed: [gitlab-server-replica]

TASK [gitlab_server : Install GitLab sshd - enable service] ********************************************************************************************************************************************
fatal: [gitlab-server-replica]: FAILED! => {"changed": false, "msg": "Could not find the requested service ssh-gitlab: host"}
...ignoring

TASK [gitlab_server : Remove GitLab sshd - stop service] ***********************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Remove GitLab sshd - remove config file] *****************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Remove GitLab sshd - remove config directory] ************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Remove GitLab sshd - remove service] *********************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Remove GitLab sshd - reload systemd] *********************************************************************************************************************************************
skipping: [gitlab-server-replica]

TASK [gitlab_server : Create bacula backup directory for GitLab data backup] ***************************************************************************************************************************
changed: [gitlab-server-replica]

TASK [gitlab_server : Create bacula backup directory for GitLab config backup] *************************************************************************************************************************
changed: [gitlab-server-replica]

RUNNING HANDLER [gitlab_server : restart gitlab] *******************************************************************************************************************************************************
skipping: [gitlab-server-replica]

PLAY RECAP *********************************************************************************************************************************************************************************************
gitlab-server-replica      : ok=19   changed=10   unreachable=0    failed=0    skipped=15   rescued=0    ignored=2
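A dry run like this is easier to judge when the recap counters are checked explicitly rather than by eye. The following is a minimal sketch, not part of the original run: `recap_field` is a hypothetical helper that pulls a single counter out of a PLAY RECAP line like the one above, so a wrapper can flag ignored failures before the real rollout.

```shell
#!/bin/sh
# Hypothetical helper: extract one counter (e.g. failed, ignored) from an
# Ansible PLAY RECAP line.
recap_field() {
    # $1 = recap line, $2 = counter name
    # Split the line into one token per line, then keep the value of "$2=".
    printf '%s\n' "$1" | tr -s ' ' '\n' | sed -n "s/^$2=//p"
}

# The recap line from the dry run above.
recap_line='gitlab-server-replica      : ok=19   changed=10   unreachable=0    failed=0    skipped=15   rescued=0    ignored=2'

failed=$(recap_field "$recap_line" failed)
ignored=$(recap_field "$recap_line" ignored)

if [ "$failed" -eq 0 ] && [ "$ignored" -gt 0 ]; then
    echo "dry run passed, but $ignored ignored task failure(s) should be reviewed before the real run"
fi
```

Here the two ignored failures are the moduli file and the ssh-gitlab service tasks, both of which depend on changes that check mode only simulates.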

Change 707350 merged by Brennen Bearnes:

[operations/gitlab-ansible@master] add gitlab2001 to host_vars and variables

https://gerrit.wikimedia.org/r/707350

Change 708275 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/gitlab-ansible@master] disable backup cronjobs for gitlab2001

https://gerrit.wikimedia.org/r/708275

Change 708275 merged by Brennen Bearnes:

[operations/gitlab-ansible@master] disable backup cronjobs for gitlab2001

https://gerrit.wikimedia.org/r/708275

Change 708767 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] hiera::role::common::acme_chief add gitlab-replica SNI

https://gerrit.wikimedia.org/r/708767

Change 708767 merged by Jelto:

[operations/puppet@production] hiera::role::common::acme_chief add gitlab-replica SNI

https://gerrit.wikimedia.org/r/708767

Change 709383 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] hiera::role::common::idp add gitlab-replica to production idp

https://gerrit.wikimedia.org/r/709383

Change 709383 merged by Jelto:

[operations/puppet@production] hiera::role::common::idp add gitlab-replica to production idp

https://gerrit.wikimedia.org/r/709383

I restored the backup of gitlab1001 to gitlab2001 using the restore instructions of S&F.
I enhanced the guide and moved it to wikitech: https://wikitech.wikimedia.org/wiki/GitLab/Backup_and_Restore#Restore

I also documented some initial information about the replica on a dedicated page: https://wikitech.wikimedia.org/wiki/GitLab/Replica

The next step is to evaluate options to automate this process and to run the restore on gitlab-replica every 24h.
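As a starting point for that evaluation, here is a rough sketch of what an automated restore could look like. It is not the script that was later deployed. The backup directory, the archive naming scheme and the order of the `gitlab-ctl` and `gitlab-backup` steps are assumptions based on GitLab Omnibus defaults, and `latest_backup_id` and `print_restore_steps` are hypothetical helper names. The sketch only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: directory, file naming and command order are assumptions
# based on GitLab Omnibus defaults, not the deployed restore script.

# Print the backup ID of the newest archive in a directory.
# Assumes Omnibus-style names (<epoch>_<date>_<version>_gitlab_backup.tar),
# so sorting by name also sorts by creation time.
latest_backup_id() {
    dir=$1
    newest=''
    for f in "$dir"/*_gitlab_backup.tar; do
        [ -e "$f" ] || continue   # the glob matched nothing
        newest=$f                 # globs expand in sorted order; keep the last
    done
    [ -n "$newest" ] || return 1
    basename "$newest" _gitlab_backup.tar
}

# Print (do not run) the restore steps for a given backup ID.
print_restore_steps() {
    echo "gitlab-ctl stop puma"
    echo "gitlab-ctl stop sidekiq"
    echo "gitlab-backup restore BACKUP=$1 force=yes"
    echo "gitlab-ctl reconfigure"
    echo "gitlab-ctl restart"
}

backup_dir=${BACKUP_DIR:-/var/opt/gitlab/backups}
if backup_id=$(latest_backup_id "$backup_dir"); then
    print_restore_steps "$backup_id"
else
    echo "no backup archive found in $backup_dir" >&2
fi
```

Printing the steps first makes it possible to review them on the replica before wiring anything into a daily job.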

Change 710948 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] profile::gitlab rsync latest backup to passive host

https://gerrit.wikimedia.org/r/710948

Change 710948 merged by Jelto:

[operations/puppet@production] profile::gitlab rsync latest backup to passive host

https://gerrit.wikimedia.org/r/710948

Change 711348 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] profile::gitlab rsync fix rsync backup command

https://gerrit.wikimedia.org/r/711348

Change 711348 merged by Jelto:

[operations/puppet@production] profile::gitlab rsync fix rsync backup command

https://gerrit.wikimedia.org/r/711348

Change 713635 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] profile::gitlab load rsync::server only on passive GitLab

https://gerrit.wikimedia.org/r/713635

Change 713635 merged by Jelto:

[operations/puppet@production] profile::gitlab load rsync::server only on passive GitLab

https://gerrit.wikimedia.org/r/713635

@Arnoldokoth some thoughts on the wrong settings on the replica after restore:

When we import the dump of production GitLab to the replica, the home_page_url setting doesn't match the replica. As a result, redirects and a click on the GitLab icon in the top left both lead to production GitLab, which may confuse people, especially when they are running tests on the replica. So we have to find a way to adjust this setting.

One solution would be to use gitlab-settings and run the Python script to adjust all settings after doing the restore. This would mean we have to clone the script and create API keys for access. To me this sounds like a lot of moving parts.

A simpler solution would be to set home_page_url using the gitlab-rails console. The console is available on the replica and doesn't require additional credentials (apart from being root).

So for the first version of the restore script we could add the following line to make sure the home_page_url matches the replica:

echo "ApplicationSetting.last.update(home_page_url: 'https://gitlab-replica.wikimedia.org/explore')" | gitlab-rails console

This solution is not optimal. As soon as we also change other settings on production GitLab we have to add them as well. So we might need to use gitlab-settings at some point. But for now I'm fine with just setting this single value in the console.
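For the restore script, the one-liner above could be wrapped roughly as follows. This is only a sketch: `REPLICA_URL` and `GITLAB_RAILS` are hypothetical variables introduced here so the URL and the binary can be overridden, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: reset home_page_url after a restore so the replica stops
# redirecting to production. REPLICA_URL and GITLAB_RAILS are hypothetical
# knobs introduced for illustration.
REPLICA_URL=${REPLICA_URL:-https://gitlab-replica.wikimedia.org/explore}
GITLAB_RAILS=${GITLAB_RAILS:-gitlab-rails}

# Build the Ruby statement from the one-liner above for a given URL.
home_page_url_statement() {
    printf "ApplicationSetting.last.update(home_page_url: '%s')\n" "$1"
}

# Feed the statement to the Rails console on stdin, as in the one-liner.
fix_home_page_url() {
    home_page_url_statement "$REPLICA_URL" | "$GITLAB_RAILS" console
}
```

Calling `fix_home_page_url` as root as the last step of the restore would apply the setting. Further settings that diverge between production and the replica would each need their own statement, which is the limitation described above.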

@Jelto I've taken note of this and will add it to the script. Thank you.

Change 725340 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] gitlab: install backup restore script

https://gerrit.wikimedia.org/r/725340

Change 725340 merged by Dzahn:

[operations/puppet@production] gitlab: install backup restore script

https://gerrit.wikimedia.org/r/725340

Change 730605 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] gitlab: add gitlab restore systemd timer

https://gerrit.wikimedia.org/r/730605

Change 730605 merged by Dzahn:

[operations/puppet@production] gitlab: add gitlab restore systemd timer

https://gerrit.wikimedia.org/r/730605

Mentioned in SAL (#wikimedia-operations) [2021-10-13T19:16:00Z] <mutante> gitlab2001 - status before was that "gitlab-ctl status" showed components "gitlab-workhorse" and "postgres-exporter" as "down". this was either pre-broken or caused by the restore process. after manually 'gitlab-ctl start gitlab-workhorse' all of the components are in "run" and https://gitlab-replica.wikimedia.org is up ( T285867)

Change 730630 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] gitlab: remove unnecessary comments from restore script

https://gerrit.wikimedia.org/r/730630

Change 730632 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gitlab: deactivate new backup-restore unit for now

https://gerrit.wikimedia.org/r/730632

Change 730632 merged by Dzahn:

[operations/puppet@production] gitlab: deactivate new backup-restore unit for now

https://gerrit.wikimedia.org/r/730632

Change 730630 merged by Dzahn:

[operations/puppet@production] gitlab: remove unnecessary comments from restore script

https://gerrit.wikimedia.org/r/730630

Change 730641 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gitlab: allow installing the restore script while NOT enabling the timer

https://gerrit.wikimedia.org/r/730641

Change 730641 merged by Dzahn:

[operations/puppet@production] gitlab: allow installing the restore script while NOT enabling the timer

https://gerrit.wikimedia.org/r/730641

Change 731154 had a related patch set uploaded (by AOkoth; author: AOkoth):

[operations/puppet@production] gitlab: redirect out to logfile in restore script

https://gerrit.wikimedia.org/r/731154

Change 731154 merged by Dzahn:

[operations/puppet@production] gitlab: redirect out to logfile in restore script

https://gerrit.wikimedia.org/r/731154

Change 731155 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] gitlab: re-enable timer for backup-restore script

https://gerrit.wikimedia.org/r/731155

Change 731155 merged by Dzahn:

[operations/puppet@production] gitlab: re-enable timer for backup-restore script

https://gerrit.wikimedia.org/r/731155