GitLab will also be switched from eqiad to codfw during the March 2023 Datacenter Switchover (T327920), one day before the actual switchover so that it does not block dependencies. This task tracks the dry run to fail over the replicas from codfw (gitlab2002) to eqiad (gitlab1003).
Docs: https://wikitech.wikimedia.org/wiki/GitLab/Failover
Last task: T307142#7969993 (checklist can be adapted for this year's failover)
Time: TBD, sometime next week
Checklist: (WIP)
Preparations before downtime:
- check gitlab2001 and gitlab1003 use the same ssh host keys for ssh-gitlab daemon
- prepare change to set profile::gitlab::service_name: 'gitlab-replica.wikimedia.org' on gitlab1003 /operations/puppet/+/890779/
- prepare change to point DNS entry for gitlab-replica.wikimedia.org to gitlab1003 and gitlab-replica-old.wikimedia.org to gitlab2002 operations/dns/+/890785
- configure gitlab1004 as profile::gitlab::active_host (not needed on replica)
- replicas only: rsync should be allowed between replicas /operations/puppet/+/890434
- apply gitlab-settings to gitlab1003 and gitlab2002
- announce downtime some days ahead on ops/releng list? (not needed on replica)
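The host key check in the first item could be sketched as follows. This is a minimal sketch, assuming the public host keys of the ssh-gitlab daemon have already been copied from both hosts into local files; the function name `same_host_key` and the file names in the usage note are hypothetical. It compares the key type and key blob and ignores the trailing comment.

```shell
# Hypothetical helper: succeed if two OpenSSH public key files hold the same key.
# A public key line has the form "<type> <base64-blob> [comment]"; the comment is ignored.
same_host_key() {
    local key_a key_b
    # Take the type and blob from the first line that has at least two fields.
    key_a=$(awk 'NF >= 2 {print $1, $2; exit}' "$1")
    key_b=$(awk 'NF >= 2 {print $1, $2; exit}' "$2")
    # Fail on empty input so that two unreadable files never count as a match.
    [ -n "$key_a" ] && [ "$key_a" = "$key_b" ]
}
```

For example, `same_host_key gitlab2002.pub gitlab1003.pub && echo "host keys match"` prints the message only when both files contain the same key.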
Scheduled downtime:
- Announce downtime in #wikimedia-gitlab (not needed on replica)
- pause all GitLab Runners (not needed on replica)
- downtime gitlab2002 with sudo cookbook sre.hosts.downtime -r "Running failover to gitlab1003 - T329930" -M 60
- stop puppet on gitlab2002 with sudo disable-puppet "Running failover to gitlab1003 - T329930"
- stop GitLab on gitlab2002 with gitlab-ctl stop nginx
- stop ssh-gitlab daemon on gitlab2002 with systemctl stop ssh-gitlab
- create full backup on gitlab2002 with /usr/bin/gitlab-backup create CRON=1 STRATEGY=copy GZIP_RSYNCABLE="true" GITLAB_BACKUP_MAX_CONCURRENCY="4" GITLAB_BACKUP_MAX_STORAGE_CONCURRENCY="1"
- sync backup, on gitlab2002 run /usr/bin/rsync -avp /srv/gitlab-backup/ rsync://gitlab1003.wikimedia.org/data-backup
- merge change to set profile::gitlab::service_name: 'gitlab-replica.wikimedia.org' on gitlab1003 /operations/puppet/+/890779/ and run puppet
- trigger restore, on gitlab1003 run sudo systemctl start backup-restore.service (for logs, run journalctl -f -u backup-restore.service)
- merge change to point DNS entry for gitlab-replica.wikimedia.org to gitlab1003 and gitlab-replica-old.wikimedia.org to gitlab2002 operations/dns/+/890785
- verify installation
- enable puppet on gitlab2002 with sudo run-puppet-agent -e "Running failover to gitlab1003 - T329930"
- start ssh-gitlab daemon on gitlab2002 with systemctl start ssh-gitlab
- unpause all GitLab Runners (not needed on replica)
- announce end of downtime (not needed on replica)
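The gitlab2002-side steps of the checklist above (stop services, create the backup, sync it) could be rehearsed with a small wrapper like the one below. This is only a sketch: the commands are copied from the checklist, while the `run` helper, the `source_side_steps` function and the `DRY_RUN` variable are hypothetical names introduced for illustration. It assumes it is run as root on gitlab2002 and does not cover the puppet, restore or DNS steps.

```shell
#!/bin/bash
# Sketch of the gitlab2002-side steps from the checklist.
# DRY_RUN defaults to 1, so commands are only printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Hypothetical wrapper: run the steps in checklist order, stopping at the first failure.
source_side_steps() {
    run gitlab-ctl stop nginx || return
    run systemctl stop ssh-gitlab || return
    run /usr/bin/gitlab-backup create CRON=1 STRATEGY=copy GZIP_RSYNCABLE="true" \
        GITLAB_BACKUP_MAX_CONCURRENCY="4" GITLAB_BACKUP_MAX_STORAGE_CONCURRENCY="1" || return
    run /usr/bin/rsync -avp /srv/gitlab-backup/ rsync://gitlab1003.wikimedia.org/data-backup || return
}
```

Calling `source_side_steps` with the default `DRY_RUN=1` only prints the four commands, which makes it easy to review them before the downtime window; setting `DRY_RUN=0` first executes them.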
Once this switchover has succeeded, we will proceed with the GitLab production switchover in T329931.