I noticed that the gitlab-restore unit (backup-restore.service) failed today.
I have not debugged it much yet, but the status is:
● backup-restore.service - GitLab Backup Restore
   Loaded: loaded (/lib/systemd/system/backup-restore.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2022-05-11 01:43:12 UTC; 47s ago
  Process: 4671 ExecStart=/mnt/gitlab-backup/gitlab-restore.sh (code=exited, status=1/FAILURE)
 Main PID: 4671 (code=exited, status=1/FAILURE)

May 11 01:42:39 gitlab2001 gitlab-restore.sh[4671]: run: nginx: (pid 9123) 142099s; run: log: (pid 662) 3462960s
May 11 01:42:39 gitlab2001 gitlab-restore.sh[4671]: run: postgres-exporter: (pid 9135) 142099s; run: log: (pid 661) 3462960s
May 11 01:42:39 gitlab2001 gitlab-restore.sh[4671]: run: postgresql: (pid 16061) 172662s; run: log: (pid 682) 3462960s
May 11 01:42:39 gitlab2001 gitlab-restore.sh[4671]: run: puma: (pid 9143) 142098s, want down; run: log: (pid 683) 3462960s
May 11 01:42:39 gitlab2001 gitlab-restore.sh[4671]: run: redis: (pid 16077) 172661s; run: log: (pid 678) 3462960s
May 11 01:42:39 gitlab2001 gitlab-restore.sh[4671]: run: redis-exporter: (pid 9149) 142098s; run: log: (pid 664) 3462960s
May 11 01:42:39 gitlab2001 gitlab-restore.sh[4671]: run: sidekiq: (pid 9158) 142097s; run: log: (pid 660) 3462960s
May 11 01:43:11 gitlab2001 gitlab-restore.sh[4671]: timeout: run: puma: (pid 9143) 142130s, want down
May 11 01:43:12 gitlab2001 systemd[1]: backup-restore.service: Main process exited, code=exited, status=1/FAILURE
May 11 01:43:12 gitlab2001 systemd[1]: backup-restore.service: Failed with result 'exit-code'.
I have already been able to reproduce it a second time by starting the unit manually.
It's timing out while trying to shut down puma.
after:
├─5360 /opt/gitlab/embedded/bin/ruby /opt/gitlab/embedded/bin/omnibus-ctl gitlab /opt/gitlab/embedded/service/omnibus-ctl* stop puma
Manually running that command confirms it:
/opt/gitlab/embedded/bin/ruby /opt/gitlab/embedded/bin/omnibus-ctl gitlab /opt/gitlab/embedded/service/omnibus-ctl* stop puma
timeout: run: puma: (pid 9143) 142431s, want down
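To spot this state quickly on later runs, something like the following might help. It is only a sketch: stuck_services is a hypothetical helper, not part of gitlab-restore.sh, and it assumes raw status lines in the format shown above (without the journal timestamp prefix). It prints every service that is still running even though a stop was requested ("want down").

```shell
#!/bin/sh
# Hypothetical helper, not part of gitlab-restore.sh.
# Reads runit-style status lines on stdin and prints the names of services
# that are still in state "run" although a stop was requested ("want down").
stuck_services() {
    awk '
        {
            line = $0
            # The stop command prefixes its final report with "timeout: ".
            sub(/^timeout: /, "", line)
            # Only look at the service part, not the "; run: log: ..." part.
            split(line, parts, "; ")
            if (parts[1] ~ /^run: / && parts[1] ~ /want down/) {
                split(parts[1], f, ": ")
                print f[2]
            }
        }'
}
```

It could be fed from the status command, for example: gitlab-ctl status | stuck_services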
Additionally I noticed:
/mnt/gitlab-backup/gitlab-restore.sh: line 14: [: 14.10.2: unary operator expected
That line compares $installed_version to $backup_version, and the way the script tries to detect those versions somehow fails.
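The "unary operator expected" message typically shows up when an unquoted variable inside a [ ... ] test expands to nothing, which would fit one of the two versions coming back empty. I have not confirmed that this is what happens here. A minimal sketch of a more defensive comparison, assuming a plain string equality check is what the script wants (versions_match is a hypothetical name and the real detection logic is not shown here):

```shell
#!/bin/sh
# Hypothetical guard; the actual code on line 14 of gitlab-restore.sh may differ.
# Returns 0 if both versions are set and equal, 1 if they differ,
# and 2 if either one could not be detected (empty string).
versions_match() {
    installed_version=$1
    backup_version=$2
    if [ -z "$installed_version" ] || [ -z "$backup_version" ]; then
        echo "could not detect versions: installed='$installed_version' backup='$backup_version'" >&2
        return 2
    fi
    # Quoting keeps the test well-formed even for unexpected values.
    [ "$installed_version" = "$backup_version" ]
}
```

With the variables quoted, an empty value produces a clear error message and a distinct exit code instead of a shell syntax error.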
I will look more tomorrow. It was already a bit late here. I just wanted to save the info though and share it.