Unexpected Gitlab restore timer running on gitlab-prod-1002.devtools.eqiad1.wikimedia.cloud
Closed, Resolved (Public)

Description

backup-restore.timer is triggering on a daily basis on gitlab-prod-1002.devtools.eqiad1.wikimedia.cloud:

# journalctl -u backup-restore.timer -e | tail
-- Boot 8d43041f07294c40961ba422ae532c29 --
May 31 04:01:44 gitlab-prod-1002 systemd[1]: Started Periodic execution of backup-restore.service.
Jun 01 04:01:34 gitlab-prod-1002 systemd[1]: backup-restore.timer: Succeeded.
Jun 01 04:01:34 gitlab-prod-1002 systemd[1]: Stopped Periodic execution of backup-restore.service.
-- Boot fce4b44705114713974bde3a6bd2ef3b --
Jun 01 04:01:45 gitlab-prod-1002 systemd[1]: Started Periodic execution of backup-restore.service.
Jun 02 04:01:32 gitlab-prod-1002 systemd[1]: backup-restore.timer: Succeeded.
Jun 02 04:01:32 gitlab-prod-1002 systemd[1]: Stopped Periodic execution of backup-restore.service.
-- Boot a377560eed84430fa3362f2faf248fa0 --
Jun 02 04:01:42 gitlab-prod-1002 systemd[1]: Started Periodic execution of backup-restore.service.

The backup-restore timer is supposed to run only on GitLab replicas (of which there are none in the devtools project).

Event Timeline

The daily restore will happen if the host's FQDN does not match the value of $active_host in operations/puppet/modules/profile/manifests/gitlab.pp. $active_host comes from the Hiera value profile::gitlab::active_host. operations/puppet/hieradata/cloud/eqiad1/devtools/common.yaml sets

profile::gitlab::active_host: 'gitlab2002.wikimedia.org'

which names a host outside the devtools project. It seems the value should be gitlab-prod-1002.devtools.eqiad1.wikimedia.cloud instead?
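
The condition described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Puppet code: the function names gitlab_role and restore_timer_expected are hypothetical, and the real manifest may compare the values differently.

```python
def gitlab_role(fqdn: str, active_host: str) -> str:
    """Return 'active' if this host is the configured active host, else 'replica'.

    Hypothetical helper: mirrors the comparison described in the comment
    above, not the real logic in profile/manifests/gitlab.pp.
    """
    return "active" if fqdn == active_host else "replica"


def restore_timer_expected(fqdn: str, active_host: str) -> bool:
    """A host that is not the active host is treated as a replica,
    so the daily restore timer would be installed on it."""
    return gitlab_role(fqdn, active_host) == "replica"


if __name__ == "__main__":
    host = "gitlab-prod-1002.devtools.eqiad1.wikimedia.cloud"
    # Value found in the devtools hieradata: does not match the host.
    print(restore_timer_expected(host, "gitlab2002.wikimedia.org"))
    # Value suggested in the comment above: matches the host.
    print(restore_timer_expected(host, host))
```

With the value found in the devtools hieradata the host counts as a replica, which would explain the unexpected timer. With the suggested value it counts as the active host.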

Change 926544 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] Fix profile::gitlab::active_host and profile::gitlab::passive_hosts for devtools

https://gerrit.wikimedia.org/r/926544

Change 926544 merged by Jelto:

[operations/puppet@production] Fix profile::gitlab::active_host and profile::gitlab::passive_hosts for devtools

https://gerrit.wikimedia.org/r/926544

Jelto claimed this task.
Jelto triaged this task as Medium priority.

Thanks for opening the change. Having the devtools instance act as a replica/passive instance with restore enabled makes little sense. I think we initially disabled backups here because of disk space constraints, but let's see how it behaves. Devtools does not hold a lot of data.

I deployed the change above and the restore job is gone now. Instead, backups are now enabled on the devtools instance. The other changes look fine to me as well.

I'll close this task; feel free to reopen it if you notice anything wrong.

gitlab-prod-1002:~$ systemctl status backup-restore
Unit backup-restore.service could not be found.