Rename of a repository is not replicated
Open, Needs Triage, Public

Description

When renaming operations/debs/wmf-sre-laptop to operations/debs/wmf-laptop (T365985), the plugin states it is replicating the rename:

Replicating the rename of operations/debs/wmf-sre-laptop to operations/debs/wmf-laptop: 100%

Primary had it renamed:

$ ssh gerrit1003.wikimedia.org ls -ld /srv/gerrit/git/operations/debs/*laptop*
drwxr-xr-x 7 gerrit2 gerrit2 4096 Jun 28 02:19 /srv/gerrit/git/operations/debs/wmf-laptop.git

The replica gerrit2002 did not:

$ ssh gerrit2002.wikimedia.org ls -ld /srv/gerrit/git/operations/debs/*laptop*
drwxr-xr-x 7 gerrit2 gerrit2 4096 Jun 28 02:19 /srv/gerrit/git/operations/debs/wmf-sre-laptop.git

Nothing shows up in /var/log/gerrit/replication_log.

Event Timeline

The rename-project plugin needs some further configuration.

An overview is: https://gerrit.wikimedia.org/r/plugins/rename-project/Documentation/index.html#replication-of-project-renaming
And there is the configuration documentation at https://gerrit.wikimedia.org/r/plugins/rename-project/Documentation/config.md

The primary Gerrit would issue the rename-project command to the replicas, using either:

  • ssh to each replica's Gerrit daemon
  • https to each replica

The issue is that port 29418 is only open for traffic to the service IP (the one gerrit.wikimedia.org and gerrit-replica.wikimedia.org point to):

# ssh from users to gerrit
firewall::service { 'gerrit_ssh_users':
    proto  => 'tcp',
    port   => 29418,
    drange => [$ipv4, $ipv6],
}

We could use those, but some replicas might not have a service IP (that is the case for gerrit2003), and it is internal traffic anyway. On the replicas we thus need to allow traffic whose source (srange) is the primary Gerrit (gerrit1003) and whose destination (drange) is the replica host IP (gerrit2002, gerrit2003).

In Puppet we have something similar via profile::gerrit::lfs_sync_dest:

profile::gerrit::lfs_sync_dest:
    # - 'gerrit1003.wikimedia.org'
    - 'gerrit2002.wikimedia.org'
    - 'gerrit2003.wikimedia.org'

And we have:

profile::gerrit::active_host: 'gerrit1003.wikimedia.org'
profile::gerrit::replica_hosts:
    - 'gerrit-replica.wikimedia.org'

The latter uses the public service hostname, but we would need the internal hostnames.

Once figured out, in gerrit.config we should end up with:

[plugin "rename-project"]
    url = ssh://gerrit2@gerrit2002.wikimedia.org:29418
    url = ssh://gerrit2@gerrit2003.wikimedia.org:29418

The plugin on the primary Gerrit would then send the command rename-project previous/name new/name over ssh to each host, and we will be set.

Easiest: we can allow port 29418 when the source is the active host (gerrit1003).
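For illustration, the per-replica command described above can be spelled out from the url values of the proposed gerrit.config. This is only a sketch of the equivalent manual ssh invocation: the plugin uses its own SSH client and may pass options not shown here, and the function name rename_command is made up for this example.

```python
from urllib.parse import urlsplit


def rename_command(url: str, old: str, new: str) -> list[str]:
    """Build the ssh argv that would run rename-project on one replica.

    Hypothetical helper: it mirrors the command described in the comment
    above, not the plugin's actual implementation.
    """
    parts = urlsplit(url)
    if parts.scheme != "ssh" or not parts.hostname:
        raise ValueError(f"not an ssh:// URL: {url!r}")
    # Keep the user part only when the URL carries one.
    target = f"{parts.username}@{parts.hostname}" if parts.username else parts.hostname
    argv = ["ssh"]
    if parts.port:
        argv += ["-p", str(parts.port)]
    return argv + [target, "rename-project", old, new]


# URLs from the proposed [plugin "rename-project"] section.
REPLICA_URLS = [
    "ssh://gerrit2@gerrit2002.wikimedia.org:29418",
    "ssh://gerrit2@gerrit2003.wikimedia.org:29418",
]

for url in REPLICA_URLS:
    print(" ".join(rename_command(url, "previous/name", "new/name")))
```

The script only prints the commands; nothing is executed against the replicas.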

Change #1165832 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: config replicas for rename-project plugin

https://gerrit.wikimedia.org/r/1165832

Change #1165832 merged by Dzahn:

[operations/puppet@production] gerrit: config replicas for rename-project plugin

https://gerrit.wikimedia.org/r/1165832

> Easiest: we can allow port 29418 when the source is the active host (gerrit1003).

All gerrit servers now have the new firewall rule to allow this.

root@gerrit2002:/etc/nftables/input# cat 10_gerrit_ssh_primary_to_replica_daemon.nft 
# Managed by puppet
# 
ip saddr { 208.80.154.135 } tcp dport { 29418 } accept
ip6 saddr { 2620:0:861:2:208:80:154:135 } tcp dport { 29418 } accept
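A quick way to confirm the rule from the primary is a plain TCP connect to port 29418 on each replica. The sketch below assumes it is run on gerrit1003; port_open and check_replicas are names invented for this example. A successful connect only shows that the firewall lets the traffic through; it says nothing about SSH authentication.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_replicas(hosts, port: int = 29418) -> dict:
    """Map each replica hostname to whether its Gerrit ssh port is reachable."""
    return {host: port_open(host, port) for host in hosts}
```

Calling check_replicas(["gerrit2002.wikimedia.org", "gerrit2003.wikimedia.org"]) from the primary would return one boolean per replica.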

Change #1175114 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: add daemons ssh host key to known_hosts

https://gerrit.wikimedia.org/r/1175114

After adding the Gerrit daemons' host keys to the known hosts (https://gerrit.wikimedia.org/r/1175114), I tried a rename again:

$ ssh -p 29418 hashar@gerrit.wikimedia.org  rename-project test/newname test/newname2
Checking preconditions:                          100%
Retrieving the list of changes from DB:          100%
Renaming git repository:                         100%
Updating changes in the database:                100%
Indexing changes:                                100%
Replicating the rename of test/newname to test/newname2: 100%

It failed. From error_log:

[2025-08-01T13:23:46.214Z] [SSH rename-project test/newname test/newname2 (hashar)] INFO  com.googlesource.gerrit.plugins.renameproject.RenameProject : Rescheduling a rename replication for retry for ssh://gerrit@gerrit2003.wikimedia.org:29418 on project test/newname
[2025-08-01T13:23:46.801Z] [SSH rename-project test/newname test/newname2 (hashar)] INFO  com.googlesource.gerrit.plugins.renameproject.RenameProject : Rescheduling a rename replication for retry for ssh://gerrit2@gerrit2002.wikimedia.org:29418 on project test/newname
[2025-08-01T13:23:47.124Z] [SSH rename-project test/newname test/newname2 (hashar)] INFO  com.googlesource.gerrit.plugins.renameproject.RenameProject : Rescheduling a rename replication for retry for ssh://gerrit@gerrit2003.wikimedia.org:29418 on project test/newname
[2025-08-01T13:23:47.520Z] [SSH rename-project test/newname test/newname2 (hashar)] INFO  com.googlesource.gerrit.plugins.renameproject.RenameProject : Rescheduling a rename replication for retry for ssh://gerrit2@gerrit2002.wikimedia.org:29418 on project test/newname
[2025-08-01T13:23:47.843Z] [SSH rename-project test/newname test/newname2 (hashar)] INFO  com.googlesource.gerrit.plugins.renameproject.RenameProject : Rescheduling a rename replication for retry for ssh://gerrit@gerrit2003.wikimedia.org:29418 on project test/newname
[2025-08-01T13:23:48.239Z] [SSH rename-project test/newname test/newname2 (hashar)] INFO  com.googlesource.gerrit.plugins.renameproject.RenameProject : Rescheduling a rename replication for retry for ssh://gerrit2@gerrit2002.wikimedia.org:29418 on project test/newname
[2025-08-01T13:23:48.239Z] [SSH rename-project test/newname test/newname2 (hashar)] ERROR com.googlesource.gerrit.plugins.renameproject.RenameProject : Failed to replicate the renaming of test/newname to test/newname2 on ssh://gerrit@gerrit2003.wikimedia.org:29418 during [ssh://gerrit@gerrit2003.wikimedia.org:29418, ssh://gerrit2@gerrit2002.wikimedia.org:29418] attempts
[2025-08-01T13:23:48.239Z] [SSH rename-project test/newname test/newname2 (hashar)] ERROR com.googlesource.gerrit.plugins.renameproject.RenameProject : Failed to replicate the renaming of test/newname to test/newname2 on ssh://gerrit2@gerrit2002.wikimedia.org:29418 during [ssh://gerrit@gerrit2003.wikimedia.org:29418, ssh://gerrit2@gerrit2002.wikimedia.org:29418] attempts

So it reschedules the replication for each host up to 3 times, then logs a failure for each of them.
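The retry count can be tallied straight from error_log. This is a minimal sketch: the message pattern is taken from the excerpt above, and count_reschedules is a name made up for this example.

```python
import re
from collections import Counter

# Message text as it appears in the error_log excerpt above.
RESCHEDULE_RE = re.compile(
    r"Rescheduling a rename replication for retry for (?P<url>\S+) "
    r"on project (?P<project>\S+)"
)


def count_reschedules(lines):
    """Count 'Rescheduling ...' messages per (replica URL, project)."""
    counts = Counter()
    for line in lines:
        match = RESCHEDULE_RE.search(line)
        if match:
            counts[(match["url"], match["project"])] += 1
    return counts
```

Passing an open error_log file object to count_reschedules gives one counter entry per replica URL and project, which makes it easy to see whether every replica went through the same number of retries.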

Then in rename_log:

[2025-08-01 13:23:48,240 +0000] INFO 24 hashar OK test/newname {"name":"test/newname2","continue_with_rename":false}

The repository got renamed on gerrit1003 (primary) and on gerrit2002. It has not been renamed on gerrit2003, though. Its sshd log has:

[2025-08-01T13:23:47.857Z] cc4e61f9 [SSHD] gerrit - AUTH FAILURE FROM 208.80.154.135 - - - user-not-found - - - -
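The user-not-found failure lines up with the user part of the URL logged for gerrit2003, which is gerrit rather than gerrit2. A small sketch to flag such mismatches in a list of replica URLs follows; unexpected_users is a hypothetical helper, and the expected user gerrit2 is taken from the gerrit2002 URL in the same log.

```python
from urllib.parse import urlsplit


def unexpected_users(urls, expected_user):
    """Return {url: user} for every URL whose user part is not expected_user."""
    return {
        url: urlsplit(url).username
        for url in urls
        if urlsplit(url).username != expected_user
    }


# URLs exactly as they appear in the error_log excerpt above.
LOGGED_URLS = [
    "ssh://gerrit@gerrit2003.wikimedia.org:29418",
    "ssh://gerrit2@gerrit2002.wikimedia.org:29418",
]

print(unexpected_users(LOGGED_URLS, "gerrit2"))
```

With the two logged URLs, only the gerrit2003 entry is reported, with user gerrit.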

Change #1175122 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: replica renames as "gerrit2" application user

https://gerrit.wikimedia.org/r/1175122

Change #1175122 merged by Dzahn:

[operations/puppet@production] gerrit: replicate repo renames as "gerrit2" application user

https://gerrit.wikimedia.org/r/1175122

Change #1175114 abandoned by Hashar:

[operations/puppet@production] gerrit: add daemons ssh host key to known_hosts

Reason:

I'll restore when I resume work to rename repositories.

https://gerrit.wikimedia.org/r/1175114