
Gitlab-Github mirror broken
Open, Needs Triage, Public

Description

A mirror from Gitlab to Github was set up for mwcli in T293584: [mwcli] Setup github mirror

It seems this has been broken for some time.

It should be fixed!
CC @jeena who did the wizardry last time

Gitlab repo is https://gitlab.wikimedia.org/repos/releng/cli
The github repo, I believe, is currently https://github.com/wikimedia/mediawiki-tools-cli ? (perhaps needs a rename too)

Event Timeline

Looks like it is set up in the gitlab settings for the repo...

image.png (218×926 px, 14 KB)

13:get remote references: create git ls-remote: exit status 128, stderr: "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nIT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!\r\nSomeone could be eavesdropping on you right now (man-in-the-middle attack)!\r\nIt is also possible that a host key has just been changed.\r\nThe fingerprint for the RSA key sent by the remote host is\nSHA256:uNiVztksCsDhcc0u9e8BujQXVUpKZIDTMczCvj3tD2s.\r\nPlease contact your system administrator.\r\nAdd correct host key in /tmp/gitaly-ssh-invocation2616940952/known-hosts to get rid of this message.\r\nOffending RSA key in /tmp/gitaly-ssh-invocation2616940952/known-hosts:1\r\n remove with:\r\n ssh-keygen -f \"/tmp/gitaly-ssh-invocation2616940952/known-hosts\" -R \"github.com\"\r\nRSA host key for github.com has changed and you have requested strict checking.\r\nHost key verification failed.\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n".

Tried deleting and recreating. I'm unclear on what pubkey this relies on or where it's set...

Curious, how are people using these mirrors? They are just readonly on github, right? Is there any difference in that case to just using our own Gitlab when there is no submitting anyways?

Recreating the mirror and clicking "detect host keys" resolves that part of the issue. Now get a permission denied.

> It seems this has been broken for some time.

The RSA host key change only happened yesterday. The same thing was affecting Gerrit at T332972 which is fixed.
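
For anyone checking a stored host key against the fingerprint printed in the error above, here is a minimal sketch of how an OpenSSH-style SHA256 fingerprint is derived from a known_hosts entry. The function name and the sample line are made up for illustration, and the sketch assumes a plain entry of the form `<hosts> <key-type> <base64-key>` (no `@cert-authority` or `@revoked` marker in front).

```python
import base64
import hashlib


def sha256_fingerprint(known_hosts_line: str) -> str:
    """Return the SHA256 fingerprint for one known_hosts entry.

    Assumes the plain layout "<hosts> <key-type> <base64-key> [comment]".
    """
    fields = known_hosts_line.split()
    key_blob = base64.b64decode(fields[2])      # raw public key bytes
    digest = hashlib.sha256(key_blob).digest()  # 32-byte hash of the key
    # OpenSSH prints the digest as base64 with the "=" padding stripped.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    # Placeholder entry, NOT a real host key.
    sample = "example.org ssh-rsa " + base64.b64encode(b"not a real key").decode("ascii")
    print(sha256_fingerprint(sample))
```

Running it on the real `github.com ssh-rsa ...` line from a known_hosts file and comparing the result with the fingerprint in the error message would show whether the stored key is the old or the new one.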

So I think brennen also fixed everything related to the key change but the "Now get a permission denied" is an unrelated error and probably the reason why it's been broken for some time as reported.

Note: https://github.com/wikimedia/mediawiki-tools-cli is a mirror of the Gerrit repository https://gerrit.wikimedia.org/r/admin/repos/mediawiki/tools/cli

And it is indeed replicated:

gerrit1001:/var/log/gerrit/replication_log
[2023-03-24 05:42:12,197] Replication to git@github.com:wikimedia/mediawiki-tools-cli started... [CONTEXT pushOneId="3ed2eea4" ]
[2023-03-24 05:42:12,519] Push to git@github.com:wikimedia/mediawiki-tools-cli references: RemoteRefUpdate{refSpec=refs/heads/main:refs/heads/main, status=NOT_ATTEMPTED, id=(null)..AnyObjectId[e1fd6968eb15285bc159b7176b9aa54b4e4458d1], force=yes, delete=no, ffwd=no} [CONTEXT pushOneId="3ed2eea4" ]
[2023-03-24 05:42:13,105] Replication to git@github.com:wikimedia/mediawiki-tools-cli completed in 907ms, 809789ms delay, 7 retries [CONTEXT pushOneId="3ed2eea4" ]
Dzahn renamed this task from Github mirror broken to Gitlab-Github mirror broken. Mar 24 2023, 11:19 PM

renaming ticket to make clear this is about "gitlab-github" and the other is about "gerrit-github".

Does this mean it is both replicated from Gerrit and from Gitlab?
If so I guess Gerrit replication should be disabled!

> Curious, how are people using these mirrors? They are just readonly on github, right? Is there any difference in that case to just using our own Gitlab when there is no submitting anyways?

Mainly to increase discoverability

> Does this mean it is both replicated from Gerrit and from Gitlab?

I imagine yes, the replications are probably in a race condition of some sort. Gerrit is not going to replicate anything on its own, since the Gerrit repo does not have any activity (it is marked Read-Only), but anytime we restart Gerrit we do a full replication of all the repositories.

Looking at the Dependabot history (example, might be private: https://github.com/wikimedia/mediawiki-tools-cli/security/dependabot/59 ) shows the security alert being opened and closed as the different replications kick off.

> If so I guess Gerrit replication should be disabled!

It is all or nothing. When we archive a repository in Gerrit (via Projects-Cleanup) we delete the associated GitHub mirror, and the replication does not attempt to create the repo (remote["github"].createMissingRepositories = false).

I guess we would need a different way of archiving a repository, namely adding to the Gerrit project an access rule which BLOCKs Read for the mediawiki-replication group, which is the group the replication plugin checks before triggering the replication.

An alternative is to have the Gitlab replication use its own GitHub repository, named after the GitLab repository instead of the Gerrit one. E.g.:

| Source | GitHub |
| --- | --- |
| Gerrit mediawiki/tools/cli | https://github.com/wikimedia/mediawiki-tools-cli |
| Gitlab releng/cli | https://github.com/wikimedia/releng-cli |

(given that if on GitHub one renames mediawiki-tools-cli to releng-cli, I am pretty sure GitHub will honor the redirect when Gerrit replicates so we are back to point 1). Then for the mirror I don't think we need to keep the Gerrit based name.

I guess to set this up we would need to do a manual repo creation over the top of the old redirect?
I'd be up for giving that a go if others think it's okay!

But I think another thing we should consider as we move forward through gerrit -> gitlab transitions is how to make this better.
If mediawiki extensions migrate, for example, I expect we won't want to leave a bunch of read-only and outdated mirrors on github while using a new name for replication from gitlab?

brennen edited projects, added GitLab (Integrations); removed GitLab.
Addshore claimed this task.

https://github.com/wikimedia/mediawiki-tools-cli looks fixed now :)

No idea how or since when!

hashar removed Addshore as the assignee of this task.

This is still an issue, with Gerrit and Gitlab both mirroring to the same GitHub project. As soon as Gerrit attempts to replicate the old mediawiki/tools/cli repo, the issue will surface again.

This comment was removed by Addshore.

@hashar right!
Is there some way to turn off the gerrit mirroring?

> Is there some way to turn off the gerrit mirroring?

Here is the gerrit -> github replication config. It's in hieradata/common/profile/gerrit.yaml in the puppet repo.

profile::gerrit::replication:
    github:
        url: 'git@github.com:wikimedia/${name}'
        authGroup: 'mediawiki-replication'
        remoteNameStyle: 'dash'
        mirror: false
        push:
            - '+refs/heads/*:refs/heads/*'
            - '+refs/tags/*:refs/tags/*'
        createMissingRepositories: false
        threads: 2
        maxRetries: 50
        rescheduleDelay: 15
        replicatePermissions: false
        # Double escape backslashes:
        # once for regex, once for gitconfig formatting
        projects:
            - '^(?:(?!apps\\/ios\\/).)*$'
            - '^(?:(?!apps\\/android\\/).)*$'

Looks like the "projects:" at the bottom could be edited to exclude things.
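
To illustrate how the negative-lookahead patterns in `projects:` behave, here is a small sketch that tests project names against them. It assumes the doubled backslashes in the YAML end up as a single `\/` by the time the regex is compiled, and it uses Python's `re` rather than the Java regex engine Gerrit runs on (the two agree for these patterns). `EXCLUDE_CLI` and `is_selected` are hypothetical, not taken from the actual patch, and how the replication plugin combines several `projects` entries is not covered here; each pattern is tested on its own.

```python
import re

# Existing pattern from the config, as the regex engine would see it
# after the doubled backslashes have been unescaped once (assumption).
EXCLUDE_IOS = re.compile(r"^(?:(?!apps\/ios\/).)*$")

# Hypothetical pattern that rejects only the exact name "mediawiki/tools/cli".
EXCLUDE_CLI = re.compile(r"^(?!mediawiki\/tools\/cli$).*$")


def is_selected(pattern: re.Pattern, project: str) -> bool:
    """True if the whole project name matches the given pattern."""
    return pattern.fullmatch(project) is not None


if __name__ == "__main__":
    for name in ("mediawiki/core", "apps/ios/wikipedia"):
        print("EXCLUDE_IOS", name, is_selected(EXCLUDE_IOS, name))
    for name in ("mediawiki/tools/cli", "mediawiki/tools/cli-extra"):
        print("EXCLUDE_CLI", name, is_selected(EXCLUDE_CLI, name))
```

A pattern like this would need the same double escaping as the existing entries before going into the YAML.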

Change #1029212 had a related patch set uploaded (by Addshore; author: Addshore):

[operations/puppet@production] Ignore mediawiki/tools/cli for gerrit replication

https://gerrit.wikimedia.org/r/1029212

Ideally I'd avoid having GitLab replicate anything to GitHub. I am not sure why that was set up, and I don't think there is any need for such a replication. Then if it really has to be done, the replicated repo should probably respect the name it has on GitLab, and per my comment at T333029#8726985 GitLab should replicate it to https://github.com/wikimedia/releng-cli which is less surprising and easier to follow.

There is a task to prevent repositories archived in Gerrit from being replicated to GitHub, which is T351543. One way is to prevent the replication system from seeing the repo; an alternative is to rename the repo under an archived namespace that would be added to that regex. That in turn needs a runbook to explain how to rename a repository.

So I'd prefer:

There is still room for namespace overlap between Gerrit and Gitlab mapped names though which would resurface that existing system.
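
The overlap concern can be sketched with the 'dash' naming seen in the config above, where `mediawiki/tools/cli` becomes `mediawiki-tools-cli`. This sketch assumes both systems would derive GitHub names the same way (GitLab mirror targets are in fact set by hand), and the `releng/cli` project on the Gerrit side is purely hypothetical, added only to show a collision.

```python
from collections import defaultdict


def dash_name(project: str) -> str:
    """Flatten a slash-separated project path by replacing "/" with "-"."""
    return project.replace("/", "-")


def find_collisions(sources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return GitHub names that more than one source project maps to."""
    targets = defaultdict(list)
    for system, projects in sources.items():
        for project in projects:
            targets[dash_name(project)].append(f"{system}:{project}")
    return {name: origins for name, origins in targets.items() if len(origins) > 1}


sources = {
    # "releng/cli" on Gerrit is hypothetical, for illustration only.
    "gerrit": ["mediawiki/tools/cli", "releng/cli"],
    "gitlab": ["releng/cli"],
}
collisions = find_collisions(sources)
print(collisions)
# prints {'releng-cli': ['gerrit:releng/cli', 'gitlab:releng/cli']}
```

A check like this could be run over the full project lists of both systems before settling on a naming scheme.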