
Github's wikimedia/ores not mirroring to Gerrit's scoring/ores/ores
Closed, Resolved, Public

Description

Hi everybody,

The mirror from GitHub's wikimedia/ores to Gerrit's scoring/ores/ores seems to have stopped working: on the Gerrit side I don't see the commits that were merged on the GitHub side.

https://github.com/wikimedia/ores/commits/master
https://gerrit.wikimedia.org/g/scoring/ores/ores/+/refs/heads/master

A similar thing happened in the past: T224996#7088785

I can see from phab1001's log the following:

[27-Jun-2022 09:05:25 UTC] [2022-06-27 09:05:25] PHLOG: 'Unexpected output while updating repository "ores": [2022-06-27 09:05:25] EXCEPTION: (PhutilProxyException) Error while pushing "rORES" repository to mirrors. {>} (PhutilAggregateException) Exceptions occurred while mirroring the "ores" repository.
    - CommandException: Command failed with error #128!
      COMMAND
      git push --verbose --mirror -- '********'
      
      STDOUT
      (empty)
      
      STDERR
      Pushing to ssh://********@gerrit.wikimedia.org:29418/scoring/ores/ores
      Warning: Permanently added '[gerrit.wikimedia.org]:29418,[2620:0:861:2:208:80:154:137]:29418' (RSA) to the list of known hosts.
      phabricator@gerrit.wikimedia.org: Permission denied (publickey).
      fatal: Could not read from remote repository.

Event Timeline

hashar subscribed.

In the Gerrit sshd logs (gerrit1001 /var/log/gerrit/sshd_log):

[2022-06-28T08:36:42.332Z] 3aca3ca9 [SSHD] phabricator - AUTH FAILURE FROM 2620:0:861:102:10:64:16:8 - - - user-not-found - - - -

Searching logstash for "user.name:phabricator AUTH FAILURE FROM 2620:0:861:102:10:64:16:8" over the last 3 months (https://logstash.wikimedia.org/goto/acdb53a782bb64090d763796e10072d1) shows that the issue has been going on for a while.
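
For counting such failures outside logstash, here is a minimal Python sketch that pulls the user, source address and reason out of an sshd_log line. The field layout is an assumption inferred only from the single line quoted above, and parse_auth_failure is a hypothetical helper name, not part of Gerrit.

```python
import re

# Assumed layout, inferred from the one quoted line:
# [timestamp] session [SSHD] user - AUTH FAILURE FROM ip - - - reason ...
AUTH_FAILURE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+(?P<session>\S+)\s+\[SSHD\]\s+"
    r"(?P<user>\S+)\s+-\s+AUTH FAILURE FROM\s+(?P<source>\S+)"
    r"(?:\s+-){3}\s+(?P<reason>\S+)"
)


def parse_auth_failure(line):
    """Return a dict of fields for an AUTH FAILURE line, or None otherwise."""
    match = AUTH_FAILURE.match(line.strip())
    return match.groupdict() if match else None


sample = (
    "[2022-06-28T08:36:42.332Z] 3aca3ca9 [SSHD] phabricator - "
    "AUTH FAILURE FROM 2620:0:861:102:10:64:16:8 - - - user-not-found - - - -"
)
fields = parse_auth_failure(sample)
print(fields["user"], fields["source"], fields["reason"])
```

Lines that do not match the assumed layout return None, so the helper can be run over a whole log file and the None results skipped.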

gerrit_phabricator_sshd_auth_failure.png (204×559 px, 11 KB)

rORES master branch is at 7a7e257dffa1 from Jun 16, 10:40 Merge pull request #361 from elukey/master.

I have checked the repository on Gerrit, the master branch of scoring/ores/ores is at 6ef6d22f8 from Jan 28th 2022 Merge pull request #355 from elukey/master.

On the Gerrit side the phabricator Gerrit account has the email mmodell+phab@wikimedia.org with account id 3594. The last change was to update the ssh key on May 14th 2021:

authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC6lzdKlWejNe6laQSqdaBGV4gksF+5gfFFNW166g7ZKMhJa9URwsVtR6qKiVgxTKJ/L+FAMWkylDb/zUK2Lh9R7s/kptjJvVCniiNdK0U+iXqJIpDpWOwWdIyQVHc8av8wUaZuo3iwmUjnDgKbmomwNm0msjWSv6AVMv8gC9FiP0ejh1MP9DabyPnOxsznOPt3HWZlY7nY69aTGQTcy1nobfm/cIKiGb/neoh34a7s94iwwNFPHG99YsZq8At74uyadXdyGsi3jcLGO5pUZ1nyqmZ8exNGiDZ5OSbVa+S3g/iKDjoE7169mD3UYnPlVr3B5QflPyc6AflG1l18mwIL twentyafterfour@iridium
authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSsk4CRGePh6lLEZ6a0yLmzxA7DNN/ZMX9QtHTSGD4zbPbKRTA43ZBqA257cd3Rj6F2fc6jm8hbr7IVVxrYQV5G20pqspD8q782Zit0N5j2/RMz/pWjX2Ldyt99NVB6bOn7EBPrqrLnq3Zsnj/75MAGQBzh+xUZekVodSzcLBJtEvXfE682fPBL+SlGcLBg/Gx1U5BiD5upsNijIpICAABzx23KJRyXeOsG3l4JQsnV6b/prUYnRUPA94SUHnHM5euHf4N53OXcE/WSjdccowrfmZAmJxtc2+xTGnVU+LPUkRxCtCLceg+KSYwbrVuQUx9GM7CwkM/C6Yk8vDdpxNp twentyafterfour@phab1001

I have no idea how Phab holds the associated private key :-\

I am really tempted to take that opportunity to phase out the GitHub > Phabricator > Gerrit replication, which is only used for the ORES-related repositories. Either migrate them from GitHub to Gerrit for alignment, or move them to GitLab (then iirc the repos use LFS to store large files). See also T213246.

rORES replication config is at https://phabricator.wikimedia.org/source/ores/manage/uris/

rORES_URIs.png (209×927 px, 32 KB)

Phabricator mirrors to ssh://phab@gerrit.wikimedia.org:29418/scoring/ores/ores, which uses the username phab. That user is defined in Gerrit with user id 7252 and the email phab+tcipriani@wikimedia.org. The account was created in June 2019, and I don't think there is any phab@wikimedia.org email. There is no ssh key authorized for that account.

Gerrit sshd errors show Phabricator attempts to use the phabricator username which is Gerrit user id 3594.
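
To make explicit which login a mirror URI asks for, here is a small Python sketch using the standard library's urlsplit. The account table only restates the two Gerrit ids mentioned in this task; mirror_login and GERRIT_ACCOUNTS are hypothetical names used for illustration.

```python
from urllib.parse import urlsplit


def mirror_login(uri):
    """Return (username, host, port) for a mirror URI."""
    parts = urlsplit(uri)
    return parts.username, parts.hostname, parts.port


# Mirror URI as configured for rORES at the time of this comment.
configured = "ssh://phab@gerrit.wikimedia.org:29418/scoring/ores/ores"
user, host, port = mirror_login(configured)
print(user, host, port)

# Gerrit account ids as reported in this task (illustrative lookup only).
GERRIT_ACCOUNTS = {"phab": 7252, "phabricator": 3594}
print(GERRIT_ACCOUNTS.get(user))
```

This only shows what the URI requests; it does not explain why the sshd log records the phabricator username instead.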

It is a mystery :]

From the URI configuration history at https://phabricator.wikimedia.org/source/ores/uri/view/20533/

@MarcoAurelio changed this URI
from "https://gerrit.wikimedia.org/r/scoring/ores/ores"
to "ssh://Phabricator@gerrit.wikimedia.org:29418/scoring/ores/ores".
Sep 11 2019, 23:19

So at some point it was syncing over https. The Phabricator account does have an ssh authorized key.

Then the ssh username got changed from Phabricator to phab but that account does not have a ssh key in Gerrit:

@MarcoAurelio changed this URI
from "ssh://Phabricator@gerrit.wikimedia.org:29418/scoring/ores/ores"
to "ssh://phab@gerrit.wikimedia.org:29418/scoring/ores/ores".
Sep 12 2019, 16:46

That was done 3 years ago and I don't understand how the replication ever happened automatically.

The Phabricator replication from GitHub to Gerrit has been broken since at least September 2019, because the configured username phab does not have any ssh access. Maybe the URI should be changed to use Phabricator; if the associated Phabricator ssh key secret is the proper one, that will unbreak the replication. That said, I don't know why the URI got changed by @MarcoAurelio.

On the Gerrit server I have looked at the git reflog for the repo at /srv/gerrit/git/scoring/ores/ores.git:

commit 6ef6d22 (HEAD -> master)
Reflog: master@{0} (Accraze <accraze|account-7316@XXX>)
Reflog message: push
Merge: 0bd572d 99a29a6
Author:     A. Craze <acraze@wikimedia.org>
AuthorDate: Fri Jan 28 08:25:58 2022 -0800
Commit:     GitHub <noreply@github.com>
CommitDate: Fri Jan 28 08:25:58 2022 -0800

    Merge pull request #355 from elukey/master
    
    Add logging for feature extraction and scoring errors

commit 677f165
Reflog: master@{1} (Halfak <halfak|account-735@XXX>)
Reflog message: push
Merge: b2180fa f69989c
Author:     Andy Craze <acraze@wikimedia.org>
AuthorDate: Thu Feb 6 14:29:51 2020 -0800
Commit:     GitHub <noreply@github.com>
CommitDate: Thu Feb 6 14:29:51 2020 -0800

    Merge pull request #337 from wikimedia/memory_sensitive_models
    
    Implements ModelLoader for ScoringContext to control server/client memory

commit b338e7e (memory_sensitive_models)
Reflog: master@{2} (Halfak <halfak|account-735@XXX>)
Reflog message: push
Merge: 986d5fd f523510
Author:     Andy Craze <acraze@wikimedia.org>
AuthorDate: Tue Dec 17 13:58:28 2019 -0800
Commit:     GitHub <noreply@github.com>
CommitDate: Tue Dec 17 13:58:28 2019 -0800

    Merge pull request #334 from wikimedia/main_edit_event
    
    Adds main_edit and main_creation events for precache

commit 986d5fd
Reflog: master@{3} (Phabricator <phab|account-3594@XXX>)
Reflog message: push
Merge: 338aaf1 7101df6
Author:     Aaron Halfaker <aaron.halfaker@gmail.com>
AuthorDate: Thu Oct 24 10:54:24 2019 -0500
Commit:     GitHub <noreply@github.com>
CommitDate: Thu Oct 24 10:54:24 2019 -0500

    Merge pull request #333 from kevinbazira/50-rev_ids-limit-on-ORES-request
    
    Limit rev_ids in an ORES request to 50

Or looking solely at the Reflog: and Reflog message: and CommitDate: headers:

Reflog: master@{0} (Accraze <accraze|account-7316@XXX>)  # Manual push
Reflog message: push
CommitDate: Fri Jan 28 08:25:58 2022 -0800

Reflog: master@{1} (Halfak <halfak|account-735@XXX>)  # Manual push
Reflog message: push
CommitDate: Thu Feb 6 14:29:51 2020 -0800

Reflog: master@{2} (Halfak <halfak|account-735@XXX>)  # Manual push
Reflog message: push
CommitDate: Tue Dec 17 13:58:28 2019 -0800

Reflog: master@{3} (Phabricator <phab|account-3594@XXX>) # Automated push
Reflog message: push
CommitDate: Thu Oct 24 10:54:24 2019 -0500

The last working replication used the phab user somehow but the next three updates were manual pushes by @Halfak and @ACraze.
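
The same classification can be done mechanically. Below is a minimal Python sketch that reads "Reflog:" headers in the shape quoted above and labels each push as automated or manual, based on whether the Gerrit account id is 3594 (the Phabricator account in this task). classify_pushes is a hypothetical helper, and the header format is assumed to match the excerpt exactly.

```python
import re

# Matches e.g.: Reflog: master@{3} (Phabricator <phab|account-3594@XXX>)
REFLOG = re.compile(
    r"^Reflog: (?P<ref>\S+) \((?P<name>.+?) "
    r"<(?P<user>[^|]+)\|account-(?P<account>\d+)@"
)

# Gerrit account id of the Phabricator mirroring user, per this task.
MIRROR_ACCOUNT_ID = 3594


def classify_pushes(log_text):
    """Return a list of (ref, user, kind) for each 'Reflog:' header."""
    results = []
    for line in log_text.splitlines():
        m = REFLOG.match(line.strip())
        if m:
            is_mirror = int(m["account"]) == MIRROR_ACCOUNT_ID
            results.append((m["ref"], m["user"], "automated" if is_mirror else "manual"))
    return results


sample = """\
Reflog: master@{0} (Accraze <accraze|account-7316@XXX>)
Reflog: master@{1} (Halfak <halfak|account-735@XXX>)
Reflog: master@{2} (Halfak <halfak|account-735@XXX>)
Reflog: master@{3} (Phabricator <phab|account-3594@XXX>)
"""
for entry in classify_pushes(sample):
    print(entry)
```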

Mentioned in SAL (#wikimedia-releng) [2022-06-28T09:37:50Z] <hashar> phabricator: changed username of rORES Phab>Gerrit replication from phab to phabricator # T311390

I am really tempted to take that opportunity to phase out the GitHub > Phabricator > Gerrit replication

+1

I have triggered a mirror push on phab1001:

/srv/phab/phabricator/bin/repository mirror --verbose ORES
Pushing "ores" to mirrors...
Pushing to remote "ssh://phabricator@gerrit.wikimedia.org:29418/scoring/ores/ores"...
[2022-06-28 09:44:05] EXCEPTION: (PhutilAggregateException) Exceptions occurred while mirroring the "ores" repository.
    - CommandException: Command failed with error #128!
      COMMAND
      /usr/bin/sudo -E -n -u phd -- git push --verbose --mirror -- '********'
      
      STDOUT
      (empty)
      
      STDERR
      Pushing to ssh://********@gerrit.wikimedia.org:29418/scoring/ores/ores
      Warning: Permanently added '[gerrit.wikimedia.org]:29418,[2620:0:861:2:208:80:154:137]:29418' (RSA) to the list of known hosts.
      phabricator@gerrit.wikimedia.org: Permission denied (publickey).
      fatal: Could not read from remote repository.
      
      Please make sure you have the correct access rights
      and the repository exists.

The remote is configured to use the ssh key {K19} and its public key is indeed in the list of authorized_keys for Gerrit user phabricator id 3594. Still does not work for some reason though :-\

Reedy renamed this task from Github's wikimedia/ores not mirroring to Gerrit's scoring /ores /ores to Github's wikimedia/ores not mirroring to Gerrit's scoring/ores/ores.
Jun 28 2022, 2:17 PM
Reedy removed a subscriber: ACraze.

The Phabricator replication from GitHub to Gerrit has been broken since at least September 2019, because the configured username phab does not have any ssh access. Maybe the URI should be changed to use Phabricator; if the associated Phabricator ssh key secret is the proper one, that will unbreak the replication. That said, I don't know why the URI got changed by @MarcoAurelio.

I think I remember we had a similar issue in the past (that is, the GitHub-Phab-Gerrit replication not working) and were trying to debug with @Paladox and @20after4 and that's probably why we changed the config.

phab seems the correct user name per https://ldap.toolforge.org/user/phab which is the only one in the Phab-To-Gerrit group at https://gerrit.wikimedia.org/r/admin/groups/99a676b7854993180fcb7b4f2771b19c3203870e,members

GitHub and Phab repos are in sync. I'll try to figure out what happens with the Phab-To-Gerrit thing.

The phab user in Gerrit (id 7252) does not have any ssh key.

We have two credentials in Phabricator

  • {K18} which is a user / password for phab
  • {K19} which is a ssh private key

The latter has a public key which is authorized for the Phabricator user in Gerrit (id 3594). I guess we can try retrieving it and verifying manually whether it works.
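
One possible way to check that two copies of a public key are the same, without attempting a push, is to compare fingerprints. This is a suggestion rather than what was done in this task. The Python sketch below computes the SHA256 fingerprint in the format OpenSSH prints (base64 without padding). It assumes the line has no leading options field, and the example line is a placeholder with the right shape, not a real key.

```python
import base64
import hashlib


def sha256_fingerprint(authorized_keys_line):
    """Return the OpenSSH-style SHA256 fingerprint of a public key line."""
    # Assumed layout: "<key-type> <base64-blob> [comment]", with no options prefix.
    fields = authorized_keys_line.split()
    blob = base64.b64decode(fields[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


# Placeholder line, NOT a real key: it only has the right shape.
example = "ssh-rsa AAAAB3NzaC1yc2E= example@host"
print(sha256_fingerprint(example))
```

The comment field does not affect the result, so the same key stored with different comments on the Phabricator and Gerrit sides still yields the same fingerprint.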

But most probably, I will end up disabling that replication entirely in favor of users doing manual pushes until they move to Gerrit or Gitlab.

Just so we don't get lost:

After @MarcoAurelio's change to use https and the phab user, I triggered the replication:

/srv/phab/phabricator/bin/repository mirror --verbose ORES
Pushing "ores" to mirrors...
Pushing to remote "https://phab@gerrit.wikimedia.org/r/a/scoring/ores/ores"...
[2022-06-29 11:24:31] EXCEPTION: (PhutilAggregateException) Exceptions occurred while mirroring the "ores" repository.
    - CommandException: Command failed with error #1!
      COMMAND
      /usr/bin/sudo -E -n -u phd -- git push --verbose --mirror -- '********'
      
      STDOUT
      (empty)
      
      STDERR
      Pushing to https://********@gerrit.wikimedia.org/r/a/scoring/ores/ores
      POST git-receive-pack (53508 bytes)
      remote: error: branch refs/meta/config:        
      remote: Cannot delete project configuration from 'refs/meta/config'        
      remote: error: branch refs/meta/config:        
      remote: You need 'Delete Reference' rights or 'Push' rights with the         
      remote: 'Force Push' flag set to delete references.        
      remote: User: phab        
      remote: Contact an administrator to fix the permissions        
remote: Processing changes: refs: 33, done            
      To https://gerrit.wikimedia.org/r/a/scoring/ores/ores
       = [up to date]      refs/pull/1/head -> refs/pull/1/head
       = [up to date]      refs/pull/102/head -> refs/pull/102/head
       = [up to date]      refs/pull/103/head -> refs/pull/103/head
       = [up to date]      refs/pull/104/head -> refs/pull/104/head
       = [up to date]      refs/pull/105/head -> refs/... (19,823 more bytes) ... at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryMirrorEngine.php:49]
arcanist(), ava(), phabricator(), translations(), wmf-ext-misc()
  #0 <#2> ExecFuture::raiseResultError(array) called at [<arcanist>/src/future/exec/ExecFuture.php:340]
  #1 <#2> ExecFuture::resolvex() called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryMirrorEngine.php:104]
  #2 <#2> PhabricatorRepositoryMirrorEngine::pushToGitRepository(PhabricatorRepository, PhabricatorRepositoryURI) called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryMirrorEngine.php:67]
  #3 <#2> PhabricatorRepositoryMirrorEngine::pushRepositoryToMirror(PhabricatorRepository, PhabricatorRepositoryURI) called at [<phabricator>/src/applications/repository/engine/PhabricatorRepositoryMirrorEngine.php:42]
  #4 PhabricatorRepositoryMirrorEngine::pushToMirrors() called at [<phabricator>/src/applications/repository/management/PhabricatorRepositoryManagementMirrorWorkflow.php:44]
  #5 PhabricatorRepositoryManagementMirrorWorkflow::execute(PhutilArgumentParser) called at [<arcanist>/src/parser/argument/PhutilArgumentParser.php:492]
  #6 PhutilArgumentParser::parseWorkflowsFull(array) called at [<arcanist>/src/parser/argument/PhutilArgumentParser.php:377]
  #7 PhutilArgumentParser::parseWorkflows(array) called at [<phabricator>/scripts/repository/manage_repositories.php:22]
hashar@phab1001:~$

It fails because it tries to delete refs/meta/config, which is Gerrit-specific and, well, can't be deleted ;)
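
The reason is that git push --mirror also removes remote refs that do not exist locally. A minimal Python sketch of that set difference follows. The ref lists are illustrative: only refs/meta/config and refs/pull/1/head appear in the output above, and refs_deleted_by_mirror is a hypothetical helper name.

```python
def refs_deleted_by_mirror(local_refs, remote_refs):
    """Refs a mirror push would try to remove: present remotely, absent locally."""
    return sorted(set(remote_refs) - set(local_refs))


# Illustrative ref lists (assumed, apart from the two refs named in the log).
local = {"refs/heads/master", "refs/pull/1/head"}
remote = {"refs/heads/master", "refs/pull/1/head", "refs/meta/config"}

print(refs_deleted_by_mirror(local, remote))
```

Since refs/meta/config exists only on the Gerrit side, a mirror push from a repository that never had it will always attempt that deletion.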

Fixed with the help of @hashar by doing the following:

Thank you @MarcoAurelio !

@elukey there are surely other repositories having a Phabricator replication from GitHub to Gerrit (see T213246). I guess it should be revisited :)

I've filed a private Task at T311624 with some follow-ups regarding this Phab-To-Gerrit push service.

Mentioned in SAL (#wikimedia-releng) [2022-07-05T08:49:31Z] <hauskatze> Diffusion rORES repository. Changed URI settings: enabled SSH push for mirroring; disabled HTTP | T311390

@elukey With the Phab SSH replication service fixed in T311624, I went ahead and restored SSH mirroring to rORES using https://phabricator.wikimedia.org/source/ores/uri/view/27732/ - Please let me or @thcipriani know if it gets stuck again or there are other issues. In principle, this should be working fine again.

@MarcoAurelio Thanks a lot for the help! I hope to be able to deprecate all these repos soon-ish :)