
Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003)
Closed, Resolved · Public

Description

Current Status:

This bug is fixed in OpenSSH upstream and in the Debian 9.10 (Stretch) point release.

References:

  1. Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=905226
  2. Phabricator bug report: https://discourse.phabricator-community.org/t/timeout-with-phabricator-ssh-hook-in-cluster/1205/2
  3. Patch against the debian openssh-server package: 905226.patch

Event Timeline

Restricted Application added a subscriber: Aklapper. · May 30 2019, 4:21 PM

I see this in the logs:

User vcs not allowed because account is locked

So something is wrong with the vcs user on phab1003.

Mentioned in SAL (#wikimedia-operations) [2019-05-30T18:26:38Z] <mutante> phab1003 - switch 'vcs' user to 'NP' to match phab1001 setup and then /srv/phab/phabricator# ./bin/config set diffusion.ssh-user vcs (T224677)

Aklapper renamed this task from Cannot connect to vcs@git-ssh.wikimedia.org to Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003). · May 30 2019, 7:11 PM

Ok at least the ssh layer seems to be working now:

debug1: Offering RSA public key: ***
debug1: Authentication succeeded (publickey).
Authenticated to git-ssh.wikimedia.org ([208.80.154.250]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: network
debug1: channel 0: free: client-session, nchannels 1

Change 513379 had a related patch set uploaded (by 20after4; owner: 20after4):
[operations/puppet@production] Phabricator: write an ssh.log for phabricator's sshd

https://gerrit.wikimedia.org/r/513379

Change 513379 merged by Dzahn:
[operations/puppet@production] Phabricator: write an ssh.log for phabricator's sshd

https://gerrit.wikimedia.org/r/513379

Change 513407 had a related patch set uploaded (by 20after4; owner: 20after4):
[operations/puppet@production] Phabricator: write an ssh.log for phabricator's sshd

https://gerrit.wikimedia.org/r/513407

Change 513407 merged by Dzahn:
[operations/puppet@production] Phabricator: write an ssh.log for phabricator's sshd

https://gerrit.wikimedia.org/r/513407

Mentioned in SAL (#wikimedia-cloud) [2019-05-30T22:12:23Z] <wm-bot> <lucaswerkmeister> git remote add github https://github.com/lucaswerkmeister/tool-quickcategories.git # work around T224677

sudo -u vcs /srv/phab/phabricator/bin/ssh-auth works as expected on phab1003. However, from my machine, ssh -T vcs@git-ssh.wikimedia.org results in a long pause followed by a disconnect.

The long pause seems like it should be a revealing clue, but I'm not quite sure of what.

mmodell updated the task description. · Jun 4 2019, 3:43 PM

ssh -tv vcs@git-ssh.wikimedia.org

--- skipped verbose output from ssh session setup. Interesting part below:  ---
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,keyboard-interactive
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/monkey/.ssh/id_rsa
--- long pause here ---
debug1: Authentication succeeded (publickey).
Authenticated to git-ssh.wikimedia.org ([208.80.154.250]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: network
debug1: channel 0: free: client-session, nchannels 1
Connection to git-ssh.wikimedia.org closed by remote host.
Connection to git-ssh.wikimedia.org closed.
Transferred: sent 1944, received 1564 bytes, in 0.0 seconds
Bytes per second: sent 11765839.8, received 9465932.8
debug1: Exit status -1
mmodell added a subscriber: Dzahn. · Jun 4 2019, 3:59 PM
zeljkofilipin moved this task from Backlog 🔙 to Watching 👀 on the User-zeljkofilipin board.
zeljkofilipin awarded a token.
zeljkofilipin added a subscriber: zeljkofilipin.
Paladox triaged this task as Unbreak Now! priority. · Jun 8 2019, 12:22 AM

Cloning over https is also broken

git clone https://phabricator.wikimedia.org/diffusion/EBCR/extension-breadcrumbs.git
Cloning into 'extension-breadcrumbs'...
warning: templates not found in /Users/xxx/.git-templates
Connection closed by 2620:0:861:ed1a::3:16 port 22
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Also happened to wikibase-serialization which is a submodule to Wikibase making this a UBN task now.

Restricted Application added a subscriber: Liuxinyu970226. · Jun 8 2019, 12:22 AM
Paladox lowered the priority of this task from Unbreak Now! to High. · Jun 8 2019, 12:56 AM

Turns out it was an issue on my side. I had

[url "ssh://vcs@git-ssh.wikimedia.org"]
    insteadOf = https://phabricator.wikimedia.org

set.
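For anyone else puzzled by an https clone ending in "Connection closed ... port 22": a minimal sketch of how a url.<base>.insteadOf rule rewrites URLs. This is an illustration of git's documented prefix-rewriting behaviour, not git's own code; the function name rewrite_url and the rules dictionary layout are made up for the example.

```python
def rewrite_url(url, rules):
    """Apply git-style url.<base>.insteadOf rewriting to `url`.

    `rules` maps each base URL to the list of prefixes it replaces
    (hypothetical layout, chosen for this sketch).
    As in git, the longest matching prefix wins.
    """
    best_base, best_prefix = None, ""
    for base, prefixes in rules.items():
        for prefix in prefixes:
            if url.startswith(prefix) and len(prefix) > len(best_prefix):
                best_base, best_prefix = base, prefix
    if best_base is None:
        return url  # no rule matched, URL is used unchanged
    return best_base + url[len(best_prefix):]


# The rule quoted in the comment above.
rules = {"ssh://vcs@git-ssh.wikimedia.org": ["https://phabricator.wikimedia.org"]}
print(rewrite_url(
    "https://phabricator.wikimedia.org/diffusion/EBCR/extension-breadcrumbs.git",
    rules,
))
```

With that rule in place, the https URL is turned into an ssh:// URL before git connects, so the clone hits the same broken ssh path as everything else in this task.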

Mentioned in SAL (#wikimedia-cloud) [2019-06-08T14:00:12Z] <wm-bot> <lucaswerkmeister> git remote add github https://github.com/lucaswerkmeister/tool-lexeme-forms.git # work around T224677

Mentioned in SAL (#wikimedia-cloud) [2019-06-08T14:01:26Z] <wm-bot> <lucaswerkmeister> git remote add github https://github.com/lucaswerkmeister/tool-lexeme-forms.git # work around T224677

@mmodell you get farther than I do. I've checked the db and see the right key in there for me, but my client makes the offer and never hears back again. This is with options -vvv -t vcs@git-ssh.wikimedia.org. Excerpt:

...
debug1: Authenticating to git-ssh.wikimedia.org:22 as 'vcs'
...
debug2: pubkey_prepare: done
debug3: send packet: type 5
debug3: receive packet: type 7
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug3: receive packet: type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug3: send packet: type 50
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey,keyboard-interactive
debug3: start over, passed a different list publickey,keyboard-interactive
debug3: preferred gssapi-keyex,gssapi-with-mic,publickey,keyboard-interactive,password
debug3: authmethod_lookup publickey
debug3: remaining preferred: keyboard-interactive,password
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering public key: /home/ariel/.ssh/identities/gerrit/id_rsa_gerrit RSA SHA256:W16D1HTJSVAZBhd9gRcQAdcc58hy9udUn/DHkoBvl+o agent
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
...
(very long wait)
...
Connection closed by 208.80.154.250 port 22

And that's it. I did once catch the process on phab1003 talking to dbproxy1008 (presumably to get that list of keys), but there are no useful logs anywhere to help debug this. Note that if I provide the wrong key I get rejected right away, with the standard 'Permission denied (publickey,keyboard-interactive)'.

Any thoughts?

@ArielGlenn: The only thing left to do that I can think of is to run the git sshd in debug mode (and maybe with strace) to see what's happening, since there are no useful logs to be found.

Note that if I provide the wrong key I get rejected right away, with the standard 'Permission denied (publickey,keyboard-interactive)'.

That is interesting. I'm still kind of stumped but I'm going to try debugging sshd today after the releng team meeting.

Awesome, I'll be around if it's not ridiculous o'clock for me. There's a presentation in about 90 minutes I want to be at, so if after that our stars align, ping me on irc.

mmodell added a comment. · Edited · Jun 10 2019, 5:25 PM

openssh-server: SSH AuthorizedKeysCommand hangs when output is too large

Ah ha! That's probably it! The authorized keys command outputs a very large list of keys.
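One common way a hang like this can arise, sketched under assumptions: a parent reads only part of a child's output and then waits for the child to exit, while the child is still blocked writing into a full pipe. This is a stand-alone demo, not sshd's actual code, and whether sshd fails in exactly this way is an assumption here; the function name read_first_line_then_wait, the fake key text and the line counts are invented for the illustration.

```python
import subprocess
import sys


def read_first_line_then_wait(n_lines, timeout=3.0):
    """Spawn a child that prints n_lines fake keys, read one line, then wait.

    Returns "exited" if the child finished within `timeout` seconds,
    or "hung" if it was still running (blocked writing to the pipe).
    """
    # Hypothetical stand-in for a key-listing command with large output.
    script = "for i in range(%d): print('ssh-rsa AAAA-fake-key-%%d' %% i)" % n_lines
    child = subprocess.Popen([sys.executable, "-c", script], stdout=subprocess.PIPE)
    try:
        child.stdout.readline()  # pretend the first key already matched
        try:
            child.wait(timeout=timeout)  # stop reading, just wait for exit
            return "exited"
        except subprocess.TimeoutExpired:
            return "hung"
    finally:
        # Clean up in both cases; kill() is a no-op if the child already exited.
        child.kill()
        child.stdout.close()
        child.wait()
```

With a handful of lines the child exits promptly. With enough output to overflow the pipe buffer, the child blocks on write and the wait times out, which is the same shape as the long pause followed by a disconnect.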

So it appears that there is a fix upstream in sshd, but it hasn't made its way into a stable Debian package. What's the next step here? Move Phabricator back to an older Debian version? Build a custom sshd package with the fix applied? Something else? What do you think, @ArielGlenn?

I'd like to see us test with a locally patched sshd and see if that's indeed the problem, as a first step.

Perhaps you could also switch the sshd config to AuthorizedKeysFile and rebuild that file from the AuthorizedKeysCommand via a cronjob or similar? That would also let you test if it’s really the AuthorizedKeysCommand problem, without requiring a package rebuild. (Of course, it’s not a good permanent solution, since key changes wouldn’t become effective until the next file rebuild.)
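A rough sketch of what such a periodic rebuild could look like, assuming the key-listing command can simply be run and its output written to a file. The function name rebuild_keys_file, the temp-file prefix and the target path in the usage note are hypothetical; only the ssh-auth path comes from this task. The file is replaced atomically so that sshd never sees a half-written key list.

```python
import os
import subprocess
import tempfile


def rebuild_keys_file(command, target, mode=0o644):
    """Run `command`, then atomically replace `target` with its output.

    Returns the number of lines written.
    """
    # subprocess.run reads all output before waiting for the child,
    # and check=True raises on failure, leaving the old file untouched.
    output = subprocess.run(command, check=True, stdout=subprocess.PIPE).stdout
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".authorized_keys.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(output)
        # mkstemp creates the file with mode 0600; widen it so the ssh user
        # can read it (the mode is an assumption, adjust to local policy).
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, target)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
    return len(output.splitlines())
```

A cron job could then call something like rebuild_keys_file(["/srv/phab/phabricator/bin/ssh-auth"], "/path/to/authorized_keys"), with the second argument matching whatever AuthorizedKeysFile points at. As noted above, key changes would only take effect at the next run.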

@LucasWerkmeister that could work, though if the fix is as simple as it appears to be then I'd like to just patch sshd. @ArielGlenn what do you think? If I attempt to build sshd with a patch, where should I build it and how should I deploy the patched binary?

If we build it for reals, I'd ask @MoritzMuehlenhoff about all that. If we're just doing it for testing, shove it in some directory you have access to and tell me, and we can coordinate swapping it in for a few minutes for a test.

@ArielGlenn I built the whole package successfully and uploaded the sshd binary to my home directory on phab1003. I think I can test it without swapping out the normal sshd binary by simply stopping the git-sshd service and then running the patched binary in debug mode with sudo ./sshd -d

mmodell claimed this task. · Jun 24 2019, 2:31 AM

Sounds fine to me, but make the switch fast so we don't have anyone unable to connect (and Icinga stays happy).

I finally got a chance to test this, and I can confirm that my patched sshd binary fixes the issue.

All I did was apply this patch[1] and then rebuild the .deb using debuild.

So now that it's tested and does indeed fix the issue:

If we build it for reals, I'd ask @MoritzMuehlenhoff about all that.

@MoritzMuehlenhoff: So, what's the next step for getting this patch applied and a new sshd package uploaded somewhere so that it can be installed on phabricator hosts?


  1. The upstream patch: 905226.patch

mmodell removed mmodell as the assignee of this task. · Jun 25 2019, 2:13 AM
mmodell moved this task from Backlog to Patch merged upstream on the Upstream board.

I'll file a bug against the Debian OpenSSH package. This seems like a suitable candidate for a point release, as the patch is small and it fixes a genuine bug.

This is causing significant inconvenience, as we have some repositories that are hosted on Phabricator and cannot be pushed to over ssh. I'm kind of grasping at straws here, but I'll ask anyway:
Is there any temporary solution that we could use to get the self-built binary onto phab1003 (and then configure the sshd-phabricator service to use it)? I'm not sure what sort of solution would be acceptable to SRE, and I'm not very keen on it given the ad-hoc nature of installing binaries outside of the dpkg system. I only ask because Debian isn't exactly known for moving quickly, and the upstream bug has been open for quite a while without any movement since August of last year.

@mmodell Ideally we fix this in Debian so that others can also benefit from our fix, but failing that we'll simply deploy a WMF-specific fix. I'll deploy an interim fix for phab1003 on Thu or Fri; I'm busy today.

@mmodell I've built an interim package on our package builder and copied it to phab1003:/home/jmm/openssh. Feel free to install it using "dpkg -i" whenever the time is right (or I can do it, but wanted to sync up with you first)

This isn't uploaded to apt.wikimedia.org yet, given that it's primarily of importance for Phabricator. I'll wait a little longer to see what happens wrt a Stretch update in Debian.

Thanks @MoritzMuehlenhoff ! I really appreciate it! I'll install that now.

Change 519532 had a related patch set uploaded (by 20after4; owner: 20after4):
[operations/puppet@production] phabricator: Manage ownership of /var/log/phd/ssh.log

https://gerrit.wikimedia.org/r/519532

jbond added a subscriber: jbond. · Jul 1 2019, 2:59 PM

@MoritzMuehlenhoff The updated package has broken some dependencies, which is causing an error on phab1003:

sudo apt-get install php7.2-mysql     
Reading package lists... Done
Building dependency tree       
Reading state information... Done
php7.2-mysql is already the newest version (7.2.16-1+0~20190307202415.17+stretch~1.gbpa7be82+wmf1).
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 openssh-sftp-server : Depends: openssh-client (= 1:7.4p1-10+deb9u6) but 1:7.4p1-10+deb9u6+wmf1 is to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

I'm not sure if there is an easy way to fix this without rebuilding openssh-sftp-server and openssh-server (plus anything else which may need building).

openssh-sftp-server is also built from src:openssh. I've now installed the SFTP package as well, and that fixed it.
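To make the failure mode concrete: binary packages built from the same source often depend on each other with an exact "=" version, so upgrading only some of them leaves apt with unmet dependencies. A small sketch of a check for that kind of skew follows. The function name find_version_skew is invented, and the openssh-sftp-server and openssh-server versions in the example are inferred from the apt error above rather than read from the host.

```python
def find_version_skew(installed):
    """Group packages by installed version.

    `installed` maps package name to version string. Returns the groups
    if more than one distinct version is present, otherwise an empty dict.
    """
    by_version = {}
    for package, version in installed.items():
        by_version.setdefault(version, []).append(package)
    return by_version if len(by_version) > 1 else {}


# Assumed state before the SFTP package was also upgraded.
installed = {
    "openssh-client": "1:7.4p1-10+deb9u6+wmf1",
    "openssh-server": "1:7.4p1-10+deb9u6+wmf1",
    "openssh-sftp-server": "1:7.4p1-10+deb9u6",
}
for version, packages in sorted(find_version_skew(installed).items()):
    print(version, sorted(packages))
```

Once every package from src:openssh is at the same +wmf1 version, the check returns an empty result, which corresponds to the state after the SFTP package was installed.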

@MoritzMuehlenhoff: cool! FWIW the new package seems to have fixed the problem and I haven't noticed any other bugs with it.

Dzahn awarded a token. · Jul 1 2019, 3:17 PM

Change 519532 merged by Dzahn:
[operations/puppet@production] phabricator: Manage ownership of /var/log/phd/ssh.log

https://gerrit.wikimedia.org/r/519532

Dzahn added a comment. · Edited · Jul 3 2019, 3:20 AM

/var/log/phd has been created, and the ssh.log in there is now owned by $vcsuser:root with mode 640 (all via puppet, of course).

mmodell closed this task as Resolved. · Jul 8 2019, 4:33 PM

I've submitted a proposed update to fix the underlying OpenSSH bug in Debian Stretch: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932175

The update has been accepted by the Debian stable release managers and was uploaded: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932175#24, so the 9.10 point release for Stretch will contain the updated package.

The Debian Stretch point release with the OpenSSH backport was released last weekend: https://lists.debian.org/debian-announce/2019/msg00006.html

Thank you @MoritzMuehlenhoff for your efforts getting the backport submitted and accepted upstream. It's much appreciated by me and probably by some other users of Phabricator on Debian as well.