Page MenuHomePhabricator

Can't `git pull` mediawiki/core from Gerrit: "fatal: the remote end hung up unexpectedly" SSH_MSG_CHANNEL_WINDOW_ADJUST
Closed, ResolvedPublic

Description

Logstash view (private) for SSH_MSG_CHANNEL_WINDOW_ADJUST errors: https://logstash.wikimedia.org/goto/fa9a106798206f16ca48ccb3ca47f34b


If you encounter this issue, please try again with more verbosity:

GIT_SSH_COMMAND='ssh -vv' git fetch

That would give more information about what is happening with ssh on the client side.


Upstream informations

From a thread in April 2020 at https://groups.google.com/g/repo-discuss/c/UUvDYv2qWqY

It's this issue: SSHD-942, that was fixed in: https://gerrit-review.googlesource.com/c/gerrit/+/238384 . The fix is included since gerrit 3.1.0, so that I actually surprised you still see it on gerrit 3.3.0.


I receive this error message every time when trying to pull the mediawiki/core repo. Other repos seem fine.

I've seen the issue yesterday for the first time, but it worked after retrying. Today it no longer works.

C:\mediawiki\core>git pull
remote: Counting objects: 23789, done
remote: Finding sources: 100% (976/976)
remote: Getting sizes: 100% (282/282)
remote: Compressing objects: 100% (4740744/4740744)
client_loop: send disconnect: Connection reset by peer KiB/s
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
warning: unable to unlink '(NULL)': Invalid argument

My Git version is:

C:\mediawiki\core>git --version
git version 2.28.0.windows.1

(and yes, I'm on Windows)

Event Timeline

I've worked around the problem by adding the GitHub mirror as a remote and pulling from there. This works fine.

git remote add github git@github.com:wikimedia/mediawiki.git
git pull github master

Can you post the list of remotes please (ie. git remote -v)?

@matmarex, I don't think it's about Windows :). I've faced this problem multiple times and I'm using macOS. But sometimes, when my internet connection is faster, it works. But I think since it could be internet speed centric, one would not always rely on waiting for a faster internet. Let me try your work around and see if it works now! Thanks!

>  git --version
git version 2.28.0

Can you post the list of remotes please (ie. git remote -v)?

For me, this is what I have:

gerrit   ssh://d3r1ck01@gerrit.wikimedia.org:29418/mediawiki/core.git (fetch)
gerrit  ssh://d3r1ck01@gerrit.wikimedia.org:29418/mediawiki/core.git (push)
origin  ssh://d3r1ck01@gerrit.wikimedia.org:29418/mediawiki/core.git (fetch)
origin  ssh://d3r1ck01@gerrit.wikimedia.org:29418/mediawiki/core.git (push)

Are you cloning over ssh? I am pretty sure it is a timeout on the Gerrit side.

Yes, I'm also cloning/pulling over SSH.

C:\mediawiki\core>git remote -v
gerrit  ssh://matmarex@gerrit.wikimedia.org:29418/mediawiki/core.git (fetch)
gerrit  ssh://matmarex@gerrit.wikimedia.org:29418/mediawiki/core.git (push)
...

(I have some other remotes, which I cut from the snippet above, hopefully they don't matter; I don't have an origin though, just gerrit.)

In the Gerrit log:

[2020-09-18 21:12:36,307] [sshd-SshDaemon[714d7c7b](port=22)-nio2-thread-16] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[matmarex@/X.X.X.X:yyyy])[state=Opened] SshChannelNotFoundException: Received SSH_MSG_CHANNEL_WINDOW_ADJUST on unassigned channel 0 (last assigned=null)

In Gerrit sshd log:

[2020-09-18T21:11:22.356+0000] df95ee62 [sshd-SshDaemon[714d7c7b](port=22)-nio2-thread-30] matmarex a/yyy LOGIN FROM X.X.X.X
[2020-09-18T21:12:29.888+0000] df95ee62 [SSH git-upload-pack /mediawiki/core.git (matmarex)] matmarex a/yyy git-upload-pack./mediawiki/core.git 3ms 67133ms 13528ms 39ms 4ms 175ms 251ms 147ms 616ms 190 631 976 1778334 0 git/2.28.0.windows.1
[2020-09-18T21:12:36.308+0000] df95ee62 [sshd-SshDaemon[714d7c7b](port=22)-nio2-thread-16] matmarex a/yyy LOGOUT

Not really sure what is going to be honest. You can try fetching with https instead by reconfiguring the remote with:

git remote set-url origin https://gerrit.wikimedia.org/r/mediawiki/core.git
git remote set-url --push origin ssh://gerrit.wikimedia.org:29418/mediawiki/core.git

For further debug, one can ask git to run ssh in verbose mode using:

GIT_SSH_COMMAND='ssh -v' git fetch

Add more v for more verbosity. That might give a hint.

some finds by searching for the "Received SSH_MSG_CHANNEL_WINDOW_ADJUST on unassigned channel 0 ".

They are both gerrit and specifically MINA SSHD related.

https://issues.apache.org/jira/browse/SSHD-942
https://github.com/CandyShop/gerrit/issues/234

The "SSH session crashes on high latency network" from the first link is why @Paladox was asking if you are on a slow network, i suppose.

For the record, I tried again today and it worked fine… My connection should be pretty good, but maybe it was actually some temporary latency problem.

Thanks for the update. High latency would match other similar bug reports. Is it ok if we close it and reopen if it happens again?

Fine by me, but it looks like @xSavitar was having the same problem?

Fine by me, but it looks like @xSavitar was having the same problem?

Yes @matmarex. But sometimes it works too. Especially when my connection is relatively faster. During the day, this happens but early mornings or late nights, it doesn't.

Looking at @xSavitar recent accesses:

sshd_log.2020-09-15.gz:[2020-09-15T22:01:13.725+0000] 68480692 [SshCommandDestroy-0] d3r1ck01 a/2842 git-upload-pack./mediawiki/core.git 4ms 698936ms killed

Killed after ~ 12 minutes. And the related stacktrace:

[2020-09-15 22:01:13,726] [SSH git-upload-pack /mediawiki/core.git (d3r1ck01)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user d3r1ck01 account 2842) during git-upload-pack '/mediawiki/core.git'
org.apache.sshd.common.channel.WindowClosedException: Already closed: Window[server/remote](ChannelSession[id=0, recipient=0]-ServerSessionImpl[d3r1ck01@/X.X.X.X:yyyyy])
        at org.apache.sshd.common.channel.Window.waitForCondition(Window.java:302)
        at org.apache.sshd.common.channel.Window.waitForSpace(Window.java:252)
        at org.apache.sshd.common.channel.ChannelOutputStream.flush(ChannelOutputStream.java:196)
        at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:132)
        at org.eclipse.jgit.transport.UploadPack$ResponseBufferedOutputStream.write(UploadPack.java:2371)
        at org.eclipse.jgit.transport.SideBandOutputStream.writeBuffer(SideBandOutputStream.java:141)
        at org.eclipse.jgit.transport.SideBandOutputStream.write(SideBandOutputStream.java:120)
        at org.eclipse.jgit.internal.storage.pack.PackOutputStream.write(PackOutputStream.java:99)
        at org.eclipse.jgit.internal.storage.file.ByteArrayWindow.write(ByteArrayWindow.java:58)
        at org.eclipse.jgit.internal.storage.file.PackFile.copyAsIs2(PackFile.java:547)
        at org.eclipse.jgit.internal.storage.file.PackFile.copyAsIs(PackFile.java:399)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.copyObjectAsIs(WindowCursor.java:188)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1703)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObject(PackWriter.java:1680)
        at org.eclipse.jgit.internal.storage.pack.PackOutputStream.writeObject(PackOutputStream.java:138)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.writeObjects(WindowCursor.java:196)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjects(PackWriter.java:1668)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjects(PackWriter.java:1655)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.writePack(PackWriter.java:1214)
        at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:2312)
        at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:2143)
        at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:1077)
        at org.eclipse.jgit.transport.UploadPack.uploadWithExceptionPropagation(UploadPack.java:834)
        at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:748)
        at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:101)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:107)
        at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:32)
        at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:72)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:469)
        at com.google.gerrit.server.logging.LoggingContextAwareRunnable.run(LoggingContextAwareRunnable.java:110)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:610)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
        Suppressed: org.apache.sshd.common.channel.exception.SshChannelClosedException: write(ChannelOutputStream[ChannelSession[id=0, recipient=0]-ServerSessionImpl[d3r1ck01@/X.X.X.X:yyyyy]] SSH_MSG_CHANNEL_DATA) len=26 - channel already closed
                at org.apache.sshd.common.channel.ChannelOutputStream.write(ChannelOutputStream.java:114)
                at org.eclipse.jgit.transport.UploadPack$ResponseBufferedOutputStream.write(UploadPack.java:2371)
                at org.eclipse.jgit.transport.SideBandOutputStream.writeBuffer(SideBandOutputStream.java:141)
                at org.eclipse.jgit.transport.SideBandOutputStream.flushBuffer(SideBandOutputStream.java:94)
                at org.eclipse.jgit.transport.SideBandOutputStream.flush(SideBandOutputStream.java:100)
                at org.eclipse.jgit.transport.UploadPack$SideBandErrorWriter.writeError(UploadPack.java:2404)
                at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:766)
                ... 14 more

I thought it might be the same as T249410, but that one had a meaningful error message:

java.net.SocketTimeoutException: waitForCondition(Window[server/remote](ChannelSession[id=0, recipient=0]-ServerSessionImpl[alqaholic007@/x.x.x.x:yyyy])) timeout exceeded: 30000

I got this issue again today trying to pull operations/mediawiki-config. It failed twice, then worked on the third try.

C:\mediawiki\operations\mediawiki-config>git pull
remote: Counting objects: 4104, done
remote: Finding sources: 100% (132/132)
remote: Getting sizes: 100% (69/69)
remote: Compressing objects: 100% (9216694/9216694)
client_loop: send disconnect: Connection reset by peer KiB/s
fatal: the remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
warning: unable to unlink '(NULL)': Invalid argument

Happened again today, trying to pull operations/mediawiki-config. Worked on the fourth try.

1C:\mediawiki\operations\mediawiki-config>git pull
2remote: Counting objects: 522, done
3remote: Finding sources: 100% (339/339)
4remote: Getting sizes: 100% (95/95)
5remote: Compressing objects: 100% (4037898/4037898)
6client_loop: send disconnect: Connection reset by peerKiB/s
7fatal: the remote end hung up unexpectedly
8fatal: early EOF
9fatal: index-pack failed
10warning: unable to unlink '(NULL)': Invalid argument
11
12C:\mediawiki\operations\mediawiki-config>git pull
13remote: Counting objects: 522, done
14remote: Finding sources: 100% (339/339)
15remote: Getting sizes: 100% (95/95)
16remote: Compressing objects: 100% (4037898/4037898)
17client_loop: send disconnect: Connection reset by peerKiB/s
18fatal: the remote end hung up unexpectedly
19fatal: early EOF
20fatal: index-pack failed
21warning: unable to unlink '(NULL)': Invalid argument
22
23C:\mediawiki\operations\mediawiki-config>git pull
24remote: Counting objects: 522, done
25remote: Finding sources: 100% (339/339)
26remote: Getting sizes: 100% (95/95)
27remote: Compressing objects: 100% (4037898/4037898)
28fatal: the remote end hung up unexpectedlyKiB | 101.00 KiB/s
29fatal: early EOF
30Connection to gerrit.wikimedia.org closed by remote host.
31fatal: index-pack failed
32warning: unable to unlink '(NULL)': Invalid argument

I found the three errors in Gerrit and they are the same SshChannelNotFoundException: Received SSH_MSG_CHANNEL_WINDOW_ADJUST on unassigned channel 0 (last assigned=null):

[2021-02-11T23:05:48.339+0000] [sshd-SshDaemon[1771d6a5](port=22)-nio2-thread-26] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[matmarex@/x.y.z.v:54834])[state=Opened] SshChannelNotFoundException: Received SSH_MSG_CHANNEL_WINDOW_ADJUST on unassigned channel 0 (last assigned=null)
[2021-02-11T23:07:16.411+0000] [sshd-SshDaemon[1771d6a5](port=22)-nio2-thread-13] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[matmarex@/x.y.z.v:64893])[state=Opened] SshChannelNotFoundException: Received SSH_MSG_CHANNEL_WINDOW_ADJUST on unassigned channel 0 (last assigned=null)
[2021-02-11T23:08:24.219+0000] [sshd-SshDaemon[1771d6a5](port=22)-nio2-thread-2] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[matmarex@/x.y.z.v:64895])[state=Opened] SshChannelNotFoundException: Received SSH_MSG_CHANNEL_WINDOW_ADJUST on unassigned channel 0 (last assigned=null)

A bit more details from sshd_log for the first one:

[2021-02-11T23:04:53.389+0000] 5f66a205 [sshd-SshDaemon[1771d6a5](port=22)-nio2-thread-13] matmarex a/417 LOGIN FROM x.y.z.v
[2021-02-11T23:05:41.110+0000] 5f66a205 [SSH git-upload-pack /operations/mediawiki-config.git (matmarex)] matmarex a/417 git-upload-pack./operations/mediawiki-config.git 5ms 47043ms 8653ms 3ms 2ms 18ms 95ms 41ms 159ms 91 198 339 1683681 0 git/2.28.0.windows.1
[2021-02-11T23:05:48.340+0000] 5f66a205 [sshd-SshDaemon[1771d6a5](port=22)-nio2-thread-26] matmarex a/417 LOGOUT

The git-upload-pack has completed successfully in 47 seconds. I dont get why it takes 7 seconds to get the LOGOUT though, should be a few hundred of miliseconds.

The git-upload-pack metrics are described at https://gerrit.wikimedia.org/r/Documentation/logs.html#_sshd_log

Waiting5ms
Command execution time47043ms
Time negotiating8653ms
Time searching for reuse3ms
Time searching for sizes2ms
Time counting18ms
Time compressing95 ms
Time writing41ms
Total time in UploadPack159ms
Bitmap Index Misses91
Total deltas198
Total Objects339
Total Bytes1683681 (~ 1.6MB)
status0

From the bug pasted previously:

If you are using ssh ControlMaster maybe disable it?
You will get a faster object negotiation by enabling git protocol version 2 Blog Post: Faster source code fetches thanks to git protocol version 2.

From T276500 GIT_TRACE=1 doesn't give much more details. We need the ssh client side verbose mode: GIT_SSH_COMMAND='ssh -v' git clone ssh://.....

Not much we can do ourselves but that has been discussed this week on the Gerrit mailing list ( https://groups.google.com/g/repo-discuss/c/UUvDYv2qWqY ).

The upstream report is https://bugs.chromium.org/p/gerrit/issues/detail?id=11491 and the upstream upstream one is https://issues.apache.org/jira/browse/SSHD-942 . I have updated the task description with those links.

There is an undocumented experimental feature flag (on by default) which got introduced in an attempt to solve the issue ( https://bugs.chromium.org/p/gerrit/issues/detail?id=11572 ). Might be worth trying to turn it off with:

gerrit.config
[sshd]
    enableChannelIdTracking = false
hashar renamed this task from Can't `git pull` mediawiki/core from Gerrit: "fatal: the remote end hung up unexpectedly" to Can't `git pull` mediawiki/core from Gerrit: "fatal: the remote end hung up unexpectedly" SSH_MSG_CHANNEL_WINDOW_ADJUST.Apr 9 2021, 9:26 AM
hashar triaged this task as Medium priority.

Change 755968 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] gerrit: set sshd.enableChannelIdTracking=false

https://gerrit.wikimedia.org/r/755968

Lets give sshd.enableChannelIdTracking=false a try

Change 755968 merged by Dzahn:

[operations/puppet@production] gerrit: set sshd.enableChannelIdTracking=false

https://gerrit.wikimedia.org/r/755968

I do not see any occurence of SSH_MSG_CHANNEL_WINDOW_ADJUST or unassigned in the Gerrit error_log files. So I guess that kind of fixed it! Clark from OpenDev talked about that issue during the monthly Gerrit community meeting ( https://docs.google.com/document/d/1QmrIsyBx52Sk_qWwrcci0hI_SlUWMGiwT8Kp05_ioBw/edit ). He referred to https://github.com/apache/mina-sshd/issues/319 which is very detailed.

hashar claimed this task.

Marking solved as far as we are concerned. Upstream will eventually find a proper fix. Please reopen if you still see ssh connection errors and I will dig again ;)

Upstream has another potentially related fix will will be in 3.5.6: https://gerrit-review.googlesource.com/c/gerrit/+/362914. As I get it jgit flushes and close the streams then when sshd tries to flush them it gets an SshChannelClosedException, the fix catch and ignores that exception since jgit already handled that.

Upstream has another potentially related fix will will be in 3.5.6: https://gerrit-review.googlesource.com/c/gerrit/+/362914. As I get it jgit flushes and close the streams then when sshd tries to flush them it gets an SshChannelClosedException, the fix catch and ignores that exception since jgit already handled that.

I have updated our Gerrit to 3.5.6 which includes that fix.