Page MenuHomePhabricator

Labstore nfsd processes report "sent only x when sending y bytes - shutting down socket"
Closed, ResolvedPublic

Description

Since rebooting labstore1004 and 1005 to the new kernel 4.9.0-0.bpo.3-amd64, nfsd processes report these errors on dmesg

[18792.598194] nfsd: last server has exited, flushing export cache
[19638.318454] rpc-srv/tcp: nfsd: sent only 390960 when sending 659520 bytes - shutting down socket
[20144.705432] rpc-srv/tcp: nfsd: sent only 1006136 when sending 1048640 bytes - shutting down socket
[21010.843795] rpc-srv/tcp: nfsd: sent only 387304 when sending 1048640 bytes - shutting down socket
[21011.066691] nfsd: last server has exited, flushing export cache
[21075.544262] nfsd: last server has exited, flushing export cache
[21835.538290] rpc-srv/tcp: nfsd: sent only 256984 when sending 1048640 bytes - shutting down socket
[25414.602780] rpc-srv/tcp: nfsd: sent only 57408 when sending 949932 bytes - shutting down socket

@chasemp Adjusted the following settings in /proc/sys/net/core/ to see if it'll help.

sysctl -w net.core.rmem_default=624288
sysctl -w net.core.wmem_default=624288
sysctl -w net.core.netdev_max_backlog=8000

The number of errors have seem to have gone down at-least anecdotally, but we are not sure.

Event Timeline

Quite likely related to T169290: New anti-stackclash (4.9.25-1~bpo8+3 ) kernel super bad for NFS. @chasemp could not find other log events like this prior to the kernel upgrade.

Quite likely related to T169290: New anti-stackclash (4.9.25-1~bpo8+3 ) kernel super bad for NFS. @chasemp could not find other log events like this prior to the kernel upgrade.

Doesn't seem to have shown up in the last 3 days on the current active with Linux labstore1004 4.4.0-3-amd64 #1 SMP Debian 4.4.2-3+wmf8 (2016-12-22) x86_64 GNU/Linux

This is happening periodically:

Cnt Month Day Hour
  2 Mar 20 05
  2 Mar 20 03
 22 Mar 20 02 <<
 20 Mar 20 01 <<
  5 Mar 20 00 <<
  1 Mar 19 08
  2 Mar 19 03
 21 Mar 19 02 <<
 18 Mar 19 01 <<
  5 Mar 19 00 <<
  2 Mar 18 11
  1 Mar 18 03
 23 Mar 18 02 <<
 17 Mar 18 01 <<
  2 Mar 18 00 <<
  2 Mar 17 14
 12 Mar 17 02 <<
 17 Mar 17 01 <<
 10 Mar 17 00 <<
  1 Mar 16 23
  2 Mar 16 09
  1 Mar 16 08
 15 Mar 16 02 <<
 22 Mar 16 01 <<
  6 Mar 16 00 <<
  2 Mar 15 07
 14 Mar 15 04
 18 Mar 15 03
 15 Mar 15 02 <<
 12 Mar 15 01 <<
  2 Mar 15 00 <<
  2 Mar 14 11
 16 Mar 14 02 <<
 21 Mar 14 01 <<
  6 Mar 14 00 <<
  1 Mar 13 21
  4 Mar 13 15
 25 Mar 13 02 <<
 18 Mar 13 01 <<
  4 Mar 13 00 <<
  2 Mar 12 03
 21 Mar 12 02 <<
 18 Mar 12 01 <<
  2 Mar 12 00 <<

Related T169281

[21835.538290] rpc-srv/tcp: nfsd: sent only 256984 when sending 1048640 bytes - shutting down socket
This error I remember as happening before our last kernel upgrade even--and was investigated to hell when we finally did the upgrade. It may relate to the usage of tc, which relies on packet dropping. I couldn't find any correllation between it and any other behavior. I don't think it is new, and I don't know if this one went away when the kernel was rolled back because I swear I saw it before the last upgrade which was from the kernel this was rolled back to.

Are we seeing [21075.544262] nfsd: last server has exited, flushing export cache coming back? I haven't seen that one.

This may or may not still be happening after merging patches tomorrow.

Bstorm lowered the priority of this task from High to Medium.Feb 25 2020, 5:59 PM
Bstorm claimed this task.

In the course of upgrades, this seems to have gone away. Now there was a deadlock in the past week that cleared itself up, but nonetheless, this error seems gone for now.