
Frequent RST returned by appservers to LVS hosts
Closed, Declined (Public)

Description

While reviewing a TCP dump for another task, I noticed frequent RSTs returned by appservers to LVS hosts.

This is a tcpdump from mw1261 (currently in the inactive DC) showing RSTs returned to vl1017-eth0.lvs1003.wikimedia.org:

Screen Shot 2017-04-24 at 12.40.06 PM.png (413×1 px, 224 KB)

Given the frequency, and the fact that it happens in the DC not serving live traffic as well, I'd say this traffic is PyBal's HTTPS health checks?

Not really urgent, but worth tracking :)

Event Timeline

BBlack claimed this task.
BBlack subscribed.

I think this is "normal", it's from PyBal IdleConnection monitors (which just open a TCP conn, do no traffic, and eventually result in an RST).

BBlack added a project: PyBal.

hmm, no, it is the HTTPS check, not the IdleConnection one. I wonder why it's an RST and not a regular close?

Mentioned in SAL (#wikimedia-operations) [2017-05-05T15:18:12Z] <elukey> increase nginx error log verbosity on mw2146 as test for T163674 (correct task)

I can see some interesting logs on mw2146 with the error log set to info:

2017/05/05 15:20:53 [info] 7794#7794: *7 client timed out (110: Connection timed out) while SSL handshaking, client: 10.192.17.3, server: 0.0.0.0:443
2017/05/05 15:20:54 [info] 7802#7802: *8 client timed out (110: Connection timed out) while SSL handshaking, client: 10.192.17.6, server: 0.0.0.0:443

That was a red herring; I found a way to reproduce the problem. I set up sudo tcpdump -n -v -i lo port 443 in tmux on mw2146 and ran the following requests:

  1. curl -k https://localhost:443/w/api.php --header "Host: en.wikipedia.org" --header "Connection: close"
  2. curl -k https://localhost:443/w/api.php --header "Host: en.wikipedia.org"

Request 1) causes an RST, while request 2) does not.

I straced all the nginx processes; this is the relevant part:

[pid 36881] connect(97, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("10.192.16.119")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 36881] epoll_wait(205, {{EPOLLOUT, {u32=3820565456, u64=140079883957200}}, {EPOLLOUT, {u32=3820565217, u64=140079883956961}}}, 512, 60000) = 2
[pid 36881] getsockopt(97, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 36881] writev(97, [{"GET /w/api.php HTTP/1.1\r\nConnect"..., 284}], 1) = 284
[pid 36881] epoll_wait(205, {{EPOLLIN|EPOLLOUT, {u32=3820565217, u64=140079883956961}}}, 512, 180000) = 1
[pid 36881] recvfrom(97, "HTTP/1.1 200 OK\r\nDate: Tue, 09 M"..., 8192, 0, NULL, NULL) = 8192
[pid 36881] setsockopt(91, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 36881] write(91, "\27\3\3\5$3\22\263\2432\36V\177\35\345\255\276*\306\207\262k\313\226I\363`X#\306\275\355"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$\230-B;\4kE\5\362\323ue]\23\311\34\341k\307\202\256\343\260\321\311\321o"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$jU)\207$\36\326\263G1\10\342\373\317`{N\341:\355\10q5q\252\6]"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$^\16\10\177K\221\21\267\217X\v\367\246\23Qh\233\323T\307\324\326!:\320\224\201"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$\254\223\274R\233\10~\373\4um\372H\231\260\260\363\334\16\311\2222\r\231u\27\322"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$\235\34\325\323\332\1\216\251\30\231\254\251\203\345\"\213\n\213n\r>\226\33\332\273\224\221"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\1v\226\324\245\25\232\5\307\r&Y\205**\25a1R\362\275n\231\343\375\27?\336\371"..., 379) = 379
[pid 36881] recvfrom(97, "set to the same value. For more "..., 8192, 0, NULL, NULL) = 8192
[pid 36881] write(91, "\27\3\3\5$\353u;\231\313q\357\234-m\266\30\322~b\366\220\326\237\177\244\243\365\307\324\230J"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$r\354\313IaES\17\233\250_\25I\17\354l\300\16\177v^\336\261q\211\4\333"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$\350a\373*\n\227\324\34W\335[\306R\237\304\332\357\260@@,\364\226&\2506\315"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$\213w)\272x\211\217\17\307\n\304\314\336\250y\303OU\247e\374\321\267\213#V\345"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$eIv\363\316%<\363qZ\326\2208\236\233*\310\34\23\nR\253\305\200\374\271;"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$H\213\317W?\5\347\223\331\1\334\264p\0332\241u\356\310\10'\314R\371$p\303"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\1\220W\352\0-\210\6\376\360\267\233\374\33\230\222\341q\331n%\32\24H\205\206\306\276A"..., 405) = 405
[pid 36881] recvfrom(97, "riagestats\"><span dir=\"ltr\" lang"..., 8192, 0, NULL, NULL) = 8192
[pid 36881] write(91, "\27\3\3\5$\177\10\257\31d\214\303p\335\300I4\t\37\266\366\332\\\1\363\260\22[7\320\330\372"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$\21WU\rfV\3153\23\332'\336\323\312\355L\303\273\10$\327\225\364yfEC"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$\243\220*q\326\213}\353\326\37I\2445\262\220\244\363\376\24\r_;Q\240\242;f"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$\307\6.cPB\334\22\20\225\203\230\262j\340!\220b\37\361RB\1MtS "..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$\3650ja\227=\276\330\2435D\273\246\31\231\5\320juG\224p%\f\10\27\256"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$g<\213\306\253\363\271d\2\232q\227\315\34\272\240\243\235\250\17\232f;\20\27\262;"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\1\220\227\371\3702=\3605\24\2D\262\242\375C2\3015\357v\374\f\252\235\340R\33>"..., 405) = 405
[pid 36881] recvfrom(97, "uest in the results.\n</p></dd><d"..., 8192, 0, NULL, NULL) = 8192
[pid 36881] write(91, "\27\3\3\5$S\204\264\"\323\205\212n\342\360\5\20\334\33C6\0107\300\252s\207\225H\302\vx"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$S9\217\333\252lx\321\3078lO1\330\311^s\232\t\211S\227\17\224\20s\30"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$4\272\363KG\355t\22R\327T\26\274\213\335<\v\0rk\324!W\240\272'\364"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$\347;]n\372\10P\313\314\263\333\244\265\334\260\37,\260\341\376~\201\321\272\327C\203"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$c\314F/\200\32\316%\246\347o\226Z\356\350\327&u\352\212@\6\6DO\21A"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\5$f\240=\21\256MC\214\307\225\35\207\264\351\213S\16\222p\243\22443M\277\1\356"..., 1321) = 1321
[pid 36881] write(91, "\27\3\3\1\220\306q\20q\253\3\366q\24\270\211\322'\216L\376\2\201<\330oa\10\221Zr\240"..., 405) = 405
[pid 36881] recvfrom(97, "ue2</kbd>. If a value must conta"..., 8192, 0, NULL, NULL) = 1293
[pid 36881] write(91, "\27\3\3\5\17;ny\30\3345\361\212\325\333#\311\26_\302m>\362\243\341\261\32\332\26c\225\243"..., 1300) = 1300
[pid 36881] close(97)                   = 0
[pid 36881] write(89, "10.192.17.3 - - [09/May/2017:16:"..., 106) = 106
[pid 36881] write(91, "\25\3\3\0\22\224\1L\365\10*\320\223\337\365\332\314QZ\263\320o}", 23) = 23
[pid 36881] close(91)                   = 0
[pid 36881] epoll_wait(205,  <unfinished ...>
[pid 36879] <... epoll_wait resumed> {}, 512, 28441) = 0
[pid 36879] close(91)

There is no SO_LINGER set to 0 followed by a close(); I thought it might have been deliberate nginx behavior, but that doesn't seem to be the case.
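For context: setting SO_LINGER with l_onoff=1 and l_linger=0 right before close() is the classic way to make the kernel abort a connection with an RST instead of the normal FIN exchange, which is what I was looking for in the strace. A minimal sketch of that pattern (my own illustration, not nginx code):

#include <sys/socket.h>
#include <unistd.h>

/* Abortive close: with l_onoff=1 and l_linger=0, close() discards any
 * unsent data and makes the kernel emit an RST instead of a FIN. */
static void abortive_close(int fd)
{
    struct linger lin = { .l_onoff = 1, .l_linger = 0 };
    setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
    close(fd);  /* connection is reset here */
}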

You might want to look at the other side of the nginx proxy as well. Perhaps apache is terminating its connection to the local nginx with an RST, and this in turn causes nginx's proxy to send an RST on to the actual client?

Checked as well, thanks for the pointer! tcpdump -i lo doesn't show any RST for apache.

Those RSTs are the result of a packet being sent to an already closed socket. In the tcpdump pasted above, nginx sends the FIN, ACK packet and then probably closes the socket without going through the usual phases described in http://www.tcpipguide.com/free/t_TCPConnectionTermination-2.htm. Then the client (curl) tries to send its own FIN and gets back an RST. That's supported by the strace output as well, where I don't see a shutdown(2) call.

At least that's my reading. It does seem to be HTTPS only, btw. The same trick on plaintext HTTP did not generate an RST, so maybe there's a difference in how nginx handles SSL sockets.
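For reference, the "usual phases" from the link above correspond to this kind of sequence at the socket level; a sketch assuming a simple blocking socket (my own illustration, not nginx code):

#include <sys/socket.h>
#include <unistd.h>

/* Graceful close: half-close our side with shutdown(2), then drain
 * whatever the peer still sends (e.g. its TLS close_notify alert) until
 * read() returns 0 (the peer's FIN), and only then close(2).  Calling
 * close() without the drain means any later byte from the peer hits a
 * closed socket, and the kernel answers with an RST -- which matches
 * both the capture and the missing shutdown(2) call in the strace. */
static void graceful_close(int fd)
{
    char buf[512];

    shutdown(fd, SHUT_WR);                  /* send our FIN */
    while (read(fd, buf, sizeof buf) > 0)   /* drain until peer's FIN */
        ;
    close(fd);
}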

I compared the strace output of two requests, one with Connection: close and one without it.

Something interesting came up:

With Connection: close

[pid  4431] recvfrom(89, "ions\"><div class=\"apihelp-block-"..., 8192, 0, NULL, NULL) = 4488
[pid  4431] write(3, "\27\3\3\5$u\267\305\275\376\374P\30\3547\t\257O\351\205>/O\256\247\263\321_\367\230\274\24"..., 1321) = 1321
[pid  4431] write(3, "\27\3\3\5$\375\253w\353\177\317\202\263\301\206x&+dV(W\222\253\370\324p\350\267\343\243F"..., 1321) = 1321
[pid  4431] write(3, "\27\3\3\5$\357\362l\320\26\242\326$\211\256\332\263F\272\203F@\327\261\33/\320\25\313:h\20"..., 1321) = 1321
[pid  4431] write(3, "\27\3\3\2U\333,O\211\340q\237>\203\306\310D\362\371c\256\254!2\2000\3749p\300E8"..., 602) = 602
[pid  4431] close(89)                   = 0
[pid  4431] write(3, "\27\3\3\0\25\221s\r$4\0048\300\255\362)\317\321\230\37\22\271\364N\275.", 26) = 26
[pid  4431] write(170, "::1 - - [11/May/2017:16:04:49 +0"..., 91) = 91
[pid  4431] write(3, "\25\3\3\0\22\374M\357N0X\32>\276\225C\325\177\211\370\20\24\250", 23) = 23
[pid  4431] close(3)                    = 0
[pid  4431] epoll_wait(231,

vs

Without Connection: close

[pid  4432] recvfrom(89, "ions\"><div class=\"apihelp-block-"..., 8192, 0, NULL, NULL) = 4488
[pid  4432] write(3, "\27\3\3\5$)W\341R\256\374\221\324\322\365\10\353j\267\321H\311;j\323\317bi\267\217\315`"..., 1321) = 1321
[pid  4432] write(3, "\27\3\3\5$\351\373\257n\223\264p\226\302\25\207\f\315,\336\235b\37<\36\322\24T\246\210\252@"..., 1321) = 1321
[pid  4432] write(3, "\27\3\3\5$\212\0374\246\225\341'ub\270\24^-\n\374(\245J\241\205XTa?\t\352\\"..., 1321) = 1321
[pid  4432] write(3, "\27\3\3\2U< \253\231T\216\204\265|\320y\350\321U\275\7$\243\23)(5\354*O\376\246"..., 602) = 602
[pid  4432] close(89)                   = 0
[pid  4432] write(3, "\27\3\3\0\25\35:!q\333\314\271\216g\355c\320\7\274{\365\342>JBR", 26) = 26
[pid  4432] write(170, "::1 - - [11/May/2017:16:07:39 +0"..., 91) = 91
[pid  4432] epoll_wait(233, {{EPOLLIN|EPOLLOUT, {u32=2974520032, u64=140383980589792}}}, 512, 60000) = 1
[pid  4432] read(3, "\25\3\3\0\22b\233-\16\353\33\5\326\306tJ\264\\+\237\310GH", 16709) = 23
[pid  4432] close(3)

There is an epoll_wait in the second case that leads to another read(3) before close(3); that is probably what prevents the RST from happening.
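Side note: the 23-byte records beginning with \25\3\3 in these straces are TLS alert records (content type 0x15, i.e. the close_notify), so the extra read(3) in the second case is nginx consuming the client's close_notify before closing. At the OpenSSL level that read surfaces roughly like this; a sketch assuming a blocking SSL *, my own illustration rather than nginx code:

#include <openssl/ssl.h>
#include <unistd.h>

/* On a kept-alive connection the server keeps reading.  When the client
 * sends close_notify, SSL_read() returns <= 0 and SSL_get_error()
 * reports SSL_ERROR_ZERO_RETURN (value 6 -- the same "SSL_get_error: 6"
 * / "peer shutdown SSL cleanly" pair visible in the debug log further
 * down).  Only then do we reply with our own close_notify and close, so
 * no data arrives on an already-closed socket and no RST is generated. */
static void keepalive_read(SSL *ssl, int fd)
{
    char buf[1024];
    int n = SSL_read(ssl, buf, sizeof buf);

    if (n <= 0 && SSL_get_error(ssl, n) == SSL_ERROR_ZERO_RETURN) {
        SSL_shutdown(ssl);  /* send our close_notify back */
        SSL_free(ssl);
        close(fd);
    }
}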

Two snippets of the nginx error log set to debug, captured from the close http upstream connection event onwards:

With Connection: close

2017/05/12 06:52:56 [debug] 495#495: *33 close http upstream connection: 130
2017/05/12 06:52:56 [debug] 495#495: *33 free: 00000000011F14D0, unused: 48
2017/05/12 06:52:56 [debug] 495#495: *33 event timer del: 130: 1494572156820
2017/05/12 06:52:56 [debug] 495#495: *33 reusable connection: 0
2017/05/12 06:52:56 [debug] 495#495: *33 http output filter "/w/api.php?"
2017/05/12 06:52:56 [debug] 495#495: *33 http copy filter: "/w/api.php?"
2017/05/12 06:52:56 [debug] 495#495: *33 http postpone filter "/w/api.php?" 00007FFEE31DA7D0
2017/05/12 06:52:56 [debug] 495#495: *33 http chunk: 0
2017/05/12 06:52:56 [debug] 495#495: *33 write new buf t:0 f:0 000000000121D9A8, pos 00000000004BC43B, size: 5 file: 0, size: 0
2017/05/12 06:52:56 [debug] 495#495: *33 http write filter: l:1 f:0 s:5
2017/05/12 06:52:56 [debug] 495#495: *33 http write filter limit 0
2017/05/12 06:52:56 [debug] 495#495: *33 SSL buf copy: 5
2017/05/12 06:52:56 [debug] 495#495: *33 SSL to write: 5
2017/05/12 06:52:56 [debug] 495#495: *33 SSL_write: 5
2017/05/12 06:52:56 [debug] 495#495: *33 http write filter 0000000000000000
2017/05/12 06:52:56 [debug] 495#495: *33 http copy filter: 0 "/w/api.php?"
2017/05/12 06:52:56 [debug] 495#495: *33 http finalize request: 0, "/w/api.php?" a:1, c:1
2017/05/12 06:52:56 [debug] 495#495: *33 http request count:1 blk:0
2017/05/12 06:52:56 [debug] 495#495: *33 http close request
2017/05/12 06:52:56 [debug] 495#495: *33 http log handler
2017/05/12 06:52:56 [debug] 495#495: *33 free: 00000000011E2720
2017/05/12 06:52:56 [debug] 495#495: *33 free: 00000000011E0700, unused: 11
2017/05/12 06:52:56 [debug] 495#495: *33 free: 00000000011E1710, unused: 6
2017/05/12 06:52:56 [debug] 495#495: *33 free: 000000000121D510, unused: 2629
2017/05/12 06:52:56 [debug] 495#495: *33 close http connection: 129
2017/05/12 06:52:56 [debug] 495#495: *33 SSL_shutdown: 1
2017/05/12 06:52:56 [debug] 495#495: *33 reusable connection: 0
2017/05/12 06:52:56 [debug] 495#495: *33 free: 000000000121E520
2017/05/12 06:52:56 [debug] 495#495: *33 free: 00000000011F10C0
2017/05/12 06:52:56 [debug] 495#495: *33 free: 000000000120A0B0, unused: 0
2017/05/12 06:52:56 [debug] 495#495: *33 free: 00000000011F0EB0, unused: 320

Without Connection: close:

2017/05/12 06:54:54 [debug] 509#509: *43 close http upstream connection: 124                                                                                                 [0/1807]
2017/05/12 06:54:54 [debug] 509#509: *43 free: 00000000011F14D0, unused: 48
2017/05/12 06:54:54 [debug] 509#509: *43 event timer del: 124: 1494572274137
2017/05/12 06:54:54 [debug] 509#509: *43 reusable connection: 0
2017/05/12 06:54:54 [debug] 509#509: *43 http output filter "/w/api.php?"
2017/05/12 06:54:54 [debug] 509#509: *43 http copy filter: "/w/api.php?"
2017/05/12 06:54:54 [debug] 509#509: *43 http postpone filter "/w/api.php?" 00007FFEE31DA7D0
2017/05/12 06:54:54 [debug] 509#509: *43 http chunk: 0
2017/05/12 06:54:54 [debug] 509#509: *43 write new buf t:0 f:0 000000000121D998, pos 00000000004BC43B, size: 5 file: 0, size: 0
2017/05/12 06:54:54 [debug] 509#509: *43 http write filter: l:1 f:0 s:5
2017/05/12 06:54:54 [debug] 509#509: *43 http write filter limit 0
2017/05/12 06:54:54 [debug] 509#509: *43 SSL buf copy: 5
2017/05/12 06:54:54 [debug] 509#509: *43 SSL to write: 5
2017/05/12 06:54:54 [debug] 509#509: *43 SSL_write: 5
2017/05/12 06:54:54 [debug] 509#509: *43 http write filter 0000000000000000
2017/05/12 06:54:54 [debug] 509#509: *43 http copy filter: 0 "/w/api.php?"
2017/05/12 06:54:54 [debug] 509#509: *43 http finalize request: 0, "/w/api.php?" a:1, c:1
2017/05/12 06:54:54 [debug] 509#509: *43 set http keepalive handler
2017/05/12 06:54:54 [debug] 509#509: *43 http close request
2017/05/12 06:54:54 [debug] 509#509: *43 http log handler
2017/05/12 06:54:54 [debug] 509#509: *43 free: 00000000011E2720
2017/05/12 06:54:54 [debug] 509#509: *43 free: 00000000011E0700, unused: 0
2017/05/12 06:54:54 [debug] 509#509: *43 free: 00000000011E1710, unused: 0
2017/05/12 06:54:54 [debug] 509#509: *43 free: 000000000121D510, unused: 2645
2017/05/12 06:54:54 [debug] 509#509: *43 free: 00000000011F10C0
2017/05/12 06:54:54 [debug] 509#509: *43 hc free: 0000000000000000 0
2017/05/12 06:54:54 [debug] 509#509: *43 hc busy: 0000000000000000 0
2017/05/12 06:54:54 [debug] 509#509: *43 free: 000000000121E520
2017/05/12 06:54:54 [debug] 509#509: *43 reusable connection: 1
2017/05/12 06:54:54 [debug] 509#509: *43 event timer add: 122: 60000:1494572154290
2017/05/12 06:54:54 [debug] 509#509: *43 http keepalive handler
2017/05/12 06:54:54 [debug] 509#509: *43 malloc: 00000000011F10C0:1024
2017/05/12 06:54:54 [debug] 509#509: *43 SSL_read: 0
2017/05/12 06:54:54 [debug] 509#509: *43 SSL_get_error: 6
2017/05/12 06:54:54 [debug] 509#509: *43 peer shutdown SSL cleanly
2017/05/12 06:54:54 [info] 509#509: *43 client ::1 closed keepalive connection
2017/05/12 06:54:54 [debug] 509#509: *43 close http connection: 122
2017/05/12 06:54:54 [debug] 509#509: *43 SSL_shutdown: 1
2017/05/12 06:54:54 [debug] 509#509: *43 event timer del: 122: 1494572154290
2017/05/12 06:54:54 [debug] 509#509: *43 reusable connection: 0
2017/05/12 06:54:54 [debug] 509#509: *43 free: 00000000011F10C0
2017/05/12 06:54:54 [debug] 509#509: *43 free: 0000000000000000
2017/05/12 06:54:54 [debug] 509#509: *43 free: 000000000120B620, unused: 0
2017/05/12 06:54:54 [debug] 509#509: *43 free: 00000000011F0EB0, unused: 320

The big difference that I see is that the keepalive handler is set when no Connection: close is sent, probably ending up in:

https://github.com/nginx/nginx/blob/branches/stable-1.10/src/http/ngx_http_request.c#L2545

If this is not done, nginx closes the connection less gracefully, maybe not giving SSL the time for a complete shutdown.

Edit: https://wiki.openssl.org/index.php/Manual:SSL_shutdown(3) says the following:

The shutdown procedure consists of 2 steps: the sending of the "close notify" shutdown alert and the reception of the peer's "close notify" shutdown alert. According to the TLS standard, it is acceptable for an application to only send its shutdown alert and then close the underlying connection without waiting for the peer's response (this way resources can be saved, as the process can already terminate or serve another connection).
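In OpenSSL API terms, the two steps map onto two SSL_shutdown() calls: the first sends our close_notify and returns 0 if the peer's alert hasn't arrived yet, and a second call waits for it and returns 1. A sketch of the full bidirectional variant, assuming a blocking SSL * (my own illustration):

#include <openssl/ssl.h>

/* Bidirectional TLS shutdown.  Stopping after the first SSL_shutdown()
 * and closing the socket is the standard-permitted shortcut described
 * in the manual page quoted above -- and also the one that produces an
 * RST if the peer sends its close_notify anyway. */
static void tls_full_shutdown(SSL *ssl)
{
    int ret = SSL_shutdown(ssl);  /* step 1: send our close_notify */

    if (ret == 0)
        SSL_shutdown(ssl);        /* step 2: wait for the peer's */
    SSL_free(ssl);
}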

Ticket closed as won't fix. The main issue is that not all clients send the close notify, and nginx follows what the majority of browsers do.

IIUC, in our case PyBal's ProxyFetch sends the close notify; maybe there is a way to disable this behavior. If not, I'll just close the task.

I wonder if Chrome (which is the dominant browser now, not MSIE as indicated in that nginx source comment) sends the close notify?

Apache's mod_ssl seems to default to not expecting any response after sending a close notify:

#   SSL Protocol Adjustments:
#   The safe and default but still SSL/TLS standard compliant shutdown
#   approach is that mod_ssl sends the close notify alert but doesn't wait for
#   the close notify alert from client. When you need a different shutdown
#   approach you can use one of the following variables:
#   o ssl-unclean-shutdown:
#     This forces an unclean shutdown when the connection is closed, i.e. no
#     SSL close notify alert is sent or allowed to be received.  This violates
#     the SSL/TLS standard but is needed for some brain-dead browsers. Use
#     this when you receive I/O errors because of the standard approach where
#     mod_ssl sends the close notify alert.
#   o ssl-accurate-shutdown:
#     This forces an accurate shutdown when the connection is closed, i.e. a
#     SSL close notify alert is sent and mod_ssl waits for the close notify
#     alert of the client. This is 100% SSL/TLS standard compliant, but in
#     practice often causes hanging connections with brain-dead browsers. Use
#     this only for browsers where you know that their SSL implementation
#     works correctly. 
#   Notice: Most problems of broken clients are also related to the HTTP
#   keep-alive facility, so you usually additionally want to disable
#   keep-alive for those clients, too. Use variable "nokeepalive" for this.
#   Similarly, one has to force some clients to use HTTP/1.0 to workaround
#   their broken HTTP/1.1 implementation. Use variables "downgrade-1.0" and
#   "force-response-1.0" for this.
BrowserMatch "MSIE [2-5]" \
         nokeepalive ssl-unclean-shutdown \
         downgrade-1.0 force-response-1.0

I'd say that Chrome should follow the trend, since OpenSSL states clearly that the client is not required to send the close notify, but we should run some tests to be completely sure.

So from the above, apache really has 3 different modes of operation:

  1. default - sends close notify, but does not wait for a matching close notify from client - this causes an RST with a client that does send close notify, and may also still break some horrible browsers (MSIE2-5)?
  2. unclean - does not send close notify and does not wait for it either - helps with MSIE2-5, probably causes RST behavior with any client it's enabled for
  3. accurate - sends close notify and waits for client close notify - would work correctly with a client that sends close notify, but causes wasted/hung/timeout connections with browsers that don't send close-notify.

And importantly, apache can vary that behavior after it has seen the User-Agent string to use different approaches for different clients. So if our terminator was apache, we could potentially solve this for most cases with some UA string matching. If I understand the earlier nginx investigation, nginx is hardcoded to only operate the same way as apache's default behavior?
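For illustration, per-client tuning in apache would look like the MSIE BrowserMatch stanza quoted earlier; a hypothetical sketch (the UA pattern is made up, and verifying which clients actually send close notify is exactly the open question):

# hypothetical, untested: wait for the client's close notify only for
# clients we have verified to send one
BrowserMatch "^SomeVerifiedClient/" ssl-accurate-shutdown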

Perhaps we should try to gather up some stats off our public-facing nginx about which UAs are currently behaving which way? I'm pretty sure we're facing these same RST issues you've found with our public traffic too.

I haven't confirmed manually, but look at e.g. this varnish-machine-stats dashboard: https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?orgId=1&var-server=cp2002&var-datasource=codfw%20prometheus%2Fops&from=now-3h&to=now&refresh=1m . Comparing the "TCP connections" graph with the "TCP resets and retransmission rate" graph to its right: the rate of syn_recv on the left and the rate of resets sent on the right are about the same, which would seem to indicate most connections end via RST.

Sorry for the late response!

> So from the above, apache really has 3 different modes of operation:
>
>   1. default - sends close notify, but does not wait for a matching close notify from client - this causes an RST with a client that does send close notify, and may also still break some horrible browsers (MSIE2-5)?
>   2. unclean - does not send close notify and does not wait for it either - helps with MSIE2-5, probably causes RST behavior with any client it's enabled for
>   3. accurate - sends close notify and waits for client close notify - would work correctly with a client that sends close notify, but causes wasted/hung/timeout connections with browsers that don't send close-notify.
>
> And importantly, apache can vary that behavior after it has seen the User-Agent string to use different approaches for different clients. So if our terminator was apache, we could potentially solve this for most cases with some UA string matching. If I understand the earlier nginx investigation, nginx is hardcoded to only operate the same way as apache's default behavior?

Yes this is my understanding too.

> Perhaps we should try to gather up some stats off our public-facing nginx about which UAs are currently behaving which way? I'm pretty sure we're facing these same RST issues you've found with our public traffic too.

Definitely. Maybe we could grab some IPs that get RSTs and compare them with webrequest logs data (not particularly precise, but it could give us some info). I suspect that bots using software like Twisted (which IIUC is what PyBal uses, and currently sends the close notify) might play a big role in the RSTs that we are observing in the varnish caches.

ayounsi raised the priority of this task from Medium to Needs Triage. Jul 12 2017, 7:49 PM
ayounsi triaged this task as Medium priority.
ayounsi moved this task from Next quarter to Watching on the netops board.

Bump, I had to re-dig into this ticket a bit to catch myself up, so, re-summarizing:

  1. In src/http/ngx_http_request.c at the top of ngx_http_ssl_handshake_handler() there's this little bit of code:
if (c->ssl->handshaked) {

    /*
     * The majority of browsers do not send the "close notify" alert.
     * Among them are MSIE, old Mozilla, Netscape 4, Konqueror,
     * and Links.  And what is more, MSIE ignores the server's alert.
     *
     * Opera and recent Mozilla send the alert.
     */

    c->ssl->no_wait_shutdown = 1;

This causes fixed-at-compile-time nginx behavior where it sends an SSL-level close_notify and then immediately closes the socket, not waiting for the client's SSL-level close_notify. If the client sends it anyway, this causes nginx to send back an RST. Nginx claims this is the best way because the "majority" of clients never send the close_notify and it's pointless to wait for it.

  2. Our stats indicate our RSTs and SYNs are near the same rate on our public terminators, meaning that at least for us, it's entirely possible the majority of clients behave properly and send us close_notify, resulting in a bunch of pointless outbound RST traffic from this. At the very least, it might be the case that most of our internal traffic is compliant and sends close_notify.
  3. We need to do some sniffing and analysis and figure out what the most popular modern clients actually do (e.g. modern Chrome, Safari, MSIE 11/Edge). If analysis supports the idea that most modern clients are sending close_notify, we should do a local patch to test/fix this issue (maybe make it configurable?), and then perhaps push back towards upstream nginx if it makes sense. Nginx said "If there are reasons to assume this comment is no longer true, the code may be reconsidered. It seems to be still true though." in https://trac.nginx.org/nginx/ticket/1270#ticket . Maybe the comment is no longer true, at least for some servers...

Change 386195 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/software/nginx@wmf-1.13] new patch: configurable ssl_do_wait_shutdown

https://gerrit.wikimedia.org/r/386195

Addendum to the summary: the RST does not happen when an HTTP connection explicitly asks to be kept alive, since nginx does not send the close notify until the keepalive timeout has elapsed.

The patch looks very interesting, but I have a performance question: would it mean that nginx should keep more TCP connections open, hoping for the client to eventually send a close notify (in response to the one nginx sent)? Are those connections going to be torn down after their overall timeouts expire?

> would it mean that nginx should keep more TCP connections open, hoping for the client to eventually send a close notify (in response to the one nginx sent)? Are those connections going to be torn down after their overall timeouts expire?

If it all works correctly, yes, I'd expect the average lifetime of client TCP connections to get a little longer. For clients that do send close_notify, it's a short time extension in the common case (the time-to-RST before becomes time-to-close). For clients that don't, I'm guessing nothing really changes for keepalive responses, and for "Conn: close" it should still close pretty quickly, just without an RST. It probably does leave us vulnerable to a malicious and/or broken client holding a connection open until some idle timer hits, but they could've already done that earlier in the connection instead of at SSL shutdown time, regardless.

But all of that being said, if nginx has been unconditionally setting no_wait_shutdown = 1 for years, it's entirely possible that bugs have crept into the waiting code path here, as the code hasn't been exercised commonly for that case...

The patch above was released with nginx-1.13.6-2+wmf1, so we're capable of experimentation now. The flag isn't turned on anywhere yet.
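Presumably turning it on for a test host would look something like this; note the directive name below is assumed from the Gerrit change subject ("configurable ssl_do_wait_shutdown"), I haven't verified the actual name or syntax in the patch:

server {
    listen 443 ssl;
    # hypothetical directive, name taken from the patch subject;
    # actual syntax in the wmf patch unverified
    ssl_do_wait_shutdown on;
}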

Just found this nice comment in a Mozilla repo (not sure if this is the core of Firefox or some old thing):

https://github.com/mozilla/gecko-dev/blob/master/security/nss/lib/ssl/sslsecur.c#L746-L768

/*
 * The TLS 1.2 RFC 5246, Section 7.2.1 says:
 *
 *     Unless some other fatal alert has been transmitted, each party is
 *     required to send a close_notify alert before closing the write side
 *     of the connection.  The other party MUST respond with a close_notify
 *     alert of its own and close down the connection immediately,
 *     discarding any pending writes.  It is not required for the initiator
 *     of the close to wait for the responding close_notify alert before
 *     closing the read side of the connection.
 *
 * The second sentence requires that we send a close_notify alert when we
 * have received a close_notify alert.  In practice, all SSL implementations
 * close the socket immediately after sending a close_notify alert (which is
 * allowed by the third sentence), so responding with a close_notify alert
 * would result in a write failure with the ECONNRESET error.  This is why
 * we don't respond with a close_notify alert.
 *
 * Also, in the unlikely event that the TCP pipe is full and the peer stops
 * reading, the SSL3_SendAlert call in ssl_SecureClose and ssl_SecureShutdown
 * may block indefinitely in blocking mode, and may fail (without retrying)
 * in non-blocking mode.
 */

It looks like that commentary is still valid even for the most recent Firefox builds. So we may find that even modern Firefox doesn't send close_notify in the case we're talking about. Open question whether Safari and/or Chrome do (they don't derive from Gecko).

Found a Chrome bug that seems interesting (although not really recent):

https://bugs.chromium.org/p/chromium/issues/detail?id=118366

I also checked the Chromium source code: I found only a reference to receiving a close notify (and being prepared for the peer to almost surely shut down the connection right after it), but nothing indicating what happens when Chrome sends the close notify.

Declining for lack of movement and lack of urgency.