Page MenuHomePhabricator

Upstream prematurely closed connection
Closed, DeclinedPublic

Description

While working on an unrelated task related to rcs servers, @BBlack found that the nginx logs are filled with error messages of the type:

2016/12/20 18:16:29 [error] 7170#0: *1110454 upstream prematurely closed connection while reading response header from upstream, client: 10.64.32.103, server: _, request: "GET /socket.io/1/websocket/408610599316 HTTP/1.1", upstream: "http://127.0.0.1:10099/socket.io/1/websocket/408610599316", host: "stream.wikimedia.org"

To give an idea of the error log rate, here the size of the last 2 nginx log file for access and error:

-rw-r-----  1 www-data adm    120M Dec 20 18:17 rcstream_access.log
-rw-r-----  1 www-data adm    584M Dec 19 06:26 rcstream_access.log.1
-rw-r-----  1 www-data adm    3.8G Dec 20 18:17 rcstream_error.log
-rw-r-----  1 www-data adm     19G Dec 19 06:26 rcstream_error.log.1

Looks like rcstream is not properly closing the connection while nginx is expecting a response header.

Event Timeline

Volans created this task.Dec 20 2016, 6:27 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 20 2016, 6:27 PM
ema added a comment.Dec 21 2016, 2:28 PM

Note that this issue has been present for a very long time and is unrelated to the package upgrade performed yesterday and mentioned in T153773. The earliest occurrence of the message I could find dates back to April this year (/var/log/nginx/rcstream_error.log.34.gz):

2016/04/27 16:13:10 [error] 19124#0: *246120 upstream prematurely closed connection while reading response header from upstream, client: 46.165.210.42, server: _, request: "GET /socket.io/1/websocket/543674504819 HTTP/1.1", upstream: "http://127.0.0.1:10080/socket.io/1/websocket/543674504819", host: "stream.wikimedia.org"

I 've research this a bit more. What seems to happen is that some requests to URLs like /socket.io/1/websocket/695138296539 will get back an empty reply from the python application. Of course that reply is not liked by nginx which will log the above error AND kick the appserver out of the pool, reconnecting a bit later.

I 've tested this locally on rcs1001

akosiaris@rcs1001:/var/log/nginx$ curl -i http://127.0.0.1:10081/socket.io/1/websocket/279924822221
curl: (52) Empty reply from server

which is clearly wrong.

Now, it gets even more interesting. This escalates to a DoS if enough such requests arrive per second as there will be period of time that no appserver will be pooled and nginx will just return a 502 to all clients. This has actually recently been witnessed as there seems to be a set of IPs in Google App Engine that issue a never ending stream of these requests causing the service to intermittently spew 502s.

Doing some tests I 've seen that the the reference socket.io implemention (the one in nodejs) returns an HTTP 400 with a body of {"code":0,"message":"Transport unknown"} meaning it handles this gracefully. rcstream uses https://pypi.python.org/pypi/gevent-socketio which returns that empty reply and is probably abandoned since it hasn't seen a release since 2014 and has a total of 12 commits in 2015, 2016.

Aklapper closed this task as Declined.Jul 7 2017, 3:35 PM

Closing this task as "declined" as RCStream is deprecated and scheduled to be shut down today. See T156919 for more information.