Page MenuHomePhabricator

Upstream connect error on API Portal wiki
Closed, ResolvedPublic


I intermittently get an almost blank page that has the following message at the top when visiting a page on the production API Portal wiki:

{"httpCode":503,"httpReason":"upstream connect error or disconnect/reset before headers. reset reason: connection termination"}

Event Timeline

I've gotten the same error when trying to access the main page a few times. It usually resolves itself on refresh.

Some logs:

[2020-09-10 20:01:03.438][69][debug][pool] [source/common/conn_pool/] [C13338] using existing connection
[2020-09-10 20:01:03.438][69][debug][pool] [source/common/conn_pool/] [C13338] creating stream
[2020-09-10 20:01:03.438][69][debug][router] [source/common/router/] [C13917][S16695905485085360567] pool ready
[2020-09-10 20:01:03.439][69][debug][connection] [source/extensions/transport_sockets/tls/] [C13338] 
[2020-09-10 20:01:03.439][69][debug][connection] [source/common/network/] [C13338] remote close
[2020-09-10 20:01:03.439][69][debug][connection] [source/common/network/] [C13338] closing socket: 0
[2020-09-10 20:01:03.439][69][debug][connection] [source/extensions/transport_sockets/tls/] [C13338] SSL shutdown: rc=-1
[2020-09-10 20:01:03.439][69][debug][connection] [source/extensions/transport_sockets/tls/] [C13338] 
[2020-09-10 20:01:03.439][69][debug][client] [source/common/http/] [C13338] disconnect. resetting 1 pending requests
[2020-09-10 20:01:03.439][69][debug][client] [source/common/http/] [C13338] request reset
[2020-09-10 20:01:03.439][69][debug][router] [source/common/router/] [C13917][S16695905485085360567] upstream reset: reset reason connection termination
[2020-09-10 20:01:03.439][69][debug][http] [source/common/http/] [C13917][S16695905485085360567] Sending local reply with details upstream_reset_before_response_started{connection termination}
[2020-09-10 20:01:03.439][69][debug][http] [source/common/http/] [C13917][S16695905485085360567] encoding headers via codec (end_stream=false):
':status', '503'
'content-length', '128'
'content-type', 'application/json'
'date', 'Thu, 10 Sep 2020 20:01:03 GMT'
'server', 'envoy'

So, envoy is trying to reuse connection to MW. We can set the max_requests_per_connection to 1 for requests going to MW clusters. @Joe is there a way to reuse connections to MW? I believe not...

Change 627322 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] api-gateway: disable connection reuse

I don't understand SSL shutdown: rc=-1 - doesn't look like anyone has ever hit this before. Weirdly _should_ be reporting details about an error

That error in turn comes from openssl's SSL_shutdown failing:

-1  The shutdown was not successful because a fatal error occurred
      either at the protocol level or a connection failure occurred. It
      can also occur if action is need to continue the operation for non-
      blocking BIOs.  Call SSL_get_error(3) with the return value ret to
      find out the reason.

Getting a consistent reproduction on this would be really handy :(

My guess is that the empty message happens because this loop never gets an iteration, hence the empty string.

Change 627559 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/deployment-charts@master] api-gateway: limit persistent connections to attempt to address SSL issues.

Change 627322 abandoned by Hnowlan:
[operations/deployment-charts@master] api-gateway: disable connection reuse


Change 627559 merged by jenkins-bot:
[operations/deployment-charts@master] api-gateway: limit persistent connections to attempt to address SSL issues.

I'm still seeing this error in the logs, which is concerning. However, less concerning is not seeing any more active connections being closed:

I'm not sure if this is resolved

Based on the graph linked above and the lack of reports I'm going to close this for now - if anyone sees this again please reopen and let me know a rough timestamp!

Thanks, Hugh! I haven't seen the error recently, but I'll let you know if it comes up again.