Page MenuHomePhabricator

occasional 502 from rcstream seen by pybal
Closed, DeclinedPublic

Description

this has been noticed on pybal by @Joe, occasional 502 for rcstream

lvs1005:~$ sudo grep -i rcs100 /var/log/syslog
Feb  9 07:25:04 lvs1005 pybal[23167]: [streamlb6_80 ProxyFetch] WARN: rcs1001.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.002 s
Feb  9 07:25:04 lvs1005 pybal[23167]: [streamlb6_80] ERROR: Monitoring instance ProxyFetch reports server rcs1001.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 07:25:14 lvs1005 pybal[23167]: [streamlb6_80] INFO: Server rcs1001.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 07:28:34 lvs1005 pybal[23167]: [streamlb6_443 ProxyFetch] WARN: rcs1002.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.014 s
Feb  9 07:28:34 lvs1005 pybal[23167]: [streamlb6_443] ERROR: Monitoring instance ProxyFetch reports server rcs1002.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 07:28:44 lvs1005 pybal[23167]: [streamlb6_443] INFO: Server rcs1002.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 07:55:55 lvs1005 pybal[23167]: [streamlb_80 ProxyFetch] WARN: rcs1001.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.002 s
Feb  9 07:55:55 lvs1005 pybal[23167]: [streamlb_80] ERROR: Monitoring instance ProxyFetch reports server rcs1001.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 07:56:05 lvs1005 pybal[23167]: [streamlb_80] INFO: Server rcs1001.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 07:57:12 lvs1005 pybal[23167]: [streamlb_80 ProxyFetch] WARN: rcs1002.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.002 s
Feb  9 07:57:12 lvs1005 pybal[23167]: [streamlb_80] ERROR: Monitoring instance ProxyFetch reports server rcs1002.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 07:57:22 lvs1005 pybal[23167]: [streamlb_80] INFO: Server rcs1002.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 08:26:46 lvs1005 pybal[23167]: [streamlb_80 ProxyFetch] WARN: rcs1001.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.003 s
Feb  9 08:26:46 lvs1005 pybal[23167]: [streamlb_80] ERROR: Monitoring instance ProxyFetch reports server rcs1001.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 08:26:56 lvs1005 pybal[23167]: [streamlb_80] INFO: Server rcs1001.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 08:28:02 lvs1005 pybal[23167]: [streamlb6_80 ProxyFetch] WARN: rcs1002.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.003 s
Feb  9 08:28:02 lvs1005 pybal[23167]: [streamlb6_80] ERROR: Monitoring instance ProxyFetch reports server rcs1002.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 08:28:12 lvs1005 pybal[23167]: [streamlb6_80] INFO: Server rcs1002.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 08:36:50 lvs1005 pybal[23167]: [streamlb6_443 ProxyFetch] WARN: rcs1002.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.015 s
Feb  9 08:36:50 lvs1005 pybal[23167]: [streamlb6_443] ERROR: Monitoring instance ProxyFetch reports server rcs1002.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 08:37:00 lvs1005 pybal[23167]: [streamlb6_443] INFO: Server rcs1002.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 08:47:51 lvs1005 pybal[23167]: [streamlb6_443 ProxyFetch] WARN: rcs1002.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.015 s
Feb  9 08:47:51 lvs1005 pybal[23167]: [streamlb6_443] ERROR: Monitoring instance ProxyFetch reports server rcs1002.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 08:48:01 lvs1005 pybal[23167]: [streamlb6_443] INFO: Server rcs1002.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 09:04:23 lvs1005 pybal[23167]: [streamlb_80 ProxyFetch] WARN: rcs1002.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.002 s
Feb  9 09:04:23 lvs1005 pybal[23167]: [streamlb_80] ERROR: Monitoring instance ProxyFetch reports server rcs1002.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 09:04:33 lvs1005 pybal[23167]: [streamlb_80] INFO: Server rcs1002.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 09:22:57 lvs1005 pybal[23167]: [streamlb_80 ProxyFetch] WARN: rcs1001.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.002 s
Feb  9 09:22:57 lvs1005 pybal[23167]: [streamlb_80] ERROR: Monitoring instance ProxyFetch reports server rcs1001.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 09:23:07 lvs1005 pybal[23167]: [streamlb_80] INFO: Server rcs1001.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 09:46:14 lvs1005 pybal[23167]: [streamlb6_80 ProxyFetch] WARN: rcs1002.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.002 s
Feb  9 09:46:14 lvs1005 pybal[23167]: [streamlb6_80] ERROR: Monitoring instance ProxyFetch reports server rcs1002.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 09:46:24 lvs1005 pybal[23167]: [streamlb6_80] INFO: Server rcs1002.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 09:52:33 lvs1005 pybal[23260]: [streamlb6_443] INFO: New enabled server rcs1001.eqiad.wmnet, weight 10
Feb  9 09:52:33 lvs1005 pybal[23260]: [streamlb6_443] INFO: New enabled server rcs1002.eqiad.wmnet, weight 10
Feb  9 09:52:33 lvs1005 pybal[23260]: [streamlb_443] INFO: New enabled server rcs1001.eqiad.wmnet, weight 10
Feb  9 09:52:33 lvs1005 pybal[23260]: [streamlb_443] INFO: New enabled server rcs1002.eqiad.wmnet, weight 10
Feb  9 09:52:33 lvs1005 pybal[23260]: [streamlb6_80] INFO: New enabled server rcs1001.eqiad.wmnet, weight 10
Feb  9 09:52:33 lvs1005 pybal[23260]: [streamlb6_80] INFO: New enabled server rcs1002.eqiad.wmnet, weight 10
Feb  9 09:52:33 lvs1005 pybal[23260]: [streamlb_80] INFO: New enabled server rcs1001.eqiad.wmnet, weight 10
Feb  9 09:52:33 lvs1005 pybal[23260]: [streamlb_80] INFO: New enabled server rcs1002.eqiad.wmnet, weight 10
Feb  9 09:54:53 lvs1005 pybal[23260]: [streamlb6_443 ProxyFetch] WARN: rcs1001.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.016 s
Feb  9 09:54:53 lvs1005 pybal[23260]: [streamlb6_443] ERROR: Monitoring instance ProxyFetch reports server rcs1001.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway
Feb  9 09:55:03 lvs1005 pybal[23260]: [streamlb6_443] INFO: Server rcs1001.eqiad.wmnet (enabled/partially up/not pooled) is up
Feb  9 10:11:25 lvs1005 pybal[23260]: [streamlb6_443 ProxyFetch] WARN: rcs1001.eqiad.wmnet (enabled/up/pooled): Fetch failed, 0.011 s
Feb  9 10:11:25 lvs1005 pybal[23260]: [streamlb6_443] ERROR: Monitoring instance ProxyFetch reports server rcs1001.eqiad.wmnet (enabled/up/pooled) down: 502 Bad Gateway

and indeed corresponding errors from nginx in "no live upstreams" class

2016/02/09 10:07:01 [error] 1473#0: *5439053 no live upstreams while connecting to upstream, client: 10.68.18.20, server: _, request: "GET /socket.io/?EIO=3&transport=polling&t=1455012421049-0 HTTP/1.1", upstream: "http://websockets/socket.io/?EIO=3&transport=polling&t=1455012421049-0", host: "stream.wikimedia.org"
2016/02/09 10:08:06 [error] 1473#0: *5440473 no live upstreams while connecting to upstream, client: 10.68.17.255, server: _, request: "GET /socket.io/?EIO=3&transport=polling&t=1455012486815-0 HTTP/1.1", upstream: "http://websockets/socket.io/?EIO=3&transport=polling&t=1455012486815-0", host: "stream.wikimedia.org"
2016/02/09 10:09:13 [error] 1472#0: *5443557 no live upstreams while connecting to upstream, client: 10.68.17.113, server: _, request: "GET /socket.io/?EIO=3&transport=polling&t=1455012553560-0 HTTP/1.1", upstream: "http://websockets/socket.io/?EIO=3&transport=polling&t=1455012553560-0", host: "stream.wikimedia.org"
2016/02/09 10:10:19 [error] 1472#0: *5445025 no live upstreams while connecting to upstream, client: 10.68.17.253, server: _, request: "GET /socket.io/?EIO=3&transport=polling&t=1455012619003-0 HTTP/1.1", upstream: "http://websockets/socket.io/?EIO=3&transport=polling&t=1455012619003-0", host: "stream.wikimedia.org"
2016/02/09 10:12:31 [error] 1473#0: *5446565 no live upstreams while connecting to upstream, client: 10.68.17.113, server: _, request: "GET /socket.io/?EIO=3&transport=polling&t=1455012751128-0 HTTP/1.1", upstream: "http://websockets/socket.io/?EIO=3&transport=polling&t=1455012751128-0", host: "stream.wikimedia.org"
2016/02/09 10:13:37 [error] 1471#0: *5449166 no live upstreams while connecting to upstream, client: 10.68.17.166, server: _, request: "GET /socket.io/?EIO=3&transport=polling&t=1455012817306-0 HTTP/1.1", upstream: "http://websockets/socket.io/?EIO=3&transport=polling&t=1455012817306-0", host: "stream.wikimedia.org"
2016/02/09 10:14:43 [error] 1471#0: *5449255 no live upstreams while connecting to upstream, client: 10.68.17.113, server: _, request: "GET /socket.io/?EIO=3&transport=polling&t=1455012883573-0 HTTP/1.1", upstream: "http://websockets/socket.io/?EIO=3&transport=polling&t=1455012883573-0", host: "stream.wikimedia.org"
2016/02/09 10:15:50 [error] 1471#0: *5451540 no live upstreams while connecting to upstream, client: 10.68.18.20, server: _, request: "GET /socket.io/?EIO=3&transport=polling&t=1455012950121-0 HTTP/1.1", upstream: "http://websockets/socket.io/?EIO=3&transport=polling&t=1455012950121-0", host: "stream.wikimedia.org"
2016/02/09 10:16:55 [error] 1471#0: *5451409 no live upstreams while connecting to upstream, client: 10.68.16.38, server: _, request: "GET /socket.io/?EIO=3&transport=polling&t=1455013015819-0 HTTP/1.1", upstream: "http://websockets/socket.io/?EIO=3&transport=polling&t=1455013015819-0", host: "stream.wikimedia.org"

plus some from "upstream prematurely closed connection" (client ip is always the same)

2016/02/09 10:18:01 [error] 1473#0: *5454852 upstream prematurely closed connection while reading response header from upstream, client: REDACTED, server: _, request: "GET /socket.io/1/websocket/319727471623 HTTP/1.1", upstream: "http://127.0.0.1:10092/socket.io/1/websocket/319727471623", host: "stream.wikimedia.org"
2016/02/09 10:18:01 [error] 1473#0: *5454852 upstream prematurely closed connection while reading response header from upstream, client: REDACTED, server: _, request: "GET /socket.io/1/websocket/319727471623 HTTP/1.1", upstream: "http://127.0.0.1:10096/socket.io/1/websocket/319727471623", host: "stream.wikimedia.org"
2016/02/09 10:18:01 [error] 1473#0: *5454852 upstream prematurely closed connection while reading response header from upstream, client: REDACTED, server: _, request: "GET /socket.io/1/websocket/319727471623 HTTP/1.1", upstream: "http://127.0.0.1:10087/socket.io/1/websocket/319727471623", host: "stream.wikimedia.org"
2016/02/09 10:18:01 [error] 1473#0: *5454852 upstream prematurely closed connection while reading response header from upstream, client: REDACTED, server: _, request: "GET /socket.io/1/websocket/319727471623 HTTP/1.1", upstream: "http://127.0.0.1:10091/socket.io/1/websocket/319727471623", host: "stream.wikimedia.org"
2016/02/09 10:18:01 [error] 1473#0: *5454852 upstream prematurely closed connection while reading response header from upstream, client: REDACTED, server: _, request: "GET /socket.io/1/websocket/319727471623 HTTP/1.1", upstream: "http://127.0.0.1:10085/socket.io/1/websocket/319727471623", host: "stream.wikimedia.org"
2016/02/09 10:18:01 [error] 1473#0: *5454852 upstream prematurely closed connection while reading response header from upstream, client: REDACTED, server: _, request: "GET /socket.io/1/websocket/319727471623 HTTP/1.1", upstream: "http://127.0.0.1:10102/socket.io/1/websocket/319727471623", host: "stream.wikimedia.org"
2016/02/09 10:18:01 [error] 1473#0: *5454852 upstream prematurely closed connection while reading response header from upstream, client: REDACTED, server: _, request: "GET /socket.io/1/websocket/319727471623 HTTP/1.1", upstream: "http://127.0.0.1:10094/socket.io/1/websocket/319727471623", host: "stream.wikimedia.org"
2016/02/09 10:18:01 [error] 1473#0: *5454852 upstream prematurely closed connection while reading response header from upstream, client: REDACTED, server: _, request: "GET /socket.io/1/websocket/319727471623 HTTP/1.1", upstream: "http://127.0.0.1:10093/socket.io/1/websocket/319727471623", host: "stream.wikimedia.org"

it doesn't seem to be overly impacting to users but still worthwhile to be aware of cc @ori @Krinkle

Event Timeline

fgiunchedi raised the priority of this task from to Needs Triage.
fgiunchedi updated the task description. (Show Details)
fgiunchedi added projects: SRE, EventStreams.
fgiunchedi added subscribers: fgiunchedi, Krinkle, Joe, ori.
fgiunchedi triaged this task as Medium priority.Apr 27 2016, 1:19 PM

Declining per RCStream being deprecated and scheduled to be shut down on July 7th.

See also T156919.