Page MenuHomePhabricator

wikibugs SSE client does not see events from gitlab-webhooks in real-time.
Closed, ResolvedPublicBUG REPORT

Description

I am trying to add a server-sent events client to wikibugs to watch for new events from https://gitlab-webhooks.toolforge.org/. When I use curl -v --no-buffer -H 'Accept: text/event-stream' 'https://gitlab-webhooks.toolforge.org/sse/' as a client I see events in real-time as expected. When I try to connect using an asyncio client that is largely copied from the working client that wikibugs uses to connect it's web and irc components however things get strange.

The asyncio appears to hang while completing the initial connection. When I have the patience (or wander away to do something else with it running) the hang seems to eventually clear at least temporarily as noted in these irc rants:

[20:11:24] <bd808>	 My sse magic for getting the webhook events into wikibugs is not playing nice. :/ The endpoint is working great via curl, but my asyncio python client gets lost somewhere in establishing the connection. Very frustrating at the moment.
[21:03:33] <bd808>	 LOL. I left my broken client running in a background window while I worked on other things. 26 minutes after I started it the client finished the initial https handshake and started outputing data. This almost certainly is a problem in my client as the curl test work functionally instantly. Still maddening, but at least showing something more than "it doesn't work".
[21:14:50] <brennen>	 26 minutes is... impressive
[21:16:25] <bd808>	 once it started actually working it seemed be perfectly happy. A second test in the same fashion started working after "only" 9 minutes.
[21:17:26] <bd808>	 It must be something I did wrong in the asyncio loop setup. I'm not sure what else could pause that long before doing the needful

The exact same client code works completely as expected when connected to a local gitlab-webhooks instance running on the same laptop. This makes the problem seem more likely to be somehow related to the two nginx reverse proxies that sit between the client and the https://gitlab-webhooks.toolforge.org/sse/ service. The fact that curl works unimpeded in both localhost and toolforge endpoint tests however makes me cautious about directly implicating nginx.

Details

TitleReferenceAuthorSource BranchDest Branch
ingress-nginx: Pass X-Accel-Buffering header through proxyrepos/cloud/toolforge/toolforge-deploy!275bd808work/bd808/x-accel-bufferingmain
Customize query in GitLab

Event Timeline

bd808 changed the task status from Open to In Progress.Tue, Apr 30, 10:51 PM
bd808 claimed this task.
bd808 created this task.
bd808 moved this task from Backlog to Doing on the Wikibugs board.
bd808 renamed this task from wikibugs SSE client does not seen events from gitlab-webhooks in real-time. to wikibugs SSE client does not see events from gitlab-webhooks in real-time..Tue, Apr 30, 11:10 PM

The issue seems to be related to HTTP/1.1 vs HTTP/2. As previously noted, curl -v --no-buffer -H 'Accept: text/event-stream' 'https://gitlab-webhooks.toolforge.org/sse/' works as expected:

$ curl -v --no-buffer -H 'Accept: text/event-stream' -H 'Cache-Control: no-cache' -H 'Connection: keep-alive' 'https://gitlab-webhooks.toolforge.org/sse/'
*   Trying 185.15.56.11:443...
* Connected to gitlab-webhooks.toolforge.org (185.15.56.11) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=toolforge.org
*  start date: Mar 12 08:23:29 2024 GMT
*  expire date: Jun 10 08:23:28 2024 GMT
*  subjectAltName: host "gitlab-webhooks.toolforge.org" matched cert's "*.toolforge.org"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://gitlab-webhooks.toolforge.org/sse/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: gitlab-webhooks.toolforge.org]
* [HTTP/2] [1] [:path: /sse/]
* [HTTP/2] [1] [user-agent: curl/8.4.0]
* [HTTP/2] [1] [accept: text/event-stream]
* [HTTP/2] [1] [cache-control: no-cache]
> GET /sse/ HTTP/2
> Host: gitlab-webhooks.toolforge.org
> User-Agent: curl/8.4.0
> Accept: text/event-stream
> Cache-Control: no-cache
> Connection: keep-alive
>
< HTTP/2 200
< server: nginx/1.22.1
< date: Fri, 03 May 2024 18:29:07 GMT
< content-type: text/event-stream
< cache-control: no-cache
< strict-transport-security: max-age=31622400
< x-clacks-overhead: GNU Terry Pratchett
< permissions-policy: browsing-topics=()
< content-security-policy-report-only: default-src 'self' 'unsafe-eval' 'unsafe-inline' blob: data: filesystem: mediastream: *.toolforge.org wikibooks.org *.wikibooks.org wikidata.org *.wikidata.org wikimedia.org *.wikimedia.org wikinews.org *.wikinews.org wikipedia.org *.wikipedia.org wikiquote.org *.wikiquote.org wikisource.org *.wikisource.org wikiversity.org *.wikiversity.org wikivoyage.org *.wikivoyage.org wiktionary.org *.wiktionary.org *.wmcloud.org *.wmflabs.org wikimediafoundation.org mediawiki.org *.mediawiki.org wss://gitlab-webhooks.toolforge.org; report-uri https://csp-report.toolforge.org/collect;
<
: Initial connect ping
event: keepalive
data: {"keepalive": 7430881.818505518}

As can be seen, this connection negotiated HTTP/2 and returned the expected initial "keepalive" SSE event.

If we force curl to negotiate HTTP/1.1, things look a lot more like the busted python client behavior:

$ curl -v --http1.1 --no-buffer -H 'Accept: text/event-stream' -H 'Cache-Control: no-cache' -H 'Connection: keep-alive' 'https://gitlab-webhooks.toolforge.org/sse/'
*   Trying 185.15.56.11:443...
* Connected to gitlab-webhooks.toolforge.org (185.15.56.11) port 443
* ALPN: curl offers http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: CN=toolforge.org
*  start date: Mar 12 08:23:29 2024 GMT
*  expire date: Jun 10 08:23:28 2024 GMT
*  subjectAltName: host "gitlab-webhooks.toolforge.org" matched cert's "*.toolforge.org"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* using HTTP/1.1
> GET /sse/ HTTP/1.1
> Host: gitlab-webhooks.toolforge.org
> User-Agent: curl/8.4.0
> Accept: text/event-stream
> Cache-Control: no-cache
> Connection: keep-alive
>
^C

The aiohttp library that I'm using in the python client does not have HTTP/2 support at all, so I need to either find a different client library or continue to troubleshoot the HTTP/1.1 failure.

The ingess-nginx configuration change proposed at https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/merge_requests/275 seems to fix the buffering issue:

$ curl --http1.1 -v --no-buffer -H 'Accept: text/event-stream' -H 'Cache-Control: no-cache' -H 'Connection: keep-alive' 'https://gitlab-webhooks-beta.beta.toolforge.org/sse/'
*   Trying 185.15.56.33:443...
* Connected to gitlab-webhooks-beta.beta.toolforge.org (185.15.56.33) port 443
* ALPN: curl offers http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: CN=beta.toolforge.org
*  start date: May  3 11:38:24 2024 GMT
*  expire date: Aug  1 11:38:23 2024 GMT
*  subjectAltName: host "gitlab-webhooks-beta.beta.toolforge.org" matched cert's "*.beta.toolforge.org"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* using HTTP/1.1
> GET /sse/ HTTP/1.1
> Host: gitlab-webhooks-beta.beta.toolforge.org
> User-Agent: curl/8.4.0
> Accept: text/event-stream
> Cache-Control: no-cache
> Connection: keep-alive
>
< HTTP/1.1 200
< Server: nginx/1.22.1
< Date: Sat, 04 May 2024 18:02:30 GMT
< Content-Type: text/event-stream
< Transfer-Encoding: chunked
< Connection: keep-alive
< cache-control: no-cache
< Strict-Transport-Security: max-age=31622400
< X-Clacks-Overhead: GNU Terry Pratchett
< Permissions-Policy: browsing-topics=()
< Content-Security-Policy-Report-Only: default-src 'self' 'unsafe-eval' 'unsafe-inline' blob: data: filesystem: mediastream: *.toolforge.org wikibooks.org *.wikibooks.org wikidata.org *.wikidata.org wikimedia.org *.wikimedia.org wikinews.org *.wikinews.org wikipedia.org *.wikipedia.org wikiquote.org *.wikiquote.org wikisource.org *.wikisource.org wikiversity.org *.wikiversity.org wikivoyage.org *.wikivoyage.org wiktionary.org *.wiktionary.org *.wmcloud.org *.wmflabs.org wikimediafoundation.org mediawiki.org *.mediawiki.org wss://gitlab-webhooks-beta.beta.toolforge.org; report-uri https://csp-report.toolforge.org/collect;
<
: Initial connect ping
event: keepalive
data: {"keepalive": 7629441.817741163}

^C

Compare this to the trace in T363877#9768913 where the HTTP/1.1 client connection halted before emitting the "Initial connect ping" event.