
cas-sso idp for jaeger-ui on k8s
Closed, Resolved, Public

Description

This task tracks putting the jaeger-ui web interface behind SSO.

Since we've had success with oauth2-proxy to implement stateless OIDC SSO for thanos.w.o, we'll be doing the same for trace.wikimedia.org (name TBD, used as a placeholder).

The high-level plan I (Filippo) have right now is the following:

  • trace.w.o is an ingress service, served by the k8s-aux ingress
  • Ingress talks (within the cluster, and securely) to the oauth2-proxy sidecar within the jaeger-query pod
  • Said oauth2-proxy is deployed with its OIDC secrets, and redirects the user to SSO as required for authentication
    • The proxy is also configured as an OIDC client in SSO
  • For authenticated requests, oauth2-proxy reverse-proxies (https or http) to the actual jaeger query/ui
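The per-request decision the sidecar makes in the plan above can be modelled as a small Python function. This is a minimal sketch, not oauth2-proxy's implementation: `route` and `session_is_valid` are hypothetical names, the callback stands in for oauth2-proxy's real cookie decryption and expiry checks, and the PKCE (`code_challenge`) and `approval_prompt` parameters seen in the real redirects are left out. The IdP endpoint, client id, callback URL, cookie name and upstream are the values visible in the redirects and logs elsewhere in this task.

```python
import secrets
from urllib.parse import urlencode

# Values visible in the redirects and logs elsewhere in this task.
IDP_AUTHORIZE = "https://idp.wikimedia.org/oidc/oidcAuthorize"
CLIENT_ID = "jaeger-ui"
REDIRECT_URI = "https://trace.wikimedia.org/oauth2/callback"
UPSTREAM = "https://localhost:16686"
SESSION_COOKIE = "_oauth2_proxy"


def route(cookies, path, session_is_valid):
    """Return ("proxy", url) for authenticated requests, else ("redirect", url).

    session_is_valid is a hypothetical callback standing in for oauth2-proxy's
    real cookie decryption and expiry checks, which are not modelled here.
    """
    session = cookies.get(SESSION_COOKIE)
    if session is not None and session_is_valid(session):
        # Authenticated: reverse-proxy to jaeger-query inside the same pod.
        return ("proxy", UPSTREAM + path)
    # Unauthenticated: send the browser to the IdP. The observed state looks
    # like "<nonce>:<original path>", which is mimicked here.
    state = secrets.token_urlsafe(32) + ":" + path
    query = urlencode({
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "response_type": "code",
        "scope": "openid email profile",
        "state": state,
    })
    return ("redirect", IDP_AUTHORIZE + "?" + query)


kind, target = route({}, "/search", lambda s: True)
print(kind, target.split("?")[0])  # prints: redirect https://idp.wikimedia.org/oidc/oidcAuthorize
```

Keeping the session check behind a callback makes the routing logic testable on its own; the statelessness mentioned above comes from oauth2-proxy keeping the session in the cookie itself rather than in server-side storage.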

Upstream's jaeger chart already has support for an oauth2-proxy sidecar; we'll have to switch it to our own oauth2-proxy image and make sure that image is compatible with the chart.

@fgiunchedi and @akosiaris brainstormed a bit on this, and since most/all pieces are already in place via ingress + the jaeger chart, the idea so far is not to go through the service mesh. Therefore the request path from the internet will look like the following:

client   <-- tls --> cdn  <-- tls --> ingress     <-- tls --> oauth2-proxy <-- tls --> jaeger-query
internet             prod             k8s network             jaeger pod               jaeger pod

Next steps
  • Filippo to look into jaeger chart and its oauth2-proxy support
  • Filippo to look into the secrets to be deployed for oauth2-proxy

Details

Repo                                        Branch      Lines +/-
operations/deployment-charts                master      +1 -1
operations/deployment-charts                master      +1 -1
operations/deployment-charts                master      +1 -1
operations/deployment-charts                master      +1 -0
operations/deployment-charts                master      +1 -1
operations/dns                              master      +1 -0
operations/deployment-charts                master      +3 -0
operations/puppet                           production  +5 -0
operations/deployment-charts                master      +2 -0
operations/deployment-charts                master      +8 -0
operations/deployment-charts                master      +3 -1
operations/deployment-charts                master      +1 -1
operations/docker-images/production-images  master      +7 -1
operations/puppet                           production  +8 -0
operations/puppet                           production  +1 -1
operations/docker-images/production-images  master      +7 -1
operations/deployment-charts                master      +39 -1
labs/private                                master      +7 -0
operations/docker-images/production-images  master      +7 -1
operations/puppet                           production  +6 -0
operations/docker-images/production-images  master      +15 -0

Event Timeline

Change 980817 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] docker_pkg: install convenience symlink

https://gerrit.wikimedia.org/r/980817

Change 980818 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/docker-images/production-images@master] New image: oauth2-proxy

https://gerrit.wikimedia.org/r/980818

Change 980818 merged by Filippo Giunchedi:

[operations/docker-images/production-images@master] New image: oauth2-proxy

https://gerrit.wikimedia.org/r/980818

Change 980817 merged by Filippo Giunchedi:

[operations/puppet@production] docker_pkg: install convenience symlink

https://gerrit.wikimedia.org/r/980817

I think we can avoid TLS on the last leg of the diagram above; we are already in the pod and oauth2-proxy will talk to jaeger-query over localhost anyway. The rest depicts my recollection of our brainstorming and my few notes pretty well. Two extra justifications for skipping the service mesh:

  • We are apparently using the upstream chart almost as-is; avoiding heavy local patching avoids the need for a long-term fork
  • This is arguably a monitoring system, and it is conceivable that we want to apply the same rule we apply to all monitoring systems: they need to keep working even when critical components of our infrastructure, like the CDN (and arguably the service mesh), are failing.

Change 984139 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/docker-images/production-images@master] oauth2-proxy: use the same configuration as jaeger chart

https://gerrit.wikimedia.org/r/984139

Change 984143 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/deployment-charts@master] jaeger: add oauth2-proxy sidecar

https://gerrit.wikimedia.org/r/984143

> I think we can avoid TLS on the last leg of the diagram above; we are already in the pod and oauth2-proxy will talk to jaeger-query over localhost anyway. The rest depicts my recollection of our brainstorming and my few notes pretty well. Two extra justifications for skipping the service mesh:
>
>   • We are apparently using the upstream chart almost as-is; avoiding heavy local patching avoids the need for a long-term fork
>   • This is arguably a monitoring system, and it is conceivable that we want to apply the same rule we apply to all monitoring systems: they need to keep working even when critical components of our infrastructure, like the CDN (and arguably the service mesh), are failing.

Agreed on all counts!

Change 984139 merged by Filippo Giunchedi:

[operations/docker-images/production-images@master] oauth2-proxy: use the same configuration as jaeger chart

https://gerrit.wikimedia.org/r/984139

Change 992699 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[labs/private@master] deployment_server: add dummy oauth2-proxy secrets for jaeger

https://gerrit.wikimedia.org/r/992699

Change 992699 merged by Filippo Giunchedi:

[labs/private@master] deployment_server: add dummy oauth2-proxy secrets for jaeger

https://gerrit.wikimedia.org/r/992699

Change 984143 merged by Filippo Giunchedi:

[operations/deployment-charts@master] jaeger: add oauth2-proxy sidecar

https://gerrit.wikimedia.org/r/984143

Change 994182 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/docker-images/production-images@master] oauth2-proxy: run as nobody or explicit uid

https://gerrit.wikimedia.org/r/994182

Change 994182 merged by Filippo Giunchedi:

[operations/docker-images/production-images@master] oauth2-proxy: run as nobody or explicit uid

https://gerrit.wikimedia.org/r/994182

Change 994665 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] hieradata: use trace.w.o for oidc jaeger

https://gerrit.wikimedia.org/r/994665

Change 994664 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] hieradata: add jaeger config for SSO oidc

https://gerrit.wikimedia.org/r/994664

Change 994665 abandoned by Filippo Giunchedi:

[operations/puppet@production] hieradata: use trace.w.o for oidc jaeger

Reason:

Obsolete

https://gerrit.wikimedia.org/r/994665

Change 994986 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/docker-images/production-images@master] oauth2-proxy: add ca-certificates

https://gerrit.wikimedia.org/r/994986

Change 994664 merged by Filippo Giunchedi:

[operations/puppet@production] hieradata: add jaeger config for SSO oidc

https://gerrit.wikimedia.org/r/994664

Change 994986 merged by Filippo Giunchedi:

[operations/docker-images/production-images@master] oauth2-proxy: add ca-certificates

https://gerrit.wikimedia.org/r/994986

Change 995003 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/deployment-charts@master] jaeger: tag oauth2-proxy image

https://gerrit.wikimedia.org/r/995003

Change 995003 merged by jenkins-bot:

[operations/deployment-charts@master] jaeger: tag oauth2-proxy image

https://gerrit.wikimedia.org/r/995003

Change 997789 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/deployment-charts@master] jaeger: route jaeger-query to oauth2-proxy port

https://gerrit.wikimedia.org/r/997789

Change 997789 merged by Filippo Giunchedi:

[operations/deployment-charts@master] jaeger: route trace.w.o to jaeger-query

https://gerrit.wikimedia.org/r/997789

update: I've been poking at ingress/istio after the change above without any luck; the current symptom looks like a timeout:

cumin1002:~$ curl -H 'Host: trace.wikimedia.org' https://jaeger-query.svc.eqiad.wmnet:30443 -v
* Uses proxy env variable no_proxy == '.wmnet'
*   Trying 10.2.2.78:30443...
* Connected to jaeger-query.svc.eqiad.wmnet (10.2.2.78) port 30443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=jaeger-collector-http.discovery.wmnet
*  start date: Feb  4 14:43:00 2024 GMT
*  expire date: Mar  3 14:43:00 2024 GMT
*  subjectAltName: host "jaeger-query.svc.eqiad.wmnet" matched cert's "jaeger-query.svc.eqiad.wmnet"
*  issuer: C=US; L=San Francisco; O=Wikimedia Foundation, Inc; OU=SRE Foundations; CN=discovery
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x564c76982620)
> GET / HTTP/2
> Host: trace.wikimedia.org
> user-agent: curl/7.74.0
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 2147483647)!
< HTTP/2 503 
< content-length: 114
< content-type: text/plain
< date: Thu, 15 Feb 2024 13:57:48 GMT
< server: istio-envoy
< 
* Connection #0 to host jaeger-query.svc.eqiad.wmnet left intact
upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection failure

I am digging into ingressgateway logs and found the following upon issuing the curl above:

root@deploy2002:~# kubectl logs -l app=istio-ingressgateway -n istio-system --all-containers -f  | grep trace.wikimedia
{"response_flags":"UF,URX","connection_termination_details":null,"authority":"trace.wikimedia.org","response_code":503,"method":"GET","response_code_details":"upstream_reset_before_response_started{connection_failure}","downstream_remote_address":"10.192.32.7:38402","duration":30019,"downstream_local_address":"10.67.83.217:8443","upstream_transport_failure_reason":null,"protocol":"HTTP/2","request_id":"35d55465-1117-426a-a0db-062c43f91d5e","path":"/","upstream_cluster":"outbound|16686||main-jaeger-query.jaeger.svc.cluster.local","user_agent":"curl/7.64.0","x_forwarded_for":"10.192.32.7","requested_server_name":"jaeger-collector-http.svc.eqiad.wmnet","upstream_host":"10.67.83.197:4180","bytes_received":0,"bytes_sent":114,"route_name":"default-route","upstream_service_time":null,"start_time":"2024-02-16T10:26:42.341Z","upstream_local_address":null}

Or prettified:

{
  "response_flags": "UF,URX",
  "connection_termination_details": null,
  "authority": "trace.wikimedia.org",
  "response_code": 503,
  "method": "GET",
  "response_code_details": "upstream_reset_before_response_started{connection_failure}",
  "downstream_remote_address": "10.192.32.7:38402",
  "duration": 30019,
  "downstream_local_address": "10.67.83.217:8443",
  "upstream_transport_failure_reason": null,
  "protocol": "HTTP/2",
  "request_id": "35d55465-1117-426a-a0db-062c43f91d5e",
  "path": "/",
  "upstream_cluster": "outbound|16686||main-jaeger-query.jaeger.svc.cluster.local",
  "user_agent": "curl/7.64.0",
  "x_forwarded_for": "10.192.32.7",
  "requested_server_name": "jaeger-collector-http.svc.eqiad.wmnet",
  "upstream_host": "10.67.83.197:4180",
  "bytes_received": 0,
  "bytes_sent": 114,
  "route_name": "default-route",
  "upstream_service_time": null,
  "start_time": "2024-02-16T10:26:42.341Z",
  "upstream_local_address": null
}

This seems correct to me: istio is trying to talk to "upstream_host": "10.67.83.197:4180", right?
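To read the log entry above field by field, here is a minimal Python sketch. The flag meanings are paraphrased from Envoy's access-log documentation, the cluster-name split assumes Istio's "direction|port|subset|hostname" convention, and `explain` is a hypothetical helper. It makes explicit that the cluster carries the Service port (16686) while the endpoint Envoy picked is the pod's port 4180, so routing towards the sidecar looks right and the failure happened while connecting; the roughly 30 s duration is consistent with connection attempts timing out rather than being refused outright.

```python
import json

# The two Envoy response flags seen above, paraphrased from the Envoy
# access-log documentation (other flags are passed through unchanged).
RESPONSE_FLAGS = {
    "UF": "upstream connection failure",
    "URX": "upstream retry limit or max connect attempts reached",
}


def explain(log_line):
    """Pull the routing-relevant fields out of one istio-ingressgateway log line."""
    entry = json.loads(log_line)
    # Istio names clusters "direction|port|subset|hostname"; the port is the
    # Service port, not the container port.
    direction, svc_port, subset, host = entry["upstream_cluster"].split("|")
    # upstream_host is the concrete endpoint (pod IP and target port) Envoy picked.
    pod_ip, pod_port = entry["upstream_host"].rsplit(":", 1)
    return {
        "flags": [RESPONSE_FLAGS.get(f, f) for f in entry["response_flags"].split(",")],
        "direction": direction,
        "service": host,
        "service_port": int(svc_port),
        "subset": subset or None,
        "pod_ip": pod_ip,
        "pod_port": int(pod_port),
        "duration_s": entry["duration"] / 1000,  # Envoy logs milliseconds
    }
```

Feeding it the raw line from `kubectl logs` (one JSON object per line) gives a dict that is easier to compare across attempts than the prettified blob.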

On the jaeger-query side, though, I can't see any requests:

filippo@deploy2002:~$ kubectl logs -l app.kubernetes.io/component=query --all-containers -f
[2024/02/09 17:31:52] [provider.go:55] Performing OIDC Discovery...
[2024/02/09 17:31:53] [proxy.go:89] mapping path "/" => upstream "https://localhost:16686"
[2024/02/09 17:31:53] [oauthproxy.go:166] OAuthProxy configured for OpenID Connect Client ID: jaeger-ui
[2024/02/09 17:31:53] [oauthproxy.go:172] Cookie settings: name:_oauth2_proxy secure(https):true httponly:true expiry:168h0m0s domains:trace.wikimedia.org path:/ samesite: refresh:disabled
{"level":"info","ts":1707499912.824807,"caller":"channelz/funcs.go:340","msg":"[core][Channel #2 SubChannel #3] Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1707499912.8248394,"caller":"channelz/funcs.go:340","msg":"[core][Channel #2 SubChannel #3] Subchannel picks a new address \":16685\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1707499912.8251963,"caller":"app/server.go:282","msg":"Starting HTTP server","port":16686,"addr":":16686"}
{"level":"info","ts":1707499912.825271,"caller":"grpclog/component.go:71","msg":"[core]pickfirstBalancer: UpdateSubConnState: 0xc000525338, {CONNECTING <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1707499912.8253164,"caller":"channelz/funcs.go:340","msg":"[core][Channel #2] Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1707499912.826207,"caller":"channelz/funcs.go:340","msg":"[core][Channel #2 SubChannel #3] Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1707499912.8262506,"caller":"grpclog/component.go:71","msg":"[core]pickfirstBalancer: UpdateSubConnState: 0xc000525338, {READY <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1707499912.8262668,"caller":"channelz/funcs.go:340","msg":"[core][Channel #2] Channel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"warn","ts":1707865331.3160307,"caller":"tlscfg/cert_watcher.go:92","msg":"Certificate has been removed, using the last known version","certificate":"/tls/tls.key"}
{"level":"warn","ts":1707865331.3167005,"caller":"tlscfg/cert_watcher.go:92","msg":"Certificate has been removed, using the last known version","certificate":"/tls/tls.crt"}

@akosiaris @CDanis does any of the above ring a bell, or show something obvious I am missing? Thank you!

Is oauth2-proxy (port 4180) intentionally listening on HTTP and not HTTPS?

taavi@deploy2002 ~ <jaeger/aux-k8s-eqiad> $ kubectl port-forward main-jaeger-query-7f56949ddd-fdl57 4180 &
[1] 14693
taavi@deploy2002 ~ <jaeger/aux-k8s-eqiad> $ Forwarding from 127.0.0.1:4180 -> 4180
Forwarding from [::1]:4180 -> 4180

taavi@deploy2002 ~ <jaeger/aux-k8s-eqiad> $ 
taavi@deploy2002 ~ <jaeger/aux-k8s-eqiad> $ curl localhost:4180
Handling connection for 4180
<a href="https://idp.wikimedia.org/oidc/oidcAuthorize?approval_prompt=force&amp;client_id=jaeger-ui&amp;code_challenge=4t9neTN35X1xh7ImqSgc1Wvm3sY503jsbA_pv0Cnt_xFtwO407_9qV8bCjXy7JEarG4VF9ecMIpD5f-ELJuG0qIb67DaMTOc&amp;code_challenge_method=plain&amp;redirect_uri=https%3A%2F%2Ftrace.wikimedia.org%2Foauth2%2Fcallback&amp;response_type=code&amp;scope=openid+email+profile&amp;state=rl7CT7a3nIf5WPk4SqeZQ7ITcWGS30MdBtImUHq4e6E%3A%2F">Found</a>.

taavi@deploy2002 ~ <jaeger/aux-k8s-eqiad> $ curl https://localhost:4180
Handling connection for 4180
curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number
E0216 10:54:01.773755   14693 portforward.go:391] error copying from local connection to remote stream: read tcp6 [::1]:4180->[::1]:48912: read: connection reset by peer
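The "wrong version number" error above is most likely what a TLS client reports when the other end answers in plain HTTP: the first bytes of an HTTP response get parsed as a TLS record header. A minimal Python sketch of that check, with `classify_first_bytes` as a hypothetical helper and the example reply assumed to be a typical HTTP 400 rather than the exact bytes oauth2-proxy sent:

```python
def classify_first_bytes(data):
    """Guess whether a server's first reply bytes are a TLS record or plain HTTP."""
    # A TLS record starts with a content type (0x16 handshake, 0x15 alert)
    # followed by a two-byte protocol version whose major byte is 0x03.
    if len(data) >= 3 and data[0] in (0x15, 0x16) and data[1] == 0x03:
        return "tls"
    # A plain-HTTP server answers a ClientHello with an ordinary HTTP response.
    if data.startswith(b"HTTP/"):
        return "plain-http"
    return "unknown"


# Read as a TLS header, b"HTTP/" gives content type 0x48 ("H") and version
# 0x5454 ("TT"), which OpenSSL rejects as a wrong version number.
reply = b"HTTP/1.1 400 Bad Request\r\n"
print(classify_first_bytes(reply), hex(reply[0]), reply[1:3].hex())  # prints: plain-http 0x48 5454
```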

Change 1004239 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/deployment-charts@master] serve https for jaeger-query oauth2-proxy

https://gerrit.wikimedia.org/r/1004239

I've verified that oauth2-proxy will silently just serve plain HTTP if you specify https_address but don't provide it with TLS key material. I think this patch provides it with that material?

I'm also confused as to why we didn't have to change the port number referenced in the default-route of the istio VirtualService for trace.wikimedia.org?
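Since the fallback to plain HTTP described above is silent, a small pre-deploy check can catch it. This is a sketch under assumptions: `https_misconfig` is a hypothetical helper operating on a plain dict, and the key names are, to my knowledge, oauth2-proxy's config-file spellings of `--https-address`, `--tls-cert-file` and `--tls-key-file`; double-check them against the oauth2-proxy version in our image. The file paths in the example are illustrative only.

```python
def https_misconfig(cfg):
    """Return warnings for oauth2-proxy settings that request HTTPS without key material.

    cfg is a plain dict of config-file keys, used here for illustration.
    """
    if not cfg.get("https_address"):
        return []
    missing = [k for k in ("tls_cert_file", "tls_key_file") if not cfg.get(k)]
    if not missing:
        return []
    return ["https_address is set without " + ", ".join(missing)]


print(https_misconfig({"https_address": ":4180"}))
print(https_misconfig({"https_address": ":4180",
                       "tls_cert_file": "/tls/tls.crt",
                       "tls_key_file": "/tls/tls.key"}))
```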

Change 1004239 merged by Filippo Giunchedi:

[operations/deployment-charts@master] serve https for jaeger-query oauth2-proxy

https://gerrit.wikimedia.org/r/1004239

Change 1005041 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/dns@master] wikimedia.org: add trace

https://gerrit.wikimedia.org/r/1005041

Change 1005043 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] Add trace.w.o to CDN

https://gerrit.wikimedia.org/r/1005043

Thank you @taavi! That was definitely a problem, and I've verified it is fixed now thanks to @CDanis' patch:

$ curl -v https://localhost:4180 -k
* Uses proxy env variable no_proxy == 'wikipedia.org,wikimedia.org,wikibooks.org,wikinews.org,wikiquote.org,wikisource.org,wikiversity.org,wikivoyage.org,wikidata.org,wikiworkshop.org,wikifunctions.org,wiktionary.org,mediawiki.org,wmfusercontent.org,w.wiki,wmnet,127.0.0.1,::1'
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 4180 (#0)
Handling connection for 4180
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=main-jaeger-query.jaeger.svc
*  start date: Feb 13 22:57:00 2024 GMT
*  expire date: Mar 12 22:57:00 2024 GMT
*  issuer: C=US; L=San Francisco; O=Wikimedia Foundation, Inc; OU=SRE Foundations; CN=discovery
*  SSL certificate verify ok.
> GET / HTTP/1.1
> Host: localhost:4180
> User-Agent: curl/7.64.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/1.1 302 Found
< Cache-Control: no-cache, no-store, must-revalidate, max-age=0
< Content-Type: text/html; charset=utf-8
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< Location: https://idp.wikimedia.org/oidc/oidcAuthorize?approval_prompt=force&client_id=jaeger-ui&code_challenge=GI9kWOsnmCH9eAn1.pIn5y-CK~kA9-ww7fNqDQioOqBPWr~W326MMTau0516TdRO7Kkgg9X.XjvsICtk1j~yz14p7_ayD5Kk&code_challenge_method=plain&redirect_uri=https%3A%2F%2Ftrace.wikimedia.org%2Foauth2%2Fcallback&response_type=code&scope=openid+email+profile&state=K-txYNvO8X8pnUxRfq_KHzc-7C9L0OLUR_y1UYuJhGw%3A%2F
< Set-Cookie: _oauth2_proxy_csrf=4E5nwGoKYxUNdK3GxPXG9zznBL54eoTc0Vf7WHSJ20RDQ6a7reeWrwheGFn87Liu0GAsxCVM534p7u72EnGoxH37MALtnazEPB0SnIsMgT1h6VTGvjslmZbubIdFcPKktHAaDwD4kHEH4SCOW3nX1uzDTcN7LbtpTN_K-ukjh0B7EA9X9n4opKQbo8y0wzE1rNyMtJrkIT19u115eCmY_BZ8SceDTTHaN23U5hp8s1F_mFrnL_tEWzqibW8wGg==|1708435326|Lwh_x1KQ7l4DLjgcH6UMpk62k5Y1w_L9IQyAJGX8Wt8=; Path=/; Domain=trace.wikimedia.org; Expires=Tue, 20 Feb 2024 13:37:06 GMT; HttpOnly; Secure
< X-Accel-Expires: 0
< Date: Tue, 20 Feb 2024 13:22:06 GMT
< Content-Length: 446
< 
<a href="https://idp.wikimedia.org/oidc/oidcAuthorize?approval_prompt=force&amp;client_id=jaeger-ui&amp;code_challenge=GI9kWOsnmCH9eAn1.pIn5y-CK~kA9-ww7fNqDQioOqBPWr~W326MMTau0516TdRO7Kkgg9X.XjvsICtk1j~yz14p7_ayD5Kk&amp;code_challenge_method=plain&amp;redirect_uri=https%3A%2F%2Ftrace.wikimedia.org%2Foauth2%2Fcallback&amp;response_type=code&amp;scope=openid+email+profile&amp;state=K-txYNvO8X8pnUxRfq_KHzc-7C9L0OLUR_y1UYuJhGw%3A%2F">Found</a>.

* Connection #0 to host localhost left intact
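The `Location` header above is where oauth2-proxy's OIDC settings meet the IdP: `client_id` and `redirect_uri` have to agree with the client registered in CAS. A minimal Python sketch for pulling those parameters out of a redirect and comparing them with expected values; both helpers are hypothetical, and the expected dict would have to be filled in from the CAS-side registration, which is not shown in this task.

```python
from urllib.parse import parse_qs, urlsplit


def authorize_params(location):
    """Split an OIDC authorize redirect into its endpoint and query parameters."""
    parts = urlsplit(location)
    # parse_qs percent-decodes values and turns "+" back into spaces.
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    params["endpoint"] = parts.scheme + "://" + parts.netloc + parts.path
    return params


def registration_mismatches(params, expected):
    """List parameters whose value differs from the IdP-side client registration."""
    return [k for k, v in expected.items() if params.get(k) != v]
```

Applied to the `Location` above, this yields `client_id` `jaeger-ui`, `redirect_uri` `https://trace.wikimedia.org/oauth2/callback` and `scope` `openid email profile`.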

> I've verified that oauth2-proxy will silently just serve plain HTTP if you specify https_address but don't provide it with TLS key material. I think this patch provides it with that material?
>
> I'm also confused as to why we didn't have to change the port number referenced in the default-route of the istio VirtualService for trace.wikimedia.org?

My understanding is essentially what I've outlined at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1004239/comment/afe59904_60ddfc01/ , although things are still not working (e.g. curl -H 'Host: trace.wikimedia.org' https://jaeger-query.svc.eqiad.wmnet:30443, though I might be getting the test wrong) and I'm no longer convinced we shouldn't also change the port.

> My understanding is essentially what I've outlined at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1004239/comment/afe59904_60ddfc01/ , although things are still not working (e.g. curl -H 'Host: trace.wikimedia.org' https://jaeger-query.svc.eqiad.wmnet:30443, though I might be getting the test wrong) and I'm no longer convinced we shouldn't also change the port.

Try with --connect-to for proper SNI in the request:

taavi@deploy2002 ~ $ curl -v --connect-to ::jaeger-query.svc.eqiad.wmnet https://trace.wikimedia.org:30443
* Uses proxy env variable no_proxy == 'wikipedia.org,wikimedia.org,wikibooks.org,wikinews.org,wikiquote.org,wikisource.org,wikiversity.org,wikivoyage.org,wikidata.org,wikiworkshop.org,wikifunctions.org,wiktionary.org,mediawiki.org,wmfusercontent.org,w.wiki,wmnet,127.0.0.1,::1'
* Connecting to hostname: jaeger-query.svc.eqiad.wmnet
*   Trying 10.2.2.78...
* TCP_NODELAY set
* Connected to jaeger-query.svc.eqiad.wmnet (10.2.2.78) port 30443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=jaeger-collector-http.discovery.wmnet
*  start date: Feb  4 14:43:00 2024 GMT
*  expire date: Mar  3 14:43:00 2024 GMT
*  subjectAltName does not match trace.wikimedia.org
* SSL: no alternative certificate subject name matches target host name 'trace.wikimedia.org'
* Closing connection 0
curl: (60) SSL: no alternative certificate subject name matches target host name 'trace.wikimedia.org'
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
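With `--connect-to`, curl connects to jaeger-query.svc.eqiad.wmnet but sends SNI for, and verifies the certificate against, trace.wikimedia.org; in Python's `ssl` module the `server_hostname` argument of `SSLContext.wrap_socket` plays the same role. The check that fails above is a hostname-versus-SAN comparison, sketched here as a simplified subset of RFC 6125 (`hostname_matches` is a hypothetical helper). Only `jaeger-query.svc.eqiad.wmnet` is visible among the certificate's SANs in the output, so the list below is an assumption, not the certificate's full SAN list.

```python
def hostname_matches(hostname, sans):
    """Simplified RFC 6125 check of a hostname against certificate DNS SANs.

    Only exact matches and a "*" that forms the whole left-most label are
    handled; real TLS libraries apply further rules.
    """
    hostname = hostname.lower().rstrip(".")
    for san in sans:
        san = san.lower().rstrip(".")
        if san == hostname:
            return True
        if san.startswith("*."):
            # "*.example.org" covers exactly one extra label: "a.example.org",
            # but neither "example.org" nor "a.b.example.org".
            first, _, rest = hostname.partition(".")
            if first and rest == san[2:]:
                return True
    return False


# Only this SAN is visible in the curl output; the full list is not shown.
before = ["jaeger-query.svc.eqiad.wmnet"]
print(hostname_matches("trace.wikimedia.org", before))  # prints: False
print(hostname_matches("trace.wikimedia.org", before + ["trace.wikimedia.org"]))  # prints: True
```

Adding trace.wikimedia.org as an extra name on the certificate the ingress serves is what makes the second case pass.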

Change 1005119 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] jaeger: Add 4180 port to the network policy

https://gerrit.wikimedia.org/r/1005119

Change 1005119 merged by jenkins-bot:

[operations/deployment-charts@master] jaeger: Add 4180 port to the network policy

https://gerrit.wikimedia.org/r/1005119
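Once a pod is selected by a Kubernetes NetworkPolicy, only the listed ingress ports are admitted. A plausible reading of the earlier symptoms (an assumption, since the policy itself is not shown here) is that the policy allowed jaeger-query's port 16686 but not the new oauth2-proxy port 4180, so packets from the ingress gateway were dropped and Envoy eventually gave up with `UF,URX` after about 30 s. A simplified Python model of that check, where `port_allowed` and both policy dicts are hypothetical:

```python
def port_allowed(policy, port, protocol="TCP"):
    """Check whether any ingress rule of a NetworkPolicy-shaped dict admits a port.

    Simplified: 'from' peers, named ports and endPort ranges are ignored.
    """
    for rule in policy.get("spec", {}).get("ingress", []):
        ports = rule.get("ports")
        if not ports:
            # Per the Kubernetes API, a rule with no 'ports' matches all ports.
            return True
        for p in ports:
            if p.get("port") == port and p.get("protocol", "TCP") == protocol:
                return True
    return False


# Hypothetical before/after policies; the real jaeger chart policy is not shown here.
before = {"spec": {"ingress": [{"ports": [{"port": 16686, "protocol": "TCP"}]}]}}
after = {"spec": {"ingress": [{"ports": [{"port": 16686, "protocol": "TCP"},
                                         {"port": 4180, "protocol": "TCP"}]}]}}
print(port_allowed(before, 4180), port_allowed(after, 4180))  # prints: False True
```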

Change 1005043 merged by CDanis:

[operations/puppet@production] Add trace.w.o to CDN

https://gerrit.wikimedia.org/r/1005043

Change 1005156 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/deployment-charts@master] Ask Ingress to serve trace.wikimedia.org altname

https://gerrit.wikimedia.org/r/1005156

Change 1005156 merged by jenkins-bot:

[operations/deployment-charts@master] Ask Ingress to serve trace.wikimedia.org altname

https://gerrit.wikimedia.org/r/1005156

Change 1005041 merged by CDanis:

[operations/dns@master] wikimedia.org: add trace

https://gerrit.wikimedia.org/r/1005041

Change 1005163 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/deployment-charts@master] jaeger: make oidc client_id match CAS config

https://gerrit.wikimedia.org/r/1005163

Change 1005163 merged by jenkins-bot:

[operations/deployment-charts@master] jaeger: make oidc client_id match CAS config

https://gerrit.wikimedia.org/r/1005163

Change 1005167 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/deployment-charts@master] jaeger: oauth proxy don't verify upstream

https://gerrit.wikimedia.org/r/1005167

Change 1005167 merged by jenkins-bot:

[operations/deployment-charts@master] jaeger: oauth proxy don't verify upstream

https://gerrit.wikimedia.org/r/1005167

Change 1005173 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/deployment-charts@master] actually get the triple negative correct

https://gerrit.wikimedia.org/r/1005173

Change 1005173 merged by jenkins-bot:

[operations/deployment-charts@master] actually get the triple negative correct

https://gerrit.wikimedia.org/r/1005173

Change 1005177 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/deployment-charts@master] jaeger: oauth proxy fix skip_verify config name

https://gerrit.wikimedia.org/r/1005177

Change 1005177 merged by jenkins-bot:

[operations/deployment-charts@master] jaeger: oauth proxy fix skip_verify config name

https://gerrit.wikimedia.org/r/1005177
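The skip-verify changes above make oauth2-proxy accept the upstream certificate without checking it. To my knowledge the relevant oauth2-proxy option is `ssl_upstream_insecure_skip_verify` (`--ssl-upstream-insecure-skip-verify`); the chart values that set it are not shown here. The upstream is `https://localhost:16686`, and the certificate jaeger-query serves is presumably issued for its service names rather than for localhost (an assumption), so normal verification would fail on the hostname. The Python sketch below shows what skipping verification means for a TLS client, using the standard `ssl` module rather than oauth2-proxy's Go code; `upstream_context` is a hypothetical helper.

```python
import ssl


def upstream_context(skip_verify):
    """Client TLS context for the oauth2-proxy -> jaeger-query hop over localhost."""
    ctx = ssl.create_default_context()
    if skip_verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE;
        # traffic stays encrypted but the peer certificate is not authenticated.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


print(upstream_context(False).verify_mode == ssl.CERT_REQUIRED)  # prints: True
print(upstream_context(True).verify_mode == ssl.CERT_NONE)  # prints: True
```

The trade-off is acceptable here mainly because this hop never leaves the pod, which is the same reasoning used earlier in the thread for the last leg of the diagram.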

fgiunchedi closed this task as Resolved. Edited Feb 21 2024, 9:15 AM
fgiunchedi claimed this task.

Calling this done since https://trace.wikimedia.org is now a thing; thank you to all involved @akosiaris @CDanis @taavi!

Change 1006996 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/deployment-charts@master] [jaeger] oauth2-proxy doesn't need to authorize

https://gerrit.wikimedia.org/r/1006996

Change 1006996 merged by Filippo Giunchedi:

[operations/deployment-charts@master] [jaeger] oauth2-proxy doesn't need to authorize

https://gerrit.wikimedia.org/r/1006996