Consider using a dedicated TLS certificate for upload.w.o
Closed, Resolved, Public

Description

We currently use the unified certificate on the upload cluster, although we effectively only need two SANs there (upload.w.o and maps.w.o). I'm guessing this dates from the time when we used commercial CAs in the CDN. This could end "soon": we would probably benefit from shipping either a single cert covering both upload.w.o and maps.w.o, or two separate certificates.

Potential benefits:

  • Decrease the exposure of our unified cert
  • Save one RTT per TLS handshake on the upload cluster (the number of packets needed to perform a TLS 1.3 handshake decreases by 12.5%, from 16 to 14)

In terms of traffic, we would save ~700 bytes per TLSv1.3 connection (a rough estimate comparing the size of the Certificate handshake message served for en.wikipedia.org vs. www.wikiworkshop.org):

$ openssl s_client -msg -servername www.wikiworkshop.org -connect text-lb.eqiad.wikimedia.org:443 2>/dev/null </dev/null |grep Certificate
<<< TLS 1.3, Handshake [length 0839], Certificate

$ openssl s_client -msg -servername en.wikipedia.org -connect text-lb.eqiad.wikimedia.org:443 2>/dev/null </dev/null |grep Certificate
<<< TLS 1.3, Handshake [length 0af4], Certificate

Some napkin math shows that we would save ~456 GiB (~490 GB) of data per day, assuming ~700 million connections per day to the upload cluster.
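
The estimate above can be reproduced from the two Certificate message lengths printed by openssl. This is only a minimal sketch: it assumes the www.wikiworkshop.org certificate is a fair stand-in for the size of a dedicated upload certificate, and it takes the ~700 million connections per day figure from the description as given. The variable and function names are illustrative.

```python
# Certificate handshake message lengths as printed by `openssl s_client -msg` (hex).
DEDICATED_CERT_LEN = int("0839", 16)  # www.wikiworkshop.org (small cert, used as a proxy)
UNIFIED_CERT_LEN = int("0af4", 16)    # en.wikipedia.org (unified cert)

# Assumption from the task description: ~700 million connections per day.
CONNECTIONS_PER_DAY = 700_000_000


def daily_savings_bytes(unified_len: int, dedicated_len: int, connections: int) -> int:
    """Bytes saved per day if every connection carries the smaller Certificate message."""
    return (unified_len - dedicated_len) * connections


saved_per_conn = UNIFIED_CERT_LEN - DEDICATED_CERT_LEN
saved_per_day = daily_savings_bytes(UNIFIED_CERT_LEN, DEDICATED_CERT_LEN, CONNECTIONS_PER_DAY)

print(f"{saved_per_conn} bytes saved per connection")
print(f"{saved_per_day / 10**9:.1f} GB/day ({saved_per_day / 2**30:.1f} GiB/day)")
```

The per-connection difference comes out just under 700 bytes. The daily figure quoted in the description corresponds to binary units (GiB) rather than decimal GB.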

Event Timeline

Vgutierrez triaged this task as Medium priority. May 16 2025, 1:13 PM

Change #1159510 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Issue a separate LE cert for upload cache cluster

https://gerrit.wikimedia.org/r/1159510

Change #1159511 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Issue a separate GTS cert for upload cache cluster

https://gerrit.wikimedia.org/r/1159511

Change #1159510 merged by Vgutierrez:

[operations/puppet@production] hiera: Issue a separate LE cert for upload cache cluster

https://gerrit.wikimedia.org/r/1159510

Change #1159511 merged by Vgutierrez:

[operations/puppet@production] hiera: Issue a separate GTS cert for upload cache cluster

https://gerrit.wikimedia.org/r/1159511

Change #1163749 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cache::haproxy: Simplify cert configuration

https://gerrit.wikimedia.org/r/1163749

Change #1163749 merged by Vgutierrez:

[operations/puppet@production] cache::haproxy: Simplify cert configuration

https://gerrit.wikimedia.org/r/1163749

Change #1164238 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Use the upload cert on upload@ulsfo

https://gerrit.wikimedia.org/r/1164238

Change #1164449 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] haproxy,varnish: Introduce a host independent healthcheck

https://gerrit.wikimedia.org/r/1164449

Change #1164466 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] service: Target upload.wm.o on upload-https healthchecks

https://gerrit.wikimedia.org/r/1164466

Change #1164449 merged by Vgutierrez:

[operations/puppet@production] haproxy,varnish: Introduce a host independent healthcheck

https://gerrit.wikimedia.org/r/1164449

Change #1164466 merged by Vgutierrez:

[operations/puppet@production] service: Target upload.wm.o on upload-https healthchecks

https://gerrit.wikimedia.org/r/1164466

Mentioned in SAL (#wikimedia-operations) [2025-06-30T14:18:46Z] <vgutierrez@cumin1002> START - Cookbook sre.loadbalancer.admin config_reloading P{lvs4010.ulsfo.wmnet} and A:liberica (T394484)

Mentioned in SAL (#wikimedia-operations) [2025-06-30T14:19:04Z] <vgutierrez@cumin1002> END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs4010.ulsfo.wmnet} and A:liberica (T394484)

Change #1165035 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] cache::haproxy: Fix acl checks for unique path healthcheck

https://gerrit.wikimedia.org/r/1165035

Change #1165035 merged by Vgutierrez:

[operations/puppet@production] cache::haproxy: Fix acl checks for unique path healthcheck

https://gerrit.wikimedia.org/r/1165035

Mentioned in SAL (#wikimedia-operations) [2025-06-30T16:06:56Z] <vgutierrez@cumin1002> START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6002.drmrs.wmnet,lvs5005.eqsin.wmnet,lvs3009.esams.wmnet,lvs7002.magru.wmnet,lvs4009.ulsfo.wmnet} and A:liberica (T394484)

Mentioned in SAL (#wikimedia-operations) [2025-06-30T16:08:29Z] <vgutierrez@cumin1002> END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6002.drmrs.wmnet,lvs5005.eqsin.wmnet,lvs3009.esams.wmnet,lvs7002.magru.wmnet,lvs4009.ulsfo.wmnet} and A:liberica (T394484)

Mentioned in SAL (#wikimedia-operations) [2025-06-30T16:14:10Z] <vgutierrez@cumin1002> START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6003.drmrs.wmnet,lvs5006.eqsin.wmnet,lvs3010.esams.wmnet,lvs7003.magru.wmnet,lvs4010.ulsfo.wmnet} and A:liberica (T394484)

Mentioned in SAL (#wikimedia-operations) [2025-06-30T16:15:41Z] <vgutierrez@cumin1002> END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6003.drmrs.wmnet,lvs5006.eqsin.wmnet,lvs3010.esams.wmnet,lvs7003.magru.wmnet,lvs4010.ulsfo.wmnet} and A:liberica (T394484)

Change #1164238 merged by Vgutierrez:

[operations/puppet@production] hiera: Use the upload cert on upload@ulsfo

https://gerrit.wikimedia.org/r/1164238

Mentioned in SAL (#wikimedia-operations) [2025-07-01T07:54:19Z] <vgutierrez> switching upload@ulsfo to upload TLS certificate - T394484

After the switch on ulsfo, we are now saving one RTT per handshake, both with and without SNI present in the ClientHello. The client used to generate the requests is openssl s_client, with -servername for the SNI requests.

no-SNI@ulsfo
127 2.861259550 192.168.88.5 → 198.35.26.112 TLSv1 401 Client Hello                                                                                                            
128 3.041987177 198.35.26.112 → 192.168.88.5 TLSv1.3 2483 Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data, Application Data
130 3.048191039 192.168.88.5 → 198.35.26.112 TLSv1.3 130 Change Cipher Spec, Application Data
131 3.228237150 198.35.26.112 → 192.168.88.5 TLSv1.3 337 Application Data 
132 3.228238111 198.35.26.112 → 192.168.88.5 TLSv1.3 337 Application Data 
171 4.609031744 198.35.26.112 → 192.168.88.5 TLSv1.3 90 Application Data
no-SNI@eqiad
257 6.750985571 192.168.88.5 → 208.80.154.240 TLSv1 401 Client Hello
262 6.864366834 208.80.154.240 → 192.168.88.5 TLSv1.3 2946 Server Hello, Change Cipher Spec, Application Data
263 6.864367234 208.80.154.240 → 192.168.88.5 TLSv1.3 302 Application Data, Application Data, Application Data
266 6.870703650 192.168.88.5 → 208.80.154.240 TLSv1.3 130 Change Cipher Spec, Application Data
267 6.984013238 208.80.154.240 → 192.168.88.5 TLSv1.3 337 Application Data
268 6.984013628 208.80.154.240 → 192.168.88.5 TLSv1.3 337 Application Data
302 8.507013662 208.80.154.240 → 192.168.88.5 TLSv1.3 90 Application Data
SNI@ulsfo
205 6.863857449 192.168.88.5 → 198.35.26.112 TLSv1 392 Client Hello
206 7.039381837 198.35.26.112 → 192.168.88.5 TLSv1.3 2483 Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data, Application Data
208 7.045302722 192.168.88.5 → 198.35.26.112 TLSv1.3 130 Change Cipher Spec, Application Data
210 7.220744942 198.35.26.112 → 192.168.88.5 TLSv1.3 321 Application Data
211 7.220745783 198.35.26.112 → 192.168.88.5 TLSv1.3 321 Application Data
440 14.710004444 198.35.26.112 → 192.168.88.5 TLSv1.3 90 Application Data
SNI@eqiad
 106 1.369955518 192.168.88.5 → 208.80.154.240 TLSv1 392 Client Hello
 111 1.478483510 208.80.154.240 → 192.168.88.5 TLSv1.3 1506 Server Hello, Change Cipher Spec, Application Data
 114 1.478747559 208.80.154.240 → 192.168.88.5 TLSv1.3 302 Application Data, Application Data, Application Data
 117 1.484832673 192.168.88.5 → 208.80.154.240 TLSv1.3 130 Change Cipher Spec, Application Data
 122 1.592609255 208.80.154.240 → 192.168.88.5 TLSv1.3 321 Application Data
 123 1.592609856 208.80.154.240 → 192.168.88.5 TLSv1.3 321 Application Data
1682 50.660026009 208.80.154.240 → 192.168.88.5 TLSv1.3 90 Application Data
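
To compare the captures above without counting by eye, the server's first flight can be summarised with a short script. This is a sketch under stated assumptions: the columns are taken to be frame number, time, source, arrow, destination, protocol, length and info, and the length is whatever the capture tool reported for each displayed frame, so reassembled TCP segments may not be visible individually. The function name `server_first_flight` is hypothetical.

```python
CLIENT = "192.168.88.5"

# First three and four lines of the no-SNI captures, copied from the output above.
ULSFO_NO_SNI = """\
127 2.861259550 192.168.88.5 → 198.35.26.112 TLSv1 401 Client Hello
128 3.041987177 198.35.26.112 → 192.168.88.5 TLSv1.3 2483 Server Hello, Change Cipher Spec, Application Data, Application Data, Application Data, Application Data
130 3.048191039 192.168.88.5 → 198.35.26.112 TLSv1.3 130 Change Cipher Spec, Application Data
"""

EQIAD_NO_SNI = """\
257 6.750985571 192.168.88.5 → 208.80.154.240 TLSv1 401 Client Hello
262 6.864366834 208.80.154.240 → 192.168.88.5 TLSv1.3 2946 Server Hello, Change Cipher Spec, Application Data
263 6.864367234 208.80.154.240 → 192.168.88.5 TLSv1.3 302 Application Data, Application Data, Application Data
266 6.870703650 192.168.88.5 → 208.80.154.240 TLSv1.3 130 Change Cipher Spec, Application Data
"""


def server_first_flight(capture: str, client: str = CLIENT) -> tuple[int, int]:
    """Return (frame_count, total_length) of server frames sent between the
    Client Hello and the client's next frame."""
    frames, total = 0, 0
    seen_client_hello = False
    for line in capture.splitlines():
        fields = line.split(maxsplit=7)
        if len(fields) < 8:
            continue  # skip labels and blank lines
        _no, _time, src, _arrow, _dst, _proto, length, info = fields
        if src == client:
            if seen_client_hello:
                break  # the client replied, so the first server flight is over
            seen_client_hello = "Client Hello" in info
            continue
        if seen_client_hello:
            frames += 1
            total += int(length)
    return frames, total


for name, capture in (("no-SNI@ulsfo", ULSFO_NO_SNI), ("no-SNI@eqiad", EQIAD_NO_SNI)):
    frames, total = server_first_flight(capture)
    print(f"{name}: {frames} frame(s), {total} bytes")
```

For the no-SNI case this reports one displayed server frame for ulsfo against two for eqiad, which matches the listings above.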

Change #1165842 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Switch to upload cert on upload cluster

https://gerrit.wikimedia.org/r/1165842

Change #1165888 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Switch eqsin to the new upload cert

https://gerrit.wikimedia.org/r/1165888

Change #1165888 merged by Vgutierrez:

[operations/puppet@production] hiera: Switch eqsin to the new upload cert

https://gerrit.wikimedia.org/r/1165888

Mentioned in SAL (#wikimedia-operations) [2025-07-02T13:37:40Z] <vgutierrez> switch upload@eqsin to the new upload cert - T394484

@CDanis I just realized that measure-$site.wikimedia.org is using the upload cluster:

wikimedia.org/operations/dns$ git grep measure-
templates/wikimedia.org:measure-eqiad       1D IN CNAME upload-lb.eqiad.wikimedia.org.
templates/wikimedia.org:measure-codfw       1D IN CNAME upload-lb.codfw.wikimedia.org.
templates/wikimedia.org:measure-esams       1D IN CNAME upload-lb.esams.wikimedia.org.
templates/wikimedia.org:measure-ulsfo       1D IN CNAME upload-lb.ulsfo.wikimedia.org.
templates/wikimedia.org:measure-eqsin       1D IN CNAME upload-lb.eqsin.wikimedia.org.
templates/wikimedia.org:measure-drmrs       1D IN CNAME upload-lb.drmrs.wikimedia.org.
templates/wikimedia.org:measure-magru       1D IN CNAME upload-lb.magru.wikimedia.org.

We already deployed the upload.w.o cert in ulsfo and eqsin, so I assume we should see some issues/errors in the probenet data for those 2 sites.
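
The concern can be illustrated with a simple SAN coverage check. This is a minimal sketch, not how probenet or HAProxy validate names: it assumes the dedicated certificate carries only the two SANs named in the description, expanded here to upload.wikimedia.org and maps.wikimedia.org, and it uses a hypothetical `*.wikimedia.org` wildcard entry to stand in for the unified certificate. The helper names are illustrative.

```python
def san_matches(hostname: str, san: str) -> bool:
    """Simplified hostname match: a leading '*.' covers exactly one left-most label."""
    hostname = hostname.lower().rstrip(".")
    san = san.lower().rstrip(".")
    if san.startswith("*."):
        _first, _, rest = hostname.partition(".")
        return bool(rest) and rest == san[2:]
    return hostname == san


def covered(hostname: str, sans: list[str]) -> bool:
    """True if any SAN entry matches the hostname."""
    return any(san_matches(hostname, san) for san in sans)


# Assumption: SAN list of the dedicated upload cert, per the task description.
UPLOAD_SANS = ["upload.wikimedia.org", "maps.wikimedia.org"]
# Hypothetical wildcard entry standing in for the unified cert.
UNIFIED_LIKE_SANS = ["*.wikimedia.org"]

# Sites taken from the `git grep measure-` output above.
SITES = ["eqiad", "codfw", "esams", "ulsfo", "eqsin", "drmrs", "magru"]
MEASURE_HOSTS = [f"measure-{site}.wikimedia.org" for site in SITES]

uncovered = [host for host in MEASURE_HOSTS if not covered(host, UPLOAD_SANS)]
print(f"{len(uncovered)} of {len(MEASURE_HOSTS)} measure hosts are not covered by the upload cert")
```

Under these assumptions none of the measure-$site names match the dedicated certificate, while all of them would match a wildcard entry.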

Change #1167143 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Issue dedicated certs for probenet endpoints

https://gerrit.wikimedia.org/r/1167143

Change #1167143 merged by Vgutierrez:

[operations/puppet@production] hiera: Issue dedicated certs for probenet endpoints

https://gerrit.wikimedia.org/r/1167143

Change #1167616 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] hiera: Deploy and enable measure cert on upload cluster

https://gerrit.wikimedia.org/r/1167616

Change #1167616 merged by Vgutierrez:

[operations/puppet@production] hiera: Deploy and enable measure cert on upload cluster

https://gerrit.wikimedia.org/r/1167616

Mentioned in SAL (#wikimedia-operations) [2025-07-09T13:46:34Z] <vgutierrez> deploy measure/measure-goog certs in the upload CDN cluster - T394484

Change #1165842 merged by Vgutierrez:

[operations/puppet@production] hiera: Switch to upload cert on upload cluster

https://gerrit.wikimedia.org/r/1165842

Mentioned in SAL (#wikimedia-operations) [2025-07-10T07:50:59Z] <vgutierrez> switching to upload cert globally on upload CDN cluster - T394484

Vgutierrez claimed this task.

The whole upload cluster is now using a dedicated upload cert.