
Use cert-manager for service-proxy certificate creation
Closed, Resolved, Public

Description

We currently issue TLS certificates for the service-proxy (envoy) manually via cergen (https://wikitech.wikimedia.org/wiki/Kubernetes/Enabling_TLS#Create_and_place_certificates).

When cert-manager (T294560) is deployed to all clusters, we should move the service-proxy certificate creation away from cergen and into the helm charts. We should ensure that we issue those certificates for cluster-internal names as well as the "usual" external names.

When that is done, the validation of SANs in istio-ingressgateway should be limited to the internal FQDN of a service (https://gerrit.wikimedia.org/g/operations/deployment-charts/+/refs/changes/74/732374/15/common_templates/0.4/_ingress_helpers.tpl#180).

todos:

To update a chart:
Check cergen config for additional alt names in:

/srv/private/modules/secret/secrets/certificates/certificate.manifests.d/kube_services.certs.yaml

Note: The following domain names can be omitted as they are included by default:

  • discovery.wmnet
  • svc.codfw.wmnet
  • svc.eqiad.wmnet
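
The filtering described in the note above can be sketched as a small shell helper. This is only an illustration under assumptions: the function name filter_default_sans and the sample host names are hypothetical, the input is expected as one alt name per line, and "included by default" is read as "any name under one of the three domains". It does not parse the cergen manifest itself.

```shell
#!/bin/sh
# Hypothetical helper: reads alt names (one per line) on stdin and prints
# only those NOT covered by the default domains listed in the note.
# "|| true" keeps the exit status at 0 when every name is filtered out.
filter_default_sans() {
  grep -v -E '(^|\.)(discovery\.wmnet|svc\.codfw\.wmnet|svc\.eqiad\.wmnet)$' || true
}

# Example with made-up names: only the last one would still have to be
# listed explicitly as an additional alt name.
printf '%s\n' \
  'apertium.discovery.wmnet' \
  'apertium.svc.codfw.wmnet' \
  'apertium.svc.eqiad.wmnet' \
  'apertium.example.org' | filter_default_sans
```

Whatever the helper prints is the set of names that would need to be carried over into the chart's deployment config.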

Update the chart and deployment config:

CHART=apertium
sextant update charts/${CHART}/ mesh.deployment
git add charts/${CHART}/
# If additional alt names are required, they must be added as a list under mesh.certmanager.extraFQDNs
vim helmfile.d/services/${CHART}/values.yaml
git add helmfile.d/services/${CHART}/values.yaml
git commit -m "Update ${CHART} to use certmanager certs" -m "This updates the chart to use certmanager instead of cergen certificates" -m "Bug: T300033"
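
For the additional alt names mentioned in the comment above, the values.yaml entry could look like what the following sketch prints. The function name print_extra_fqdns and the host name are hypothetical; only the key path mesh.certmanager.extraFQDNs comes from the steps above. The snippet merely prints a fragment to merge by hand and does not edit any file.

```shell
#!/bin/sh
# Hypothetical helper: prints a YAML fragment listing every argument
# under mesh.certmanager.extraFQDNs. Merge the output by hand into
# helmfile.d/services/${CHART}/values.yaml; if that file already has a
# top-level "mesh:" key, add only the nested part below it.
print_extra_fqdns() {
  printf 'mesh:\n  certmanager:\n    extraFQDNs:\n'
  for fqdn in "$@"; do
    printf '      - %s\n' "$fqdn"
  done
}

# Example with a made-up alt name.
print_extra_fqdns 'apertium.example.org'
```

Printing the fragment instead of rewriting values.yaml in place keeps the sketch safe to run in any checkout; the actual edit stays a manual, reviewable step.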

Details

Repo                                        Branch      Lines +/-
operations/deployment-charts                master      +8 -47
operations/deployment-charts                master      +28 -68
operations/deployment-charts                master      +229 -0
labs/private                                master      +0 -40
operations/puppet                           production  +1 -0
operations/deployment-charts                master      +161 -88
operations/deployment-charts                master      +50 -108
operations/deployment-charts                master      +753 -0
operations/deployment-charts                master      +237 -104
operations/deployment-charts                master      +85 -2
operations/deployment-charts                master      +5 -3
operations/deployment-charts                master      +85 -2
operations/deployment-charts                master      +899 -15
operations/deployment-charts                master      +7 -8
operations/docker-images/production-images  master      +9 -3
operations/docker-images/production-images  master      +8 -1
operations/deployment-charts                master      +733 -1 K
operations/deployment-charts                master      +336 -132
operations/deployment-charts                master      +336 -132
operations/deployment-charts                master      +342 -132
operations/deployment-charts                master      +339 -133
operations/deployment-charts                master      +336 -132
operations/deployment-charts                master      +341 -77
operations/deployment-charts                master      +360 -153
operations/deployment-charts                master      +337 -137
operations/deployment-charts                master      +336 -132
operations/deployment-charts                master      +5 -1
operations/deployment-charts                master      +336 -132
operations/deployment-charts                master      +336 -132
operations/deployment-charts                master      +337 -133
operations/deployment-charts                master      +336 -132
operations/deployment-charts                master      +2 -13
operations/deployment-charts                master      +2 -0
operations/deployment-charts                master      +2 -0
operations/deployment-charts                master      +2 -0
operations/deployment-charts                master      +2 -0
operations/deployment-charts                master      +77 -1
operations/deployment-charts                master      +79 -70
operations/deployment-charts                master      +20 -2
operations/deployment-charts                master      +72 -4
operations/deployment-charts                master      +333 -133
operations/deployment-charts                master      +334 -134
operations/deployment-charts                master      +319 -119
operations/deployment-charts                master      +334 -134
operations/deployment-charts                master      +244 -64
operations/deployment-charts                master      +332 -132
operations/deployment-charts                master      +2 -2
operations/deployment-charts                master      +24 -0
operations/deployment-charts                master      +335 -135
operations/deployment-charts                master      +257 -108
operations/deployment-charts                master      +329 -126
operations/deployment-charts                master      +240 -106
operations/deployment-charts                master      +3 -0
operations/deployment-charts                master      +249 -109
operations/deployment-charts                master      +235 -105
operations/deployment-charts                master      +234 -104
operations/deployment-charts                master      +238 -105
operations/deployment-charts                master      +2 -0
operations/deployment-charts                master      +2 -0
operations/deployment-charts                master      +237 -107
operations/puppet                           production  +3 -0
operations/deployment-charts                master      +112 -23
operations/deployment-charts                master      +23 -7
operations/puppet                           production  +12 -2
operations/deployment-charts                master      +236 -103
operations/puppet                           production  +48 -39
operations/deployment-charts                master      +22 -0
operations/deployment-charts                master      +155 -14
operations/puppet                           production  +1 -1
operations/puppet                           production  +49 -0
operations/deployment-charts                master      +609 -0
operations/docker-images/production-images  master      +27 -1
operations/deployment-charts                master      +5 -2
operations/deployment-charts                master      +12 -0

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 967864 merged by jenkins-bot:

[operations/deployment-charts@master] mw-debug: Switch to certmanager certificates

https://gerrit.wikimedia.org/r/967864

Mentioned in SAL (#wikimedia-operations) [2023-10-23T10:41:12Z] <jayme> switched mw-debug (mw-on-k8s) to certmanager certificates - T300033

Change 967865 merged by jenkins-bot:

[operations/deployment-charts@master] mw-web: Switch to certmanager certificates

https://gerrit.wikimedia.org/r/967865

Mentioned in SAL (#wikimedia-operations) [2023-10-23T12:40:38Z] <jayme> switched mw-web (mw-on-k8s) to certmanager certificates - T300033

Change 967866 merged by jenkins-bot:

[operations/deployment-charts@master] mw-jobrunner: Switch to certmanager certificates

https://gerrit.wikimedia.org/r/967866

Mentioned in SAL (#wikimedia-operations) [2023-10-23T14:05:58Z] <jayme> switched mw-jobrunner (mw-on-k8s) to certmanager certificates - T300033

Change 967867 merged by jenkins-bot:

[operations/deployment-charts@master] mw-api-ext: Switch to certmanager certificates

https://gerrit.wikimedia.org/r/967867

Mentioned in SAL (#wikimedia-operations) [2023-10-23T14:13:58Z] <jayme> switched mw-api-ext (mw-on-k8s) to certmanager certificates - T300033

Change 967868 merged by jenkins-bot:

[operations/deployment-charts@master] mw-api-int: Switch to certmanager certificates

https://gerrit.wikimedia.org/r/967868

Mentioned in SAL (#wikimedia-operations) [2023-10-23T14:26:25Z] <jayme> switched mw-api-int (mw-on-k8s) to certmanager certificates - T300033

Change 967940 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] mw-on-k8s: Globally enable certmanager certs

https://gerrit.wikimedia.org/r/967940

Change 967940 merged by jenkins-bot:

[operations/deployment-charts@master] mw-on-k8s: Globally enable certmanager certs

https://gerrit.wikimedia.org/r/967940

Change 967406 merged by jenkins-bot:

[operations/deployment-charts@master] Update recommendation-api to use certmanager certs

https://gerrit.wikimedia.org/r/967406

Change 967410 merged by jenkins-bot:

[operations/deployment-charts@master] Update shellbox to use certmanager certs

https://gerrit.wikimedia.org/r/967410

Change 967403 merged by jenkins-bot:

[operations/deployment-charts@master] Update calculator-service to use certmanager certs

https://gerrit.wikimedia.org/r/967403

Change 967473 merged by jenkins-bot:

[operations/deployment-charts@master] Update similar-users to use certmanager certs

https://gerrit.wikimedia.org/r/967473

Change 969141 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] tegola-vector-tiles: Re-enable the envoy admin listener on tcp port

https://gerrit.wikimedia.org/r/969141

Change 969141 merged by jenkins-bot:

[operations/deployment-charts@master] tegola-vector-tiles: Re-enable the envoy admin listener on tcp port

https://gerrit.wikimedia.org/r/969141

Change 967405 merged by jenkins-bot:

[operations/deployment-charts@master] Update mobileapps to use certmanager certs

https://gerrit.wikimedia.org/r/967405

JMeybohm updated the task description.

Change 958479 merged by jenkins-bot:

[operations/deployment-charts@master] Update developer-portal to use certmanager certs

https://gerrit.wikimedia.org/r/958479

Change 969343 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update flink-session-cluster to use certmanager certs

https://gerrit.wikimedia.org/r/969343

Change 969345 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update datahub to use certmanager certs

https://gerrit.wikimedia.org/r/969345

Change 969366 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update benthos to use certmanager certs

https://gerrit.wikimedia.org/r/969366

Change 969343 merged by Bking:

[operations/deployment-charts@master] Update flink-session-cluster to use certmanager certs

https://gerrit.wikimedia.org/r/969343

Change 969366 merged by jenkins-bot:

[operations/deployment-charts@master] Update benthos to use certmanager certs

https://gerrit.wikimedia.org/r/969366

Change 967412 merged by jenkins-bot:

[operations/deployment-charts@master] Update termbox to use certmanager certs

https://gerrit.wikimedia.org/r/967412

Change 959181 merged by jenkins-bot:

[operations/deployment-charts@master] eventgate: Update mesh module

https://gerrit.wikimedia.org/r/959181

Mentioned in SAL (#wikimedia-operations) [2023-11-06T16:41:11Z] <ottomata> beginning deployments of eventgate clusters: mesh and cert chart updates, as well as sleep timeout values for graceful envoy+eventgate container termination - T349823 T300033 T346638

Change 967402 merged by jenkins-bot:

[operations/deployment-charts@master] Update eventstreams to use certmanager certs

https://gerrit.wikimedia.org/r/967402

Change 967414 merged by jenkins-bot:

[operations/deployment-charts@master] Update wikifeeds to use certmanager certs

https://gerrit.wikimedia.org/r/967414

wikifeeds deployment is blocked by a config change (https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/961696) that was merged but not deployed to production. @Jgiannelos is it safe to deploy to prod?

Change 967415 merged by jenkins-bot:

[operations/deployment-charts@master] Update zotero to use certmanager certs

https://gerrit.wikimedia.org/r/967415

Change 969345 merged by jenkins-bot:

[operations/deployment-charts@master] Update datahub to use certmanager certs

https://gerrit.wikimedia.org/r/969345

Change 972404 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Update api-gateway to use certmanager certs

https://gerrit.wikimedia.org/r/972404

Change 972844 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] api-gateway,rest-gateway: Switch to cert-manager certificates

https://gerrit.wikimedia.org/r/972844

Change 973721 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/docker-images/production-images@master] envoy: Allow additional arguments to envoy

https://gerrit.wikimedia.org/r/973721

JMeybohm changed the task status from Open to Stalled. (Nov 13 2023, 9:05 AM)
JMeybohm updated the task description.

Change 973721 merged by JMeybohm:

[operations/docker-images/production-images@master] envoy: Allow additional arguments to envoy

https://gerrit.wikimedia.org/r/973721

Change 974662 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/docker-images/production-images@master] envoy: use ENTRYPOINT instead of CMD

https://gerrit.wikimedia.org/r/974662

Change 974662 merged by Hnowlan:

[operations/docker-images/production-images@master] envoy: use ENTRYPOINT instead of CMD

https://gerrit.wikimedia.org/r/974662

Change 976757 had a related patch set uploaded (by Hnowlan; author: Hnowlan):

[operations/deployment-charts@master] api-gateway: use enovy.yaml in place of config.yaml

https://gerrit.wikimedia.org/r/976757

Change 976757 merged by jenkins-bot:

[operations/deployment-charts@master] api-gateway: use enovy.yaml in place of config.yaml

https://gerrit.wikimedia.org/r/976757

JMeybohm changed the task status from Stalled to In Progress. (Nov 23 2023, 10:12 AM)
JMeybohm updated the task description.

Change 972404 merged by jenkins-bot:

[operations/deployment-charts@master] Update api-gateway for cert-manager support

https://gerrit.wikimedia.org/r/972404

Change 972844 merged by jenkins-bot:

[operations/deployment-charts@master] api-gateway,rest-gateway: Switch to cert-manager certificates

https://gerrit.wikimedia.org/r/972844

Change 977195 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Revert "Revert "api-gateway,rest-gateway: Switch to cert-manager certificates""

https://gerrit.wikimedia.org/r/977195

Change 977207 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] api-gateway: Add env variables required for envoy SDS

https://gerrit.wikimedia.org/r/977207

Change 977207 merged by jenkins-bot:

[operations/deployment-charts@master] api-gateway: Add env variables required for envoy SDS

https://gerrit.wikimedia.org/r/977207

Change 977195 merged by jenkins-bot:

[operations/deployment-charts@master] Revert "Revert "api-gateway,rest-gateway: Switch to cert-manager certificates""

https://gerrit.wikimedia.org/r/977195

Change 979094 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Add new mesh module versions: certificate, configuration, deployment

https://gerrit.wikimedia.org/r/979094

Change 979095 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Remove cergen certificate support from mesh module

https://gerrit.wikimedia.org/r/979095

Change 980425 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] function-orchestrator: Update to latest mesh and ingress module

https://gerrit.wikimedia.org/r/980425

Change 947802 abandoned by Effie Mouzeli:

[operations/deployment-charts@master] Update changeprop to use certmanager certs

Reason:

Merged in I9a6ba71d0894ec49e4ede1d3cac06c5a4bd3b65f

https://gerrit.wikimedia.org/r/947802

Change 979094 merged by jenkins-bot:

[operations/deployment-charts@master] Add new mesh module versions

https://gerrit.wikimedia.org/r/979094

Change 979095 merged by jenkins-bot:

[operations/deployment-charts@master] Remove cergen certificate support from mesh module

https://gerrit.wikimedia.org/r/979095

Change 980425 merged by jenkins-bot:

[operations/deployment-charts@master] function-orchestrator: Update to latest mesh and ingress module

https://gerrit.wikimedia.org/r/980425

Change 980891 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[labs/private@master] kubernetes: Remove cergen certs from kubernetes secrets

https://gerrit.wikimedia.org/r/980891

Change 981325 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] ml-staging: Enable certmanager for mesh certs by default

https://gerrit.wikimedia.org/r/981325

Change 981325 merged by Elukey:

[operations/puppet@production] ml-staging: Enable certmanager for mesh certs by default

https://gerrit.wikimedia.org/r/981325

Change 981332 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] Add new istio module version

https://gerrit.wikimedia.org/r/981332

Change 981333 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] ingress.istio: Remove trust for every SAN but the default

https://gerrit.wikimedia.org/r/981333

Change 981336 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] function-orchestrator: Update to ingress.istio:1.1

https://gerrit.wikimedia.org/r/981336

Change 980891 merged by JMeybohm:

[labs/private@master] kubernetes: Remove cergen certs from kubernetes secrets

https://gerrit.wikimedia.org/r/980891

Mentioned in SAL (#wikimedia-operations) [2023-12-11T10:03:12Z] <jayme> removed cergen certs of all k8s servies from private puppet in commit d36a97aa23e21824f95d22264d06e2c3bf3c6ac3 - T300033

Change 981332 merged by jenkins-bot:

[operations/deployment-charts@master] Add new istio module version

https://gerrit.wikimedia.org/r/981332

Change 981333 merged by jenkins-bot:

[operations/deployment-charts@master] ingress.istio: Remove trust for every SAN but the default

https://gerrit.wikimedia.org/r/981333

Change 981336 merged by jenkins-bot:

[operations/deployment-charts@master] function-orchestrator: Update to ingress.istio:1.1

https://gerrit.wikimedia.org/r/981336

JMeybohm updated the task description.