Switch wikikube-staging (codfw and eqiad) etcd clusters to use PKI
Closed, Resolved · Public
Authored By

	JMeybohm
	Feb 15 2023, 10:23 AM

Description

In light of T329556 (K8s etcd on bullseye show TLS errors in logs), we should configure the wikikube-staging etcd clusters to use PKI instead of cergen certs.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/889082/

staging codfw
staging eqiad
clean up cergen certs in private puppet

Details

	Subject	Repo	Branch	Lines +/-
	secrets/ssl: Remove keys for kubernetes etcd clusters	labs/private	master	+0 -0
	role::etcd::v3::kubernetes::staging: move certs to PKI	operations/puppet	production	+1 -0


Related Objects

Status	Assigned	Task
Resolved	JMeybohm	T307943 Update Kubernetes clusters to v1.23
Resolved	elukey	T329556 K8s etcd on bullseye show TLS errors in logs
Resolved	JMeybohm	T329717 Switch wikikube-staging (codfw and eqiad) etcd clusters to use PKI

Event Timeline

JMeybohm created this task. Feb 15 2023, 10:23 AM

Restricted Application added a subscriber: Aklapper. Feb 15 2023, 10:23 AM

JMeybohm moved this task from Incoming 🐫 to Doing 😎 on the serviceops board. Feb 15 2023, 10:25 AM

JMeybohm updated the task description.

I have deleted some logs on kubestagetcd100[5,6] because the root partition was almost full; etcd keeps logging TLS errors (error "remote error: tls: bad certificate", ServerName "k8s3-staging.eqiad.wmnet").

Mentioned in SAL (#wikimedia-operations) [2023-02-24T07:52:18Z] <elukey> rm /var/log/{syslog,messages,user.log}.1 on kubetcd1006 to free up space - T329717

Change 891749 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::etcd::v3::kubernetes::staging: move certs to PKI

https://gerrit.wikimedia.org/r/891749

gerritbot added a project: Patch-For-Review. Feb 24 2023, 7:56 AM

Change 891749 merged by Elukey:

[operations/puppet@production] role::etcd::v3::kubernetes::staging: move certs to PKI

https://gerrit.wikimedia.org/r/891749

Mentioned in SAL (#wikimedia-operations) [2023-02-24T09:08:44Z] <elukey> rm /var/log/{syslog,messages,user.log}.1 on kubetcd1005 to free up space - T329717

elukey@kubestagetcd1004:~$ etcdctl -C https://$(hostname -f):2379 cluster-health
member 2e98c8b51153156c is healthy: got healthy result from https://kubestagetcd1006.eqiad.wmnet:2379
member a29a9e00247eef21 is healthy: got healthy result from https://kubestagetcd1005.eqiad.wmnet:2379
member c450621e7916ca97 is healthy: got healthy result from https://kubestagetcd1004.eqiad.wmnet:2379
cluster is healthy

elukey@kubestagetcd2001:~$ etcdctl -C https://$(hostname -f):2379 cluster-health
member 98ab3a19cacdf63b is healthy: got healthy result from https://kubestagetcd2002.codfw.wmnet:2379
member cec6617bc5da0995 is healthy: got healthy result from https://kubestagetcd2003.codfw.wmnet:2379
member d29bf1642e768eed is healthy: got healthy result from https://kubestagetcd2001.codfw.wmnet:2379
cluster is healthy
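
The two health checks above could be scripted rather than read by eye. The following is only a minimal sketch: `health_ok` is a hypothetical helper name (not part of etcdctl), and it assumes the `cluster-health` output keeps the exact line format shown in the transcripts above.

```shell
#!/bin/sh
# Sketch: decide whether `etcdctl cluster-health` output, read from stdin,
# reports a fully healthy cluster. Assumes the line format seen in this task:
#   member <id> is healthy: got healthy result from <url>
#   cluster is healthy
# health_ok is a hypothetical helper, not an etcdctl feature.
health_ok() {
    awk '
        # Count member lines, and those not reported as healthy.
        /^member / { members++; if ($0 !~ / is healthy: /) bad++ }
        # Remember whether the summary line was seen.
        /^cluster is healthy$/ { cluster_ok = 1 }
        # Exit 0 only if the summary is present, at least one member
        # was listed, and no member line was anything but healthy.
        END { exit !(cluster_ok && members > 0 && bad == 0) }
    '
}

# Example usage on an etcd host (same command as in the transcripts):
#   etcdctl -C "https://$(hostname -f):2379" cluster-health | health_ok && echo OK
```

Requiring at least one member line is a deliberate choice here, so that empty or truncated output is not mistaken for a healthy cluster.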

elukey@kubestagetcd1004:~$ echo y | openssl s_client -connect $(hostname -f):2380 | openssl x509 -text | grep "Subject Alternative Name" -A 1
[..]
            X509v3 Subject Alternative Name: 
                DNS:kubestagetcd1004.eqiad.wmnet, DNS:k8s3-staging.eqiad.wmnet, DNS:_etcd-server-ssl._tcp.k8s3-staging.eqiad.wmnet

elukey@kubestagetcd2001:~$ echo y | openssl s_client -connect $(hostname -f):2380 | openssl x509 -text | grep "Subject Alternative Name" -A 1
[..]
            X509v3 Subject Alternative Name: 
                DNS:kubestagetcd2001.codfw.wmnet, DNS:k8s3-staging.codfw.wmnet, DNS:_etcd-server-ssl._tcp.k8s3-staging.codfw.wmnet

The clusters are up and healthy, and I verified that the new SAN has been added.
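
For repeat checks, the SAN verification above could be reduced to a one-name-per-line listing that is easy to grep. This is a rough sketch only: `list_dns_sans` is a hypothetical helper, and it assumes `openssl x509 -text` prints the SAN entries on the single line following the "X509v3 Subject Alternative Name" header, as in the output above.

```shell
#!/bin/sh
# Sketch: read `openssl x509 -text` output on stdin and print each DNS
# Subject Alternative Name on its own line.
# list_dns_sans is a hypothetical helper name used for illustration.
list_dns_sans() {
    # Keep the SAN header plus the line after it (where the names are),
    # split the comma-separated entries, then strip the "DNS:" prefix.
    grep -A 1 'X509v3 Subject Alternative Name' \
        | tr ',' '\n' \
        | sed -n 's/^[[:space:]]*DNS://p'
}

# Example usage on an etcd host (peer port 2380, as in the transcripts).
# The name checked is the ServerName from the TLS errors quoted earlier
# in this task; treating it as the SAN of interest is an assumption.
#   echo | openssl s_client -connect "$(hostname -f):2380" 2>/dev/null \
#       | openssl x509 -text -noout \
#       | list_dns_sans | grep -Fx "k8s3-staging.eqiad.wmnet"
```

Using `grep -Fx` matches the whole name literally, so a longer SAN that merely contains the expected name as a substring does not count as a match.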

elukey updated the task description. Feb 24 2023, 9:10 AM

Maintenance_bot removed a project: Patch-For-Review. Feb 24 2023, 9:10 AM

JMeybohm awarded a token. Feb 28 2023, 10:32 AM

Change 895237 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[labs/private@master] secrets/ssl: Remove keys for kubernetes etcd clusters

https://gerrit.wikimedia.org/r/895237

gerritbot added a project: Patch-For-Review. Mar 7 2023, 2:48 PM

Change 895237 merged by JMeybohm:

[labs/private@master] secrets/ssl: Remove keys for kubernetes etcd clusters

https://gerrit.wikimedia.org/r/895237

JMeybohm mentioned this in rLPRIa021effa607d: secrets/ssl: Remove keys for kubernetes etcd clusters. Mar 8 2023, 8:01 AM