
PAWS metrics-server certificate issue
Closed, Resolved, Public

Description

The PAWS metrics-server fails intermittently, which causes unexpected problems such as being unable to delete a namespace.

You can observe this by running kubectl top pods repeatedly. The logs in the metrics-server pod show:

E0630 19:58:43.262860 1 authentication.go:53] Unable to authenticate the request due to an error: x509: certificate has expired or is not yet valid: current time 2021-06-30T19:58:43Z is after 2021-05-26T18:12:12Z

Not sure which x509 certificate is the problem yet.
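The error says the current time fell outside the certificate's validity window. Go's x509 verification, which metrics-server is built on, rejects a certificate when the current time is before its notBefore or after its notAfter. A minimal sketch of that upper-bound comparison, assuming only the two timestamps printed in the log (the notBefore value is not logged, so it is not checked; the function names are illustrative):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Both timestamps are copied from the metrics-server log line. The log does
# not show the certificate's notBefore, so only the notAfter bound is checked.
CURRENT_TIME = "2021-06-30T19:58:43Z"
NOT_AFTER = "2021-05-26T18:12:12Z"


def parse_utc(ts: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def expired_by(now: str, not_after: str) -> Optional[timedelta]:
    """Return how far `now` is past `not_after`, or None if the bound still holds."""
    delta = parse_utc(now) - parse_utc(not_after)
    # Like Go's now.After(notAfter): an exactly equal timestamp still counts as valid.
    return delta if delta > timedelta(0) else None


if __name__ == "__main__":
    print(expired_by(CURRENT_TIME, NOT_AFTER))  # 35 days, 1:46:31
```

With the logged values, the certificate had been expired for a little over 35 days when the error was written.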

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2021-06-30T20:05:22Z] <bstorm> tried force delete on the ingress-nginx-gen2 namespace, which doesn't appear to be working either until metrics-server is fixed T285905
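A plausible link between a failing metrics-server and a namespace that will not delete: before the namespace controller can finish deleting a namespace it has to discover every API group, including aggregated ones such as metrics.k8s.io, and an unavailable APIService can make that discovery fail. The sketch below is a hypothetical helper (its name and the stdin usage are assumptions, not part of any existing tool) that takes the JSON from kubectl get apiservices -o json and lists every APIService whose Available condition is not True:

```python
import json
import sys
from typing import List, Tuple


def unavailable_apiservices(apiservice_list_json: str) -> List[Tuple[str, str]]:
    """Return (name, message) for each APIService whose Available condition is not "True"."""
    data = json.loads(apiservice_list_json)
    problems = []
    for item in data.get("items", []):
        name = item["metadata"]["name"]
        conditions = item.get("status", {}).get("conditions", [])
        # Look for the condition of type "Available" on this APIService.
        available = next((c for c in conditions if c.get("type") == "Available"), None)
        if available is None:
            problems.append((name, "no Available condition reported"))
        elif available.get("status") != "True":
            problems.append((name, available.get("message", "")))
    return problems


if __name__ == "__main__":
    # Hypothetical usage: kubectl get apiservices -o json | python3 check_apiservices.py
    for name, message in unavailable_apiservices(sys.stdin.read()):
        print(f"{name}: {message}")
```

If the metrics.k8s.io APIService shows up in this list while a namespace is stuck terminating, that is consistent with the namespace deletion waiting on metrics-server.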

The problematic certificate doesn't appear to be one of the control-plane certificates:

root@paws-k8s-control-1:~# kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Nov 30, 2021 15:54 UTC   152d                                    no
apiserver                  Nov 30, 2021 15:54 UTC   152d            ca                      no
apiserver-etcd-client      Nov 30, 2021 15:54 UTC   152d            etcd-ca                 no
apiserver-kubelet-client   Nov 30, 2021 15:54 UTC   152d            ca                      no
controller-manager.conf    Nov 30, 2021 15:54 UTC   152d                                    no
etcd-healthcheck-client    Nov 30, 2021 15:54 UTC   152d            etcd-ca                 no
etcd-peer                  Nov 30, 2021 15:54 UTC   152d            etcd-ca                 no
etcd-server                Nov 30, 2021 15:54 UTC   152d            etcd-ca                 no
front-proxy-client         Nov 30, 2021 15:54 UTC   152d            front-proxy-ca          no
scheduler.conf             Nov 30, 2021 15:54 UTC   152d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      May 24, 2030 18:05 UTC   8y              no
etcd-ca                 May 24, 2030 18:05 UTC   8y              no
front-proxy-ca          May 24, 2030 18:05 UTC   8y              no
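One way to confirm that none of these certificates is the one rejected in the log is to compare each EXPIRES value against the logged notAfter of 2021-05-26T18:12:12Z. A minimal sketch, assuming the column layout shown above (name first, then a "Mon DD, YYYY HH:MM UTC" date); kubeadm prints expiry only to the minute, so the check keeps certificates expiring on or before the logged time. The helper names are illustrative:

```python
import sys
from datetime import datetime, timezone
from typing import List, Tuple

# notAfter of the certificate rejected in the metrics-server log line.
LOGGED_NOT_AFTER = datetime(2021, 5, 26, 18, 12, 12, tzinfo=timezone.utc)


def parse_expirations(output: str) -> List[Tuple[str, datetime]]:
    """Extract (name, expiry) pairs from kubeadm check-expiration output.

    Assumes data rows look like 'NAME Mon DD, YYYY HH:MM UTC ...'; header
    rows and '[check-expiration]' notes do not match and are skipped.
    """
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 6 or parts[5] != "UTC":
            continue
        try:
            expiry = datetime.strptime(" ".join(parts[1:5]), "%b %d, %Y %H:%M")
        except ValueError:
            continue
        rows.append((parts[0], expiry.replace(tzinfo=timezone.utc)))
    return rows


def expiring_by(rows: List[Tuple[str, datetime]], deadline: datetime) -> List[Tuple[str, datetime]]:
    """Keep only the certificates that expire on or before `deadline`."""
    return [(name, when) for name, when in rows if when <= deadline]


if __name__ == "__main__":
    rows = parse_expirations(sys.stdin.read())
    matches = expiring_by(rows, LOGGED_NOT_AFTER)
    print(f"parsed {len(rows)} certificates; {len(matches)} expire by {LOGGED_NOT_AFTER.isoformat()}")
    for name, when in matches:
        print(f"  {name}: {when.isoformat()}")
```

Fed the output above, every entry expires on Nov 30, 2021 or May 24, 2030, so the match list is empty, which agrees with the conclusion that the expired certificate lives outside the kubeadm-managed set.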

Bstorm claimed this task.
Bstorm added a subscriber: Majavah.

This is working fine now, and I don't know what changed. @Majavah, maybe you fixed it? Or maybe it fixed itself when the cluster rotated something somewhere.

I'll close this.

I don't think I did anything, so I'm even more confused now.

I'm happier with mysteriously working than mysteriously not working, I guess.