PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.309 second response time
root@tools-k8s-master-01:~# kubectl get nodes | grep -i not
tools-worker-1021.tools.eqiad.wmflabs NotReady 1y
tools-worker-1028.tools.eqiad.wmflabs NotReady,SchedulingDisabled 220d
tools-worker-1029.tools.eqiad.wmflabs NotReady,SchedulingDisabled 220d
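To get just the names of the affected nodes, the STATUS column can be matched directly instead of grepping whole lines. This is a minimal sketch: the function name `notready_nodes` is hypothetical, and it assumes the column order shown above (name first, status second).

```shell
# Print the names of nodes whose STATUS column contains "NotReady".
# Reads `kubectl get nodes` output on stdin; assumes NAME is column 1
# and STATUS is column 2, as in the output above.
notready_nodes() {
    awk '$2 ~ /NotReady/ { print $1 }'
}

# Usage on the master:
#   kubectl get nodes | notready_nodes
```

Matching on the second column avoids false hits from node names or other fields that happen to contain "not".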
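The same alert has also fired for a NotReady node where kube-proxy and kubelet were failing to authenticate against the API server. The syslog on the worker showed: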
tools-worker-1007:~# tail /var/log/syslog
Oct 17 20:47:29 tools-worker-1007 kube-proxy[7153]: E1017 20:47:29.420941 7153 reflector.go:203] pkg/proxy/config/api.go:33: Failed to list *api.Endpoints: the server has asked for the client to provide credentials (get endpoints)
Oct 17 20:47:29 tools-worker-1007 kubelet[7241]: E1017 20:47:29.444322 7241 reflector.go:203] pkg/kubelet/kubelet.go:403: Failed to list *api.Node: the server has asked for the client to provide credentials (get nodes)
Oct 17 20:47:29 tools-worker-1007 kubelet[7241]: E1017 20:47:29.444390 7241 reflector.go:203] pkg/kubelet/config/apiserver.go:43: Failed to list *api.Pod: the server has asked for the client to provide credentials (get pods)
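A quick way to confirm that a worker is hitting this credentials problem is to count the matching lines in its log. This is a rough sketch: the function name `count_credential_errors` is hypothetical, and the match string is taken verbatim from the messages above.

```shell
# Count log lines where the API server rejected the client's credentials.
# $1 is the log file to scan (defaults to /var/log/syslog).
# grep -c exits non-zero when the count is 0, so `|| true` keeps the
# function usable in scripts running under `set -e`.
count_credential_errors() {
    grep -c 'the server has asked for the client to provide credentials' \
        "${1:-/var/log/syslog}" || true
}

# Usage on a worker:
#   count_credential_errors
```

A non-zero count from both kube-proxy and kubelet suggests the shared kubeconfig is at fault rather than one service.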
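Running Puppet fixes this. It replaces the placeholder token in /etc/kubernetes/kubeconfig with the real one and refreshes the kubelet and kube-proxy services: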
tools-worker-1007:~# puppet agent --test
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for tools-worker-1007.tools.eqiad.wmflabs
Info: Applying configuration version '1508271953'
Notice: /Stage[main]/K8s::Infrastructure_config/File[/etc/kubernetes/kubeconfig]/content:
--- /etc/kubernetes/kubeconfig 2017-10-17 20:40:23.567365746 +0000
+++ /tmp/puppet-file20171017-7746-10ohqpn 2017-10-17 20:48:12.585317275 +0000
@@ -14,4 +14,4 @@
 users:
 - name: client-infrastructure
   user:
-    token: faketoken
+    token: <real token inserted here>
Info: Computing checksum on file /etc/kubernetes/kubeconfig
Info: FileBucket got a duplicate file {md5}97c5a61de4e04330c5cfa123d4408736
Info: /Stage[main]/K8s::Infrastructure_config/File[/etc/kubernetes/kubeconfig]: Filebucketed /etc/kubernetes/kubeconfig to puppet with sum 97c5a61de4e04330c5cfa123d4408736
Notice: /Stage[main]/K8s::Infrastructure_config/File[/etc/kubernetes/kubeconfig]/content: content changed '{md5}97c5a61de4e04330c5cfa123d4408736' to '{md5}8fff205d380602bf440d1e39960a5a8e'
Info: /Stage[main]/K8s::Infrastructure_config/File[/etc/kubernetes/kubeconfig]: Scheduling refresh of Service[kubelet]
Info: /Stage[main]/K8s::Infrastructure_config/File[/etc/kubernetes/kubeconfig]: Scheduling refresh of Service[kube-proxy]
Notice: /Stage[main]/K8s::Proxy/Service[kube-proxy]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/K8s::Kubelet/Service[kubelet]: Triggered 'refresh' from 1 events
Notice: Finished catalog run in 6.33 seconds
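Before running Puppet, it may help to check whether the kubeconfig on a worker still carries the placeholder token. This is a sketch only: the function name `has_placeholder_token` is hypothetical, and it assumes the placeholder is always the literal `faketoken` seen in the diff above, which may not hold in other cases.

```shell
# Succeed (exit 0) if the kubeconfig still contains the placeholder token.
# $1 is the file to check (defaults to /etc/kubernetes/kubeconfig).
# Assumption: the placeholder value is the literal string "faketoken".
has_placeholder_token() {
    grep -Eq '^[[:space:]]*token:[[:space:]]*faketoken[[:space:]]*$' \
        "${1:-/etc/kubernetes/kubeconfig}"
}

# Usage on a worker:
#   if has_placeholder_token; then puppet agent --test; fi
```

Once the Puppet run has finished, the node should leave the `kubectl get nodes | grep -i not` output on the master.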