The k8s tools checker went critical with:
HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string 'OK' not found on 'http://checker.tools.wmflabs.org:80/k8s/nodes/ready'
This looks related to a bad Puppet agent run that stripped the tools project domain from the FQDN passed to kubelet. A later run, about half an hour afterwards, restored it:
Jul 7 03:42:46 tools-worker-1022 puppet-agent[17280]: (/Stage[main]/K8s::Kubelet/File[/etc/default/kubelet]/content) -KUBELET_HOSTNAME="--hostname-override=tools-worker-1022.tools.eqiad.wmflabs"
Jul 7 03:42:46 tools-worker-1022 puppet-agent[17280]: (/Stage[main]/K8s::Kubelet/File[/etc/default/kubelet]/content) +KUBELET_HOSTNAME="--hostname-override=tools-worker-1022.eqiad.wmflabs"
...
Jul 7 04:11:43 tools-worker-1022 puppet-agent[9226]: (/Stage[main]/K8s::Kubelet/File[/etc/default/kubelet]/content) -KUBELET_HOSTNAME="--hostname-override=tools-worker-1022.eqiad.wmflabs"
Jul 7 04:11:43 tools-worker-1022 puppet-agent[9226]: (/Stage[main]/K8s::Kubelet/File[/etc/default/kubelet]/content) +KUBELET_HOSTNAME="--hostname-override=tools-worker-1022.tools.eqiad.wmflabs"
$ kubectl get nodes | grep tools-worker-1022
tools-worker-1022.eqiad.wmflabs         NotReady   1h
tools-worker-1022.tools.eqiad.wmflabs   Ready      2y
The bad hostname is marked as NotReady with the reason:
kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy
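To check whether other workers ended up registered twice in the same way, the kubectl get nodes output could be grouped by short hostname. The following is only a rough sketch: the function name find_duplicate_nodes and the SAMPLE text are made up for illustration, and it assumes the default column layout where the node name comes first and the status second.

```python
from collections import defaultdict


def find_duplicate_nodes(kubectl_output):
    """Return short hostnames that appear under more than one node name.

    Assumes each line looks like "<node-name> <status> <age>", as in the
    default `kubectl get nodes` output (an assumption, not verified
    against every kubectl version).
    """
    by_short = defaultdict(list)
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, if present.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[1]
        # The short hostname is everything before the first dot.
        short = name.split(".", 1)[0]
        by_short[short].append((name, status))
    # Keep only hosts that are registered under several names.
    return {s: nodes for s, nodes in by_short.items() if len(nodes) > 1}


# Sample taken from the output pasted above.
SAMPLE = """\
tools-worker-1022.eqiad.wmflabs         NotReady   1h
tools-worker-1022.tools.eqiad.wmflabs   Ready      2y
"""

if __name__ == "__main__":
    for short, nodes in find_duplicate_nodes(SAMPLE).items():
        for name, status in nodes:
            print(short, name, status)
```

Against a live cluster, the output of kubectl get nodes would be fed in instead of SAMPLE. Any host listed more than once is a candidate for the same hostname-override flip.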
I've acked the alert and I'm leaving the bad node entry in place for further investigation.