Today we got a page from icinga:
```
PROBLEM - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.140 second response time
```
I checked on `tools-k8s-master-01.eqiad.wmflabs` and several worker nodes were `NotReady`:
```lines=5
aborrero@tools-k8s-master-01:~$ sudo kubectl get nodes -o wide
NAME STATUS AGE
tools-worker-1001.tools.eqiad.wmflabs Ready 2y
tools-worker-1002.tools.eqiad.wmflabs Ready 2y
tools-worker-1003.tools.eqiad.wmflabs Ready 2y
tools-worker-1004.tools.eqiad.wmflabs Ready 2y
tools-worker-1005.tools.eqiad.wmflabs Ready 2y
tools-worker-1006.tools.eqiad.wmflabs NotReady 2y
tools-worker-1007.tools.eqiad.wmflabs NotReady 2y
tools-worker-1008.tools.eqiad.wmflabs Ready 2y
tools-worker-1009.tools.eqiad.wmflabs Ready 2y
tools-worker-1010.tools.eqiad.wmflabs Ready,SchedulingDisabled 1y
tools-worker-1011.tools.eqiad.wmflabs Ready,SchedulingDisabled 1y
tools-worker-1012.tools.eqiad.wmflabs Ready,SchedulingDisabled 1y
tools-worker-1013.tools.eqiad.wmflabs Ready 1y
tools-worker-1014.tools.eqiad.wmflabs Ready 1y
tools-worker-1015.tools.eqiad.wmflabs Ready 1y
tools-worker-1016.tools.eqiad.wmflabs Ready 1y
tools-worker-1017.tools.eqiad.wmflabs Ready 1y
tools-worker-1018.tools.eqiad.wmflabs Ready 1y
tools-worker-1019.tools.eqiad.wmflabs Ready 1y
tools-worker-1020.tools.eqiad.wmflabs Ready 1y
tools-worker-1021.tools.eqiad.wmflabs NotReady 1y
tools-worker-1022.tools.eqiad.wmflabs Ready 1y
tools-worker-1023.tools.eqiad.wmflabs Ready 1y
tools-worker-1025.tools.eqiad.wmflabs Ready 1y
tools-worker-1026.tools.eqiad.wmflabs Ready 1y
tools-worker-1027.tools.eqiad.wmflabs Ready 1y
tools-worker-1028.tools.eqiad.wmflabs Ready,SchedulingDisabled 1y
tools-worker-1029.tools.eqiad.wmflabs NotReady,SchedulingDisabled 1y
```
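With this many nodes it is easy to miss a `NotReady` line. A minimal sketch of a filter for that output, assuming the three-column `NAME STATUS AGE` layout shown above (the function name `not_ready_nodes` is made up for illustration and is not part of any existing tooling):

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print
# only the nodes whose STATUS column does not start with "Ready".
# Assumes the "NAME STATUS AGE" column layout; NR > 1 skips the header line.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1, $2 }'
}
```

Possible usage on the master: `sudo kubectl get nodes | not_ready_nodes`. Nodes that are `Ready,SchedulingDisabled` are deliberately not listed, since their status still starts with `Ready`.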
After several minutes, everything was back to normal, except for `tools-worker-1029`, which stayed `NotReady`:
```lines=5
aborrero@tools-k8s-master-01:~$ sudo kubectl get nodes -o wide
NAME STATUS AGE
tools-worker-1001.tools.eqiad.wmflabs Ready 2y
tools-worker-1002.tools.eqiad.wmflabs Ready 2y
tools-worker-1003.tools.eqiad.wmflabs Ready 2y
tools-worker-1004.tools.eqiad.wmflabs Ready 2y
tools-worker-1005.tools.eqiad.wmflabs Ready 2y
tools-worker-1006.tools.eqiad.wmflabs Ready 2y
tools-worker-1007.tools.eqiad.wmflabs Ready 2y
tools-worker-1008.tools.eqiad.wmflabs Ready 2y
tools-worker-1009.tools.eqiad.wmflabs Ready 2y
tools-worker-1010.tools.eqiad.wmflabs Ready,SchedulingDisabled 1y
tools-worker-1011.tools.eqiad.wmflabs Ready,SchedulingDisabled 1y
tools-worker-1012.tools.eqiad.wmflabs Ready,SchedulingDisabled 1y
tools-worker-1013.tools.eqiad.wmflabs Ready 1y
tools-worker-1014.tools.eqiad.wmflabs Ready 1y
tools-worker-1015.tools.eqiad.wmflabs Ready 1y
tools-worker-1016.tools.eqiad.wmflabs Ready 1y
tools-worker-1017.tools.eqiad.wmflabs Ready 1y
tools-worker-1018.tools.eqiad.wmflabs Ready 1y
tools-worker-1019.tools.eqiad.wmflabs Ready 1y
tools-worker-1020.tools.eqiad.wmflabs Ready 1y
tools-worker-1021.tools.eqiad.wmflabs Ready 1y
tools-worker-1022.tools.eqiad.wmflabs Ready 1y
tools-worker-1023.tools.eqiad.wmflabs Ready 1y
tools-worker-1025.tools.eqiad.wmflabs Ready 1y
tools-worker-1026.tools.eqiad.wmflabs Ready 1y
tools-worker-1027.tools.eqiad.wmflabs Ready 1y
tools-worker-1028.tools.eqiad.wmflabs Ready,SchedulingDisabled 1y
tools-worker-1029.tools.eqiad.wmflabs NotReady,SchedulingDisabled 1y
```
I'm not sure what's the matter with the `SchedulingDisabled` nodes, and I can't jump to `tools-worker-1029.tools.eqiad.wmflabs`.
While I was testing things, I also restarted the checker service:
```
aborrero@tools-checker-01:~$ sudo service toolschecker_kubernetes_nodes_ready restart
toolschecker_kubernetes_nodes_ready stop/waiting
toolschecker_kubernetes_nodes_ready start/running, process 23148
```
But something happened somewhere (the proxy?), and now I can't access the checker:
```
arturo@endurance:~$ LANG=C wget https://checker.tools.wmflabs.org/k8s/nodes/ready
--2018-06-12 11:10:49-- https://checker.tools.wmflabs.org/k8s/nodes/ready
Resolving checker.tools.wmflabs.org (checker.tools.wmflabs.org)... 208.80.155.229
Connecting to checker.tools.wmflabs.org (checker.tools.wmflabs.org)|208.80.155.229|:443... failed: Connection refused.
aborrero@tools-clushmaster-01:~$ wget https://checker.tools.wmflabs.org/k8s/nodes/ready
--2018-06-12 09:12:19-- https://checker.tools.wmflabs.org/k8s/nodes/ready
Resolving checker.tools.wmflabs.org (checker.tools.wmflabs.org)... 10.68.16.228
Connecting to checker.tools.wmflabs.org (checker.tools.wmflabs.org)|10.68.16.228|:443... failed: Connection refused.
```
I'm not sure how it is possible that icinga is seeing this as OK.
{F22139267}
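One thing that may be worth ruling out: the alert text at the top refers to plain HTTP on port 80 (`http://checker.tools.wmflabs.org:80/k8s/nodes/ready`), while both `wget` attempts above went to HTTPS on port 443. A rough sketch for repeating the string part of the icinga check by hand, assuming `curl` is available; the function name `classify_body` is hypothetical, and this only mirrors the "string OK" test, not the HTTP status handling of the real check:

```shell
# Hypothetical helper: read a response body on stdin and apply the same
# "string OK" test that the alert text mentions.
classify_body() {
  if grep -q 'OK'; then
    echo "OK"
  else
    echo "CRITICAL: string OK not found"
    return 2
  fi
}
```

Possible usage, with the URL copied from the alert: `curl -sS --max-time 10 http://checker.tools.wmflabs.org:80/k8s/nodes/ready | classify_body`.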