While trying to access https://tools.wmflabs.org/guc/?user=193.180.154.229, I got the following error messages on the page:
Warning: dns_get_record(): A temporary server error occurred. in /data/project/guc/labs-tools-guc/src/IPInfo.php on line 87
Warning: PDO::__construct(): php_network_getaddresses: getaddrinfo failed: Name or service not known in /data/project/guc/labs-tools-guc/src/App.php on line 32
Error: Database error: Unable to connect to s1.web.db.svc.eqiad.wmflabs
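Both warnings and the final error point at name resolution failing inside the tool's pod: PDO could not resolve the database host via getaddrinfo. Below is a minimal sketch for checking, from inside a pod, whether a hostname resolves. It assumes Python 3 is available in the pod. Port 3306 (the MySQL default) is an assumption, and the `resolves` helper is hypothetical, not part of the guc tool.

```python
import socket

# Database host taken from the error message above.
DB_HOST = "s1.web.db.svc.eqiad.wmflabs"


def resolves(host: str, port: int = 3306) -> bool:
    """Return True if getaddrinfo() can resolve `host`, False on a resolver error.

    Port 3306 is an assumption (MySQL default); getaddrinfo only uses it to
    build socket addresses and does not open a connection.
    """
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        return True
    except socket.gaierror as exc:
        # Same failure class that PHP reported as
        # "getaddrinfo failed: Name or service not known".
        print(f"{host}: errno={exc.errno} {exc.strerror}")
        return False
```

Calling `resolves(DB_HOST)` in a pod on the affected worker and in a pod on a healthy worker helps separate a node-local DNS or overlay-network problem from a cluster-wide one.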
TODO (lessons learned while debugging):
* Build an image with basic diagnostic tools (dig, ping, traceroute, mtr, ...)
* Run a DaemonSet that places a diagnostic pod on every worker node
* Have an easy command to list all pods on a node (kubectl get pods --all-namespaces -o wide | grep tools-worker-1002)
* Write a runbook page for flannel debugging
* Have an easy command to start a new pod on a given node (<https://kubernetes.io/docs/concepts/configuration/assign-pod-node/>)
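The DaemonSet, per-node pod listing, and node-pinned pod items could be scripted roughly as follows. This is a sketch under several assumptions: `kubectl` is on PATH and configured for the cluster, and the image name `diag-tools:latest`, the `default` namespace, and all helper names are hypothetical. The pinned pod sets `spec.nodeName`, which bypasses the scheduler. The linked assign-pod-node page also describes `nodeSelector` as an alternative. The listing uses `--field-selector spec.nodeName=...` in place of `grep`.

```python
import json
import subprocess

# Hypothetical image built from the "diagnostic tools" TODO item.
DIAG_IMAGE = "diag-tools:latest"
# Portable keep-alive so we can `kubectl exec` into the container later.
KEEP_ALIVE = ["sh", "-c", "while true; do sleep 3600; done"]


def diag_daemonset(name: str = "diag", namespace: str = "default") -> dict:
    """Manifest for a DaemonSet running one idle diagnostic pod per schedulable node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": DIAG_IMAGE, "command": KEEP_ALIVE}
                    ]
                },
            },
        },
    }


def diag_pod_on_node(node: str, name: str = "diag", namespace: str = "default") -> dict:
    """Manifest for a single pod pinned to `node` via spec.nodeName."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"{name}-{node}", "namespace": namespace},
        "spec": {
            "nodeName": node,
            "restartPolicy": "Never",
            "containers": [{"name": name, "image": DIAG_IMAGE, "command": KEEP_ALIVE}],
        },
    }


def pods_on_node_cmd(node: str) -> list[str]:
    """kubectl argv that lists pods in all namespaces scheduled on `node`."""
    return [
        "kubectl", "get", "pods", "--all-namespaces", "-o", "wide",
        "--field-selector", f"spec.nodeName={node}",
    ]


def kubectl_apply(manifest: dict) -> None:
    """Pipe a manifest to `kubectl apply -f -` (kubectl accepts JSON as well as YAML)."""
    subprocess.run(
        ["kubectl", "apply", "-f", "-"],
        input=json.dumps(manifest), text=True, check=True,
    )
```

For example, `subprocess.run(pods_on_node_cmd("tools-worker-1002"), check=True)` lists the pods on that worker, and `kubectl_apply(diag_pod_on_node("tools-worker-1002"))` starts a debug pod there. Note that DaemonSet pods are not placed on tainted nodes unless the template carries matching tolerations, so "every worker node" may need those added.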