We had a series of notable 503 spikes today: https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=2&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5&from=1504821102469&to=1504827498790 . In the 5xx logs, the failing requests all had x_cache lines implicating cp1066 as the backend-most cache.
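For anyone repeating this check, here is a rough sketch of tallying the backend-most cache across 5xx log entries. It assumes each x_cache value is a comma-separated list of "host status" entries with the backend-most cache listed first; that ordering is an assumption and should be confirmed against real log lines. The function names and the sample values are made up for illustration and are not taken from the actual logs.

```python
from collections import Counter


def backend_most(x_cache: str) -> str:
    """Return the first hostname in an x_cache value.

    ASSUMPTION: entries are comma-separated "host status" pairs and the
    backend-most cache comes first. Verify this against real log lines.
    """
    first_entry = x_cache.split(",")[0].strip()
    return first_entry.split()[0] if first_entry else ""


def tally_backends(x_cache_values):
    """Count how often each host appears as the backend-most cache."""
    return Counter(backend_most(v) for v in x_cache_values if v.strip())


# Illustrative values only, not real log data.
samples = [
    "cp1066 int, cp3030 miss, cp3040 miss",
    "cp1066 int, cp3031 miss, cp3041 miss",
    "cp1067 miss, cp3030 hit/1, cp3040 miss",
]

for host, count in tally_backends(samples).most_common():
    print(host, count)
```

If one host dominates the tally during a spike, that is the same signal described above.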
I depooled cp1066 from all services at 23:42.
I haven't yet found a solid lead on exactly what is going wrong there. It could be a host problem, or it could be a URL-specific problem that consistent-hashed (chashed) to cp1066 (in which case this will probably recur shortly and implicate a different node).
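To illustrate why a URL-specific problem would move to another node after the depool, here is a minimal sketch of consistent hashing using a generic hash ring. This is not the actual Varnish director implementation; the pool membership (other than cp1066), the URL, and the replica count are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with several virtual points per host."""

    def __init__(self, nodes, replicas=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, url: str) -> str:
        """Return the host owning the first ring point at or after the URL's hash."""
        idx = bisect.bisect_left(self._points, _hash(url)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical pool; only cp1066 comes from the report above.
pool = ["cp1065", "cp1066", "cp1067", "cp1068"]
url = "/wiki/Some_hypothetical_page"

before = HashRing(pool)
after = HashRing([n for n in pool if n != "cp1066"])

# A URL owned by cp1066 lands on one specific other node after the depool;
# URLs owned by other nodes keep their owner.
print(before.node_for(url), "->", after.node_for(url))
```

In this model, a problematic URL that was owned by cp1066 gets a new, stable owner once cp1066 is removed, so the same symptoms would show up there if the URL is the cause.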