Since December 24th at roughly 1am, a bunch of CI jobs have been failing (T122449). It seems to be caused by the web proxy webproxy.eqiad.wmnet on port 8080 no longer being reachable from labs instances.
integration-slave-trusty-1011:~$ curl --verbose -4 --proxy webproxy.eqiad.wmnet:8080 https://meta.wikimedia.org/wiki/Main_Page
* Hostname was NOT found in DNS cache
* Trying 208.80.154.10...
I can ping carbon just fine from labs.
It works fine from a production host such as gallium:
Looking at Icinga for Carbon:
Service | Status | Duration | Message |
---|---|---|---|
Squid | OK | 93d 10h 25m 16s | TCP OK - 0.001 second response time on port 8080 |
Puppet | Warning | 2d 17h 31m 19s | WARNING: Puppet is currently disabled, last run 2 days ago with 0 failures |
The time since Puppet was disabled (2d 17h) seems to correlate with the start of the CI issues.
So it seems to me carbon has some live hack (Puppet is disabled) that prevents its web proxy from being reachable from labs :-/
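
To compare reachability from labs and from production without going through curl, a plain TCP connect to the proxy port is enough; it is roughly what the Icinga "TCP OK" check above reports. The following is only a minimal sketch: the helper name `tcp_reachable` and the 5 second timeout are my own choices, not part of any existing tooling.

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration. It only checks that the port
    accepts connections, not that the proxy actually forwards requests.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


# Example call, using the host and port from this report:
#   tcp_reachable("webproxy.eqiad.wmnet", 8080)
```

Running it on a labs instance and on a production host such as gallium should show whether the difference is at the TCP level (firewall or ACL) or higher up in Squid itself.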