Webproxy on carbon unreachable from labs instances since Dec 24 roughly 1am
Closed, Invalid (Public)

Description

Since December 24th at roughly 1am, a bunch of CI jobs have been failing (T122449). The cause seems to be that the web proxy webproxy.eqiad.wmnet on port 8080 is no longer reachable from labs instances.

integration-slave-trusty-1011:~$ curl --verbose -4 --proxy webproxy.eqiad.wmnet:8080 https://meta.wikimedia.org/wiki/Main_Page
* Hostname was NOT found in DNS cache
*   Trying 208.80.154.10...

I can ping carbon just fine from labs.

It works fine from a production host such as gallium:

Looking at Icinga for Carbon:

Service  Status   Duration          Message
Squid    OK       93d 10h 25m 16s   TCP OK - 0.001 second response time on port 8080
Puppet   Warning  2d 17h 31m 19s    WARNING: Puppet is currently disabled, last run 2 days ago with 0 failures

The Puppet warning duration seems to correlate with the start of the CI issues.

So it seems to me carbon has some live hack (puppet is disabled) that prevents its web proxy from being reachable from labs :-/
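The symptom described here (ping succeeds, but the TCP connection to port 8080 hangs) can be checked separately from curl. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available on the instance; `check_tcp` is a hypothetical helper name, not part of any existing CI tooling.

```shell
#!/usr/bin/env bash
# Sketch: report whether a TCP port accepts a connection within a timeout.
# check_tcp is a hypothetical helper introduced for illustration only.
check_tcp() {
    local host=$1 port=$2 secs=${3:-5}
    # bash's /dev/tcp pseudo-device opens a TCP connection; timeout(1)
    # aborts the attempt if packets are silently dropped by a firewall.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example usage from a labs instance (commented out: needs network access):
# ping -c 1 webproxy.eqiad.wmnet && echo "ICMP ok"
# check_tcp webproxy.eqiad.wmnet 8080 5 && echo "TCP 8080 ok" || echo "TCP 8080 unreachable"
```

A result of "ICMP ok" together with "TCP 8080 unreachable" would be consistent with filtering on the port rather than the host being down.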

Event Timeline

hashar raised the priority of this task to High.
hashar updated the task description.
hashar added subscribers: Paladox, gerritbot, hashar and 2 others.
faidon claimed this task.
faidon added a subscriber: faidon.

See T122368. Why do you need to use the webproxy? Labs instances have Internet connectivity via NAT so the webproxy shouldn't be needed.

We had pointed the MediaWiki configuration on CI to a proxy because some hosts had no direct access to the Internet (prod slaves in 10.0.0.0/8). That is no longer the case, so I will just get rid of the proxy.

Maven was still being routed via the webproxy: T122594