Hi @ezachte!
I am opening this task as a subtask of T198623, in which @ayounsi and I are trying to narrow down the last cases of https calls made from stat1005 towards various Wikimedia endpoints like lists.wikimedia.org and text-lb.eqiad.wikimedia.org (the LVS endpoint for the cache text services like en.wikipedia.org, etc.).
Background: all the analytics hosts like stat1005 sit inside a dedicated network VLAN, and a firewall controls all the traffic that flows from the Analytics hosts to the rest of the Production network (so a bit different from the "usual" firewall rules, which work the other way around).
As part of T198623 we are trying to narrow down the last use cases of http/https calls that don't go through the http/https proxies (https://wikitech.wikimedia.org/wiki/HTTP_proxy), fix them, and finally enforce the last firewall rules (which are currently not active, to avoid disrupting users' crons, etc.).
It is a bit difficult to pinpoint the source of an https call via tcpdump (encryption, no PID listed in tcpdump, etc.), but eventually I was able to correlate the timings of the https calls on stat1005 with some of your crons. I took the liberty of doing the following:
- Find all occurrences of $ua->proxy in the Perl files and add an "https" entry to the proxy config, like this:
```
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
$ua->proxy(["http","https"], $ENV{"http_proxy"}) ;   # <====================== "https" added here
$ua->agent("Wikimedia Perl job / EZ");
$ua->timeout(5);
$page = $ua->get('https://lists.wikimedia.org/');
die "Error!!", $page->status_line, "\n Aborting" unless $page->is_success;
print "Success", $page->content_type, " document!\n";
```
The test script above seems to work fine, so I applied the change to the following files (where https was missing):
```
elukey@stat1005:/srv/home/ezachte$ sudo grep -rni "ua->proxy" *
wikistats/progress/WikimediaDownload.pl:757: $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/traffic/perl/CollectCountryInfoFromWikipedia.pl:43: $ua->proxy(["http", "https"], $ENV{"http_proxy"});
wikistats/dumps/perl/WikiCountsScanNamespacesWithContent.pl:184: $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/dumps/perl/WikiCountsFetchPagesCountReferences.pl:87: $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/dammit.lt/perl/DammitScanPages.pl:255: $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/dammit.lt/perl/DammitScanCategories.pl:381:# $ua->proxy(["http"], $ENV{"http_proxy"}) ;
wikistats/dammit.lt/perl/DammitScanCategories.pl:382: $ua->proxy(["https"], $ENV{"https_proxy"}) ;
wikistats/dammit.lt/perl/DammitReportPageRequests.pl:702: $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/dammit.lt/perl/DammitProjectMedicinCollectTitles.pl:267: $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/mail-lists/perl/CollectMailArchives.pl:79: $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
```
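To double-check that no $ua->proxy call was left http-only, a quick grep for proxy-setting lines that don't mention https can help (a sketch, not something I ran verbatim; the wikistats path is the one from the output above):

```shell
# List proxy-setting lines that do not mention "https" (candidates still to fix);
# 2>/dev/null keeps the command quiet if the directory is absent.
grep -rn 'ua->proxy' wikistats/ 2>/dev/null | grep -v 'https' \
  || echo "all proxy calls cover https"
```

Note that this would also flag commented-out lines like the one at DammitScanCategories.pl:381, which is harmless but worth eyeballing.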
- Find all occurrences of export http_proxy in the bash files and add an export https_proxy right after each of them (as documented at https://wikitech.wikimedia.org/wiki/HTTP_proxy).
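The change in the bash files looks like this (a sketch; the proxy URL is the one documented on the wikitech HTTP_proxy page, assumed here rather than copied from the actual crons):

```shell
# Existing line in the cron wrapper scripts: http-only proxy
# (URL assumed from https://wikitech.wikimedia.org/wiki/HTTP_proxy).
export http_proxy=http://webproxy.eqiad.wmnet:8080
# Added right after it, so https calls go through the same proxy.
export https_proxy="$http_proxy"
```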
I opened this task to explain to you in detail what was done, and also to get suggestions about anything that could still be missing. You can see an example in T198623#4472914 of http calls that are now redirected to https endpoints.
If everything is fine please close the task, otherwise let's follow up in here :)