
Scripts modified to allow a https proxy on stat1005
Closed, ResolvedPublic

Description

Hi @ezachte!

I am opening this task as a subtask of T198623, in which @ayounsi and I are trying to narrow down the last cases of https calls made from stat1005 towards various Wikimedia endpoints like lists.wikimedia.org and text-lb.eqiad.wikimedia.org (the LVS endpoint for cache text endpoints such as en.wikipedia.org).

Background: all the Analytics hosts like stat1005 are inside a specific network VLAN, and a firewall controls all the data that flows from the Analytics hosts to the rest of the Production network (so a bit different from the "usual" firewall rules, which work the other way around).

As part of T198623 we are trying to narrow down the last use cases of http/https calls that don't go through the http/https proxies (https://wikitech.wikimedia.org/wiki/HTTP_proxy), so we can fix them and finally enforce the remaining firewall rules (currently inactive to avoid disrupting users' crons, etc.).

It is a bit difficult to pinpoint the source of an https call via tcpdump (encryption, no pid listed in tcpdump, etc.), but eventually I was able to correlate the timings of the https calls on stat1005 with some of your crons. I took the liberty of doing the following:

  1. Find all occurrences of $ua->proxy in perl files and add a "https" config like this:
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new();

$ua->proxy(["http", "https"], $ENV{"http_proxy"});  # <== added: route https through the proxy too

$ua->agent("Wikimedia Perl job / EZ");
$ua->timeout(5);
my $page = $ua->get('https://lists.wikimedia.org/');
die "Error!! ", $page->status_line, "\nAborting"
    unless $page->is_success;
print "Success: ", $page->content_type, " document!\n";

The above test script seems to work fine, so I applied the change to the following files (where the https entry was missing):

elukey@stat1005:/srv/home/ezachte$ sudo grep -rni "ua->proxy" *
wikistats/progress/WikimediaDownload.pl:757:  $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/traffic/perl/CollectCountryInfoFromWikipedia.pl:43:  $ua->proxy(["http", "https"], $ENV{"http_proxy"});
wikistats/dumps/perl/WikiCountsScanNamespacesWithContent.pl:184:  $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/dumps/perl/WikiCountsFetchPagesCountReferences.pl:87:  $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/dammit.lt/perl/DammitScanPages.pl:255:  $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/dammit.lt/perl/DammitScanCategories.pl:381:# $ua->proxy(["http"], $ENV{"http_proxy"}) ;
wikistats/dammit.lt/perl/DammitScanCategories.pl:382:  $ua->proxy(["https"], $ENV{"https_proxy"}) ;
wikistats/dammit.lt/perl/DammitReportPageRequests.pl:702:  $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/dammit.lt/perl/DammitProjectMedicinCollectTitles.pl:267:  $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
wikistats/mail-lists/perl/CollectMailArchives.pl:79:  $ua->proxy(["http", "https"], $ENV{"http_proxy"}) ;
  2. Find all occurrences of export http_proxy in bash files and add an export https_proxy right after each of them (according to https://wikitech.wikimedia.org/wiki/HTTP_proxy).
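The bash change above looks roughly like the sketch below. Note: the proxy host/port is an assumption based on the wikitech HTTP_proxy page conventions, not copied from the actual scripts.

```shell
# Existing line in the cron/bash scripts (proxy host is an assumption):
export http_proxy="http://webproxy.eqiad.wmnet:8080"
# Line added right after it, so tools like curl and LWP also proxy https:// URLs:
export https_proxy="$http_proxy"

# Quick sanity check that both variables are set:
echo "http_proxy=$http_proxy https_proxy=$https_proxy"
```

Many tools read the lowercase `https_proxy` variable, so setting it alongside `http_proxy` is usually enough to redirect TLS traffic through the proxy.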

I opened this task to explain carefully to you what was done, and also to get suggestions about anything that could be missing. You can see an example in T198623#4472914 of http calls that are now redirected to https endpoints.

If everything is fine, please close the task; otherwise let's follow up here :)

Event Timeline

elukey triaged this task as Medium priority.Aug 3 2018, 7:16 AM
elukey created this task.
elukey renamed this task from Scripts modified to allow a https proxy to Scripts modified to allow a https proxy on stat1005.Aug 3 2018, 7:16 AM
elukey updated the task description.

From my point of view all the https connections are now going through the http proxy; I haven't seen any issues so far. I am going to close this task; please re-open if anything doesn't look good.
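One way to double-check this conclusion is to watch for TLS connections leaving the host directly rather than via the proxy. A sketch, with the caveats that the interface name is an assumption and the command needs root:

```shell
# Watch for new outbound TLS connections (SYN packets to port 443).
# Interface name (eth0) is an assumption; adjust for the actual host.
# If the scripts are fixed, port-443 flows should only target the proxy host.
sudo tcpdump -ni eth0 'tcp port 443 and tcp[tcpflags] & tcp-syn != 0'
```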