Add allowlist to make poking holes in abuse_networks:blocked_nets:networks easier
Open, Needs Triage, Public, Feature

Description

As a Beta Cluster maintainer
I want to allow networks used by trusted contributors to pass through large range blocks established to reduce damage from aggressive crawlers
So that the folks who depend on Beta to validate correct technical implementations can perform their work.

<dancy> This stuff would be a lot easier if there were an allowlist of IPs that was processed before abuse_networks.

As discussed in tasks like T392003 (High load on deployment-mediawiki14 and slow responses) and on the wikitech-l mailing list, the edge CDN for the Beta Cluster wikis has been using abuse_networks:blocked_nets:networks to block increasingly large IPv4 address blocks while fighting aggressive crawlers. This works in the sense that it turns back unwanted traffic without much effort. It is a failure, or at least an unwanted obstacle, when these wide blocks also catch traffic from community members who are trying to use Beta to get various forms of work done. We currently have folks doing CIDR math to split larger blocks so that they leave holes for the smaller networks used by known trusted contributors. @dancy wisely noted that it would be simpler if we could add the blocks we want to trust to an explicit allowlist that exempts them from the larger blocks.
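The CIDR math mentioned above is mechanical. You repeatedly halve the blocked network, keep the half that does not contain the trusted network, and recurse into the half that does. Below is a minimal sketch in Ruby (the language Puppet's ERB templates use). It assumes IPv4 only and a trusted network that lies entirely inside the blocked one. punch_hole and its helpers are hypothetical names, not existing tooling.

```ruby
require 'ipaddr'
require 'socket'

# Hypothetical helper: parse "a.b.c.d/len" into [network as Integer, prefix length].
# IPv4 only; a bare address is treated as a /32.
def parse_cidr(cidr)
  addr, len = cidr.split('/', 2)
  len = (len || 32).to_i
  [IPAddr.new("#{addr}/#{len}").to_i, len]
end

def to_cidr(base, len)
  "#{IPAddr.new(base, Socket::AF_INET)}/#{len}"
end

# True if network (inner, inner_len) lies inside network (outer, outer_len).
def inside?(outer, outer_len, inner, inner_len)
  shift = 32 - outer_len
  inner_len >= outer_len && (inner >> shift) == (outer >> shift)
end

# Hypothetical: return the CIDRs covering `block` minus `hole`.
# Each step splits the current network in half, keeps the half without the
# hole, and continues into the half with the hole until it equals the hole.
def punch_hole(block, hole)
  base, len = parse_cidr(block)
  hole_base, hole_len = parse_cidr(hole)
  unless inside?(base, len, hole_base, hole_len)
    raise ArgumentError, "#{hole} is not inside #{block}"
  end

  kept = []
  while len < hole_len
    len += 1
    upper = base + (1 << (32 - len))
    if inside?(base, len, hole_base, hole_len)
      kept << to_cidr(upper, len) # hole is in the lower half
    else
      kept << to_cidr(base, len)  # hole is in the upper half
      base = upper
    end
  end
  kept.sort_by { |cidr| parse_cidr(cidr) }
end

puts punch_hole("38.0.0.0/8", "38.240.0.0/12")
# 38.0.0.0/9
# 38.128.0.0/10
# 38.192.0.0/11
# 38.224.0.0/12
```

Punching 38.240.0.0/12 out of 38.0.0.0/8 yields four entries, the same as the 38.x lines in the sample blocked-nets.inc.vcl later in this task. Each additional hole repeats this bookkeeping on whichever entry contains it.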

Event Timeline

This bit in operations/puppet.git:modules/varnish/templates/wikimedia-frontend.vcl.erb is where the block is applied:

// Block requests from IPs in blocked_nets. It is important to do this
// early but after recv_fe_ip_processing has been called, as the procedure
// takes care of writing X-Client-IP if the request did not come
// through the TLS terminator
if (std.ip(req.http.X-Client-IP, "192.0.2.1") ~ blocked_nets) {
    return (synth(403, "Requests from your IP have been blocked, please contact noc@wikimedia.org"));
}
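std.ip() returns its second argument when X-Client-IP does not parse as an address. A malformed or missing header is therefore checked as 192.0.2.1, a TEST-NET-1 documentation address. That address will not match blocked_nets unless someone blocks that range. Below is a rough Ruby model of the check, assuming an in-memory list stands in for the acl. client_ip, acl_match? and blocked? are illustrative names, not Varnish APIs.

```ruby
require 'ipaddr'

FALLBACK_IP = IPAddr.new("192.0.2.1") # same fallback the VCL passes to std.ip()

# Illustrative approximation of std.ip(req.http.X-Client-IP, "192.0.2.1"):
# parse the header or use the fallback. Unlike std.ip, this never tries DNS.
def client_ip(header)
  return FALLBACK_IP if header.nil? || header.strip.empty?

  IPAddr.new(header.strip)
rescue IPAddr::Error
  FALLBACK_IP
end

# Illustrative approximation of `ip ~ acl` for an acl given as CIDR strings.
def acl_match?(ip, acl)
  acl.any? { |net| IPAddr.new(net).include?(ip) }
end

BLOCKED_NETS = ["14.0.0.0/8", "37.0.0.0/8"].freeze # sample entries

def blocked?(x_client_ip)
  acl_match?(client_ip(x_client_ip), BLOCKED_NETS)
end

p blocked?("14.1.2.3")  # => true
p blocked?("not-an-ip") # => false (checked as 192.0.2.1)
```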

The blocked_nets acl comes from an include directive at line 34 of the same file:

// provides the acl "blocked_nets", "text_abuse_nets", "bot_blocked_nets",
// and "public_cloud_nets".
// The file is made from the abuse_networks block in the private repo
// hieradata/common.yaml, but is actually generated by confd sourcing
// data from etcd to ensure faster response times.
// Must be included before analytics.inc.vcl.
include "blocked-nets.inc.vcl";

That file is generated by logic in the profile::cache::varnish::frontend Puppet manifest. The code forks on the profile::cache::varnish::frontend::use_etcd_req_filters feature flag in Hiera, and hieradata/cloud.yaml sets that flag to false. This matches the "# deployment-prep still uses the old template." comment in the Puppet code:

# deployment-prep still uses the old template.
$abuse_networks = network::parse_abuse_nets('varnish')
file { '/etc/varnish/blocked-nets.inc.vcl':
    ensure  => present,
    content => template('profile/cache/blocked-nets.inc.vcl.erb'),
    owner   => 'root',
    group   => 'root',
    mode    => '0444',
}
modules/network/functions/parse_abuse_nets.pp
# SPDX-License-Identifier: Apache-2.0                                           
# @summary this function parses a Hash of Network::Abuse_net objects and
#          returns a list of networks appropriate for the context
# @param abuse_nets a list of abuse networks with meta data indicating
#                   where they should be used.
#                   Default: lookup('abuse_networks')
# @param context either ferm or varnish to indicate where the list will be used
function network::parse_abuse_nets(
     Network::Context                    $context,
     Hash[String[1], Network::Abuse_net] $abuse_nets = lookup('abuse_networks'),
) >> Hash[String[1], Network::Abuse_net] {
    $abuse_nets.filter |$key, $values| { $context in $values['context'] }
}
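The filter keeps only the abuse_networks entries whose context list includes the requested context. Here is the same selection in plain Ruby. It runs over hypothetical sample data shaped like the keys the templates read (context, networks, comment). The entry names and networks below are made up for illustration.

```ruby
# Hypothetical abuse_networks data, keyed by acl/map name.
ABUSE_NETWORKS = {
  "blocked_nets" => {
    "context"  => ["varnish"],
    "networks" => ["14.0.0.0/8", "37.0.0.0/8"],
  },
  "ferm_only_nets" => { # hypothetical entry used only by ferm
    "context"  => ["ferm"],
    "networks" => ["198.51.100.0/24"],
  },
}.freeze

# Mirrors network::parse_abuse_nets: Hash#select returns a Hash,
# just as Puppet's filter does when given a Hash.
def parse_abuse_nets(context, abuse_nets = ABUSE_NETWORKS)
  abuse_nets.select { |_name, values| values["context"].include?(context) }
end

p parse_abuse_nets("varnish").keys # => ["blocked_nets"]
```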
profile/templates/cache/blocked-nets.inc.vcl.erb
<%#- SPDX-License-Identifier: Apache-2.0 -%>
<%- @abuse_networks.each_pair do |net_name, config| -%>
<%- if config.has_key?('comment') -%>
  <%- config['comment'].split("\n").each do |line| -%>
// <%= line %>
  <%- end -%>
<%- end -%>
acl <%= net_name %> {
  <%- config['networks'].each do |net| -%>
        "<%= net %>";
  <%- end -%>
}
<%- end -%>

This ends up generating a file that looks something like:

/etc/varnish/blocked-nets.inc.vcl
acl blocked_nets {
        "14.0.0.0/8";
        "37.0.0.0/8";
        "38.0.0.0/9";
        "38.128.0.0/10";
        "38.192.0.0/11";
        "38.224.0.0/12";
}
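You can check how a given abuse_networks entry turns into acl lines without running Puppet. Render the template with Ruby's standard ERB library, using the same '-' trim mode Puppet uses. Below is a minimal sketch. The sample data and the BlockedNetsView helper are hypothetical, and the template text is copied from blocked-nets.inc.vcl.erb.

```ruby
require 'erb'

# Copied from profile/templates/cache/blocked-nets.inc.vcl.erb.
TEMPLATE = <<~'TPL'
  <%#- SPDX-License-Identifier: Apache-2.0 -%>
  <%- @abuse_networks.each_pair do |net_name, config| -%>
  <%- if config.has_key?('comment') -%>
    <%- config['comment'].split("\n").each do |line| -%>
  // <%= line %>
    <%- end -%>
  <%- end -%>
  acl <%= net_name %> {
    <%- config['networks'].each do |net| -%>
          "<%= net %>";
    <%- end -%>
  }
  <%- end -%>
TPL

# Hypothetical view: exposes @abuse_networks to the template the way
# Puppet exposes scope variables as instance variables.
class BlockedNetsView
  def initialize(abuse_networks)
    @abuse_networks = abuse_networks
  end

  def render
    ERB.new(TEMPLATE, trim_mode: '-').result(binding)
  end
end

puts BlockedNetsView.new({
  "blocked_nets" => {
    "comment"  => "Sample entry for testing",
    "networks" => ["14.0.0.0/8", "37.0.0.0/8"],
  },
}).render
```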

The simplest thing to do here might just be adding a new allowed_nets acl and some boolean logic like:

if (std.ip(req.http.X-Client-IP, "192.0.2.1") ~ blocked_nets && !(std.ip(req.http.X-Client-IP, "192.0.2.1") ~ allowed_nets)) {
    return (synth(403, "Requests from your IP have been blocked, please contact noc@wikimedia.org"));
}
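The proposed condition makes the allowlist win over blocked_nets regardless of prefix length, so no CIDR splitting is needed. Below is a small Ruby model of that precedence. The sample networks and helper names (acl_match?, reject?) are hypothetical stand-ins for the two acls.

```ruby
require 'ipaddr'

# Hypothetical approximation of `ip ~ acl` for an acl given as CIDR strings.
def acl_match?(ip, acl)
  addr = IPAddr.new(ip)
  acl.any? { |net| IPAddr.new(net).include?(addr) }
end

# Mirrors: ip ~ blocked_nets && !(ip ~ allowed_nets)
def reject?(ip, blocked_nets, allowed_nets)
  acl_match?(ip, blocked_nets) && !acl_match?(ip, allowed_nets)
end

BLOCKED = ["38.0.0.0/8"].freeze
ALLOWED = ["38.240.1.0/24"].freeze # hypothetical trusted contributor network

p reject?("38.1.2.3", BLOCKED, ALLOWED)   # => true  (blocked, not allowlisted)
p reject?("38.240.1.9", BLOCKED, ALLOWED) # => false (allowlist wins)
p reject?("8.8.8.8", BLOCKED, ALLOWED)    # => false (never blocked)
```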
[21:25:31] <bd808>	 Before I waste energy with a poor implementation, could some of you fine folks look at my idea for making it easier to poke holes in the blocked_nets setting we are using in Beta Cluster? https://phabricator.wikimedia.org/T393481#10802783
[21:25:56] <bd808>	 This would be changes to wikimedia-frontend.vcl.erb
[21:26:38] <bd808>	 feature flagged somehow of course
[22:53:54] <wikibugs>	 Traffic, Beta-Cluster-Infrastructure: Add allowlist to make poking holes in abuse_networks:blocked_nets:networks easier - https://phabricator.wikimedia.org/T393481#10806169 (BCornwall)
[22:59:49] <brett>	 bd808: I'm not the authority on this stuff here but it seems like the reasonable way forward to me.
[23:00:33] <brett>	 From a procedure standpoint, though... who's going to be maintaining this list?
[23:11:05] <bd808>	 brett: me and others in the deployment-prep project. We use Horizon to inject hiera settings for the project.
[23:11:25] <brett>	 makes sense
[23:11:59] <bd808>	 https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep#Blocking is the thing this would help with
[23:13:07] <bd808>	 bots crushed things such that I blocked ~12% of the IPv4 space and have been slowly poking holes for real people who ask. The CIDR math is tedious so we would like to add in an allow list basically.
[23:21:50] <brett>	 Yeah, seems reasonable to me. But I can't speak for v.gut or s.uk

[08:28:40] <vgutierrez>	 bd808: right now the easiest way would be adding a hiera hiera that injects the allowlisted CIDRs on the wikimedia_nets acl
[08:28:54] <vgutierrez>	 s/hiera hiera/hiera key/
[08:29:37] <vgutierrez>	 don't use wikimedia_trust cause those are allowed to set X-F-P

The wikimedia_nets acl mentioned by @Vgutierrez is an interesting option. The current code that creates that acl includes a set of subnets that are allowlisted for Wikimedia Enterprise usage. That is essentially the same use case we have in Beta Cluster, where we want to let in some networks carrying presumably friendly traffic no matter what other blocks we create.

A new parameter, something like $extra_nets on ::profile::cache::base mirroring the existing $extra_trust parameter there, could be concatenated into the $wikimedia_nets list that is derived from $::network::constants::aggregate_networks. That $wikimedia_nets list is pulled in by ::profile::cache::varnish::frontend and passed to varnish::instance { "${cache_cluster}-frontend": }, which in turn passes it to varnish::wikimedia_vcl { "/etc/varnish/wikimedia_${vcl_name}.vcl": }. ::varnish::wikimedia_vcl exposes it to the wikimedia-frontend.vcl.erb template, where it populates the wikimedia_nets acl. Finally, the wikimedia_nets acl would be used in place of the allowed_nets acl imagined in T393481#10802783.
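One property worth keeping in that chain: an empty default for the new parameter would leave $wikimedia_nets exactly as it is today. Production caches would then be unaffected unless a project sets it. Below is a Ruby sketch of the concatenation. extra_nets mirrors the hypothetical $extra_nets parameter. The aggregate list is placeholder data, not the real $::network::constants::aggregate_networks.

```ruby
# Placeholder for $::network::constants::aggregate_networks (not real data).
AGGREGATE_NETWORKS = ["198.51.100.0/24", "2001:db8::/32"].freeze

# Hypothetical: concatenate optional extra networks and drop exact
# duplicates, keeping the original entries first and in order.
def wikimedia_nets(aggregate_networks, extra_nets = [])
  (aggregate_networks + extra_nets).uniq
end

p wikimedia_nets(AGGREGATE_NETWORKS) == AGGREGATE_NETWORKS     # => true
p wikimedia_nets(AGGREGATE_NETWORKS, ["203.0.113.0/24"]).last # => "203.0.113.0/24"
```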

The business logic that needs to change has moved recently as part of a larger project to improve blocking capabilities in the production CDN. The blocking action still happens in Varnish, but it is now triggered by the presence of an X-Provenance: abuse=blocked_nets header in the request.

That header is added to the inbound request by HAProxy based on various lookups. In the case of Beta Cluster specifically, the /etc/haproxy/ipblocks.d/all.map file is generated by Puppet from the abuse_networks Hiera data:

profile/templates/cache/haproxy/ipblocks-all.map.erb
<%#- SPDX-License-Identifier: Apache-2.0 -%>
<%- @abuse_networks.each_pair do |net_name, config| -%>
<%- config['networks'].each do |net| -%>
<%= net %>      abuse=<%= net_name %>
<%- end -%>
<%- end -%>
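Each network in every abuse_networks entry becomes one "<network> abuse=<entry name>" line. Rendering the template with Ruby's ERB shows the resulting all.map content. The sketch uses trim mode '-', as Puppet does, over hypothetical sample data. MapView and the sample entries are illustrative only.

```ruby
require 'erb'

# Copied from profile/templates/cache/haproxy/ipblocks-all.map.erb.
MAP_TEMPLATE = <<~'TPL'
  <%#- SPDX-License-Identifier: Apache-2.0 -%>
  <%- @abuse_networks.each_pair do |net_name, config| -%>
  <%- config['networks'].each do |net| -%>
  <%= net %>      abuse=<%= net_name %>
  <%- end -%>
  <%- end -%>
TPL

# Hypothetical view exposing @abuse_networks to the template.
class MapView
  def initialize(abuse_networks)
    @abuse_networks = abuse_networks
  end

  def render
    ERB.new(MAP_TEMPLATE, trim_mode: '-').result(binding)
  end
end

puts MapView.new({
  "blocked_nets" => { "networks" => ["14.0.0.0/8", "37.0.0.0/8"] },
  "example_nets" => { "networks" => ["198.51.100.0/24"] }, # hypothetical entry
}).render
```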

I wondered if it might all be as easy as adding another section to the abuse_networks config that would end up generating a different abuse=... value. With the existing Puppet code this does change the generated all.map file, but not in a way that is guaranteed to prevent blocking.

Adding a feature-flagged IP map lookup against a new data file just before the current http-request set-var(req.provenance,ifnotexists,ifnotempty) src,map_ip(/etc/haproxy/ipblocks.d/all.map) lookup in templates/cache/haproxy/tls_terminator.cfg.erb should work, though. That might look something like:

templates/cache/haproxy/tls_terminator.cfg.erb
<%- if @use_allowlist -%>
# T393481: check if the IP is in the allowed networks list
http-request set-var(req.provenance,ifnotexists,ifnotempty) src,map_ip(/etc/haproxy/ipblocks.d/allowed.map)
<%- end -%>
# check if the IP is included in one of our ipblocks
http-request set-var(req.provenance,ifnotexists,ifnotempty) src,map_ip(/etc/haproxy/ipblocks.d/all.map)
/etc/haproxy/ipblocks.d/allowed.map
1.2.3.4/32 net=allowed
...
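If the allowlist is generated from Hiera like all.map, it is worth validating entries first. An IPv4 prefix longer than /32, or a network written with host bits set, is easy to type and easy to miss in review. Below is a sketch of a hypothetical generator for allowed.map lines. The function name and the policy of rejecting host bits are assumptions, not existing Puppet code.

```ruby
require 'ipaddr'

# Hypothetical: build allowed.map lines ("<cidr> <value>") from CIDR strings.
# Raises an IPAddr::Error subclass for entries IPAddr cannot parse (bad
# address, IPv4 prefix > 32), and ArgumentError for entries with host bits
# set, e.g. 1.2.3.4/24. The host-bit check compares strings, so it assumes
# canonical notation such as dotted-quad IPv4.
def allowed_map_lines(cidrs, value = "net=allowed")
  cidrs.map do |cidr|
    addr, len = cidr.split("/", 2)
    network = IPAddr.new(cidr) # masks host bits, or raises if invalid
    if network.to_s != addr
      raise ArgumentError, "#{cidr} has host bits set; did you mean #{network}/#{len}?"
    end
    "#{cidr} #{value}"
  end
end

puts allowed_map_lines(["1.2.3.4/32", "203.0.113.0/24"])
```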

The core magic here is that the config consistently uses set-var(req.provenance,ifnotexists,ifnotempty), so an ordered list of lookups populates the variable with a first-match-wins strategy. By setting the variable to something other than abuse=blocked_nets, we avoid the block that a match in /etc/haproxy/ipblocks.d/all.map would otherwise trigger.
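The ordering argument can be checked with a small model of the two set-var lines. This Ruby sketch treats each map as a list of (network, value) pairs. Within one map it picks the most specific matching entry, which is assumed here to match how HAProxy's map_ip resolves overlapping entries. Across maps it applies ifnotexists/ifnotempty in order. parse_map, map_ip_lookup and provenance are hypothetical names.

```ruby
require 'ipaddr'

# Hypothetical: parse HAProxy-style map text ("<cidr> <value>" per line).
def parse_map(text)
  text.each_line.map do |line|
    line = line.strip
    next nil if line.empty? || line.start_with?("#")
    key, value = line.split(/\s+/, 2)
    [IPAddr.new(key), value]
  end.compact
end

# Value of the most specific entry containing ip, or nil if none matches.
def map_ip_lookup(map, ip)
  addr = IPAddr.new(ip)
  hits = map.select { |net, _value| net.include?(addr) }
  return nil if hits.empty?
  # Smallest address range means longest prefix.
  hits.min_by { |net, _value| net.to_range.end.to_i - net.to_range.begin.to_i }.last
end

# One set-var(req.provenance,ifnotexists,ifnotempty) per map, in order:
# an empty lookup is skipped (ifnotempty) and an already-set variable is
# never overwritten (ifnotexists), so the first match wins.
def provenance(ip, maps)
  var = nil
  maps.each do |map|
    value = map_ip_lookup(map, ip)
    next if value.nil? || value.empty?
    var ||= value
  end
  var
end

ALLOWED_MAP = parse_map("38.240.1.0/24 net=allowed\n")       # hypothetical entry
ALL_MAP = parse_map("38.0.0.0/8      abuse=blocked_nets\n")  # sample entry

p provenance("38.240.1.9", [ALLOWED_MAP, ALL_MAP]) # => "net=allowed"
p provenance("38.1.2.3", [ALLOWED_MAP, ALL_MAP])   # => "abuse=blocked_nets"
p provenance("38.240.1.9", [ALL_MAP])              # => "abuse=blocked_nets"
```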

Drive-by comment: the ./whoisit.sh utility mentioned in https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Blocking_and_unblocking#Unblocking_an_IP_or_network uses bgpview.io, which shut down recently, in case anyone wants to find a replacement.

Thanks for the poke. I updated the script and wiki docs to use whois -h bgp.tools "$*" behind the scenes. Hopefully this one will keep working for a while longer.