Page MenuHomePhabricator

Create Generalised blocking stratagy
Open, MediumPublic

Description

We should try and centralise and standardise how we block, rate-limit and filter traffic across our infrastructure.

We currently have blocking implementations using

  • apache
  • varnish
  • iptables
  • routers
  • CDN provider
  • other???

Further to having multiple different implementations for blocking we have multiple locations where the block lists exist. for apache the lists are mostly scattered around the public and private repo. The iptables rules are mostly managed by the ferm::rule puppet resource and can be mostly queried using the puppet db. Varnish has a standard way to block ip addresses using the abuse_networks block in the private repo and also has many add hoc filters and ratelimits for specific endpoints or agents known to cause issues.

The multitude of options and places where data can be blocked can make it difficult for engineers and service owners to know:

  • how to add a block
  • how/where to check current blocks
  • which blocks apply to which services
  • where in the stack a block should be added

My personal view is that we should filter traffic behind the caches at the caching layer. If a service is not behind the caching layer then it is likely that it is because it is either a critical service where shared fait is to be avoided or the services is not production. In both of theses cases i think dropping the traffic at iptables without providing a message is sufficient and potentially desirable i.e. fail fast. If this is viewed as too aggressive then i think we should look to see if we can implement some type of standardise blocking logic in envoy as this seems to be the preferred TLS termination layer and would avoid us having to implement blocks in apache, nginx, tomcat etc. However we implement the block though i think the data about what, where and how something is being block should be centralised

This topic came up when discussion the blocking strategy for phabricator abuses. It seems that historically this has been handled in apache, i believe this was implemented before phabricator was behind the caching layer. however now that phabricator is behind the caching layer it can and does make use of a block list in varnish. That said the current block list in varnish is a rather large hammer as it blocks a user to everything behind the caching layer. So i wonder if we should start adding some more granularity to the block lists in varnish so that we could block a user to phabricator (or possibly all developer tools) without blocking access to all WMF resources.

Something elses to consider is how services owners can at the very least see the current block list ideally with some context as to why a user/agent/ip was blocked. potentially an easy way to add users to this list. I believe historically adding users to a block list would always require ops access so this would be additional behaviour; however at least in the phabricator instance viewing the block list with context was possible to to the fact that the block list is stored (potentially incorrectly) in the public puppet repo

I made some steps towards creating a more centralised structure to bocking earlier this year, creating a new hiera block in yaml called abuse_networks. This block is currently used by varnish and ferm and could also be fairly trivially be adapted for use with other daemons managed by puppet such as Apache or to add additional service specific block lists to varnish

The abuse_networks block is however far from perfect:

  • its not usable by anything that is not puppet
  • its not easy to expose the information (to NDA users) or more specifcly the context around specific entries (IP's)
  • requires commit access to the puppet private repo to add entries
  • its maintained in a flat yaml file which also maintains a bunch of important other stuff, all of which would be a pain to break during an incident (well anytime more more so ...)

Event Timeline

Change 651171 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] varnish: add phabricator specific ban in varnish

https://gerrit.wikimedia.org/r/651171

jbond updated the task description. (Show Details)
jbond added a subscriber: MoritzMuehlenhoff.

Change 651174 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] varnish: migrate abuse_nets acl to abuse_networks hiera block

https://gerrit.wikimedia.org/r/651174

Volans triaged this task as Medium priority.Dec 21 2020, 1:39 PM
jbond updated the task description. (Show Details)

Change 651174 merged by Jbond:
[operations/puppet@production] varnish: migrate abuse_nets acl to abuse_networks hiera block

https://gerrit.wikimedia.org/r/651174

Change 651171 merged by Jbond:
[operations/puppet@production] varnish: add phabricator specific ban in varnish

https://gerrit.wikimedia.org/r/651171

Thanks for the writeup with all the background! And for the cleanup patches so far :)

Just a few things to add.

Re: the schema of blocks themselves:

  • In addition to being able to specify a particular logical 'service' to which the block/limit applies, I think there should also be two fields (both probably mandatory):
    • a comment/note -- as part of the structured data rather than as a YAML comment. This means we could also display it on the config-master dump. It should probably be customary here to include a Phab task number here for long-standing blocks.
    • an expiration date. (I do not think we should automatically unblock when the expiration date passes; I think instead there should be an IRC alert or email or a ticket opened, so someone can go look.)

There also some open questions as to what it means to specify a particular "service"; for now I think having a few strings we support ('traffic' vs 'phab' vs 'gerrit' etc) might be okay?

In future work we'd want to formalize that list of "services" or "block points" more, and probably move the list out of private hiera into some other data store (which we could then put an SSO-protected webapp around?)

Change 674922 had a related patch set uploaded (by Ema; author: Ema):
[operations/puppet@production] varnish: add phabricator_abusers to traffic cloud project

https://gerrit.wikimedia.org/r/674922

Change 674922 merged by Ema:
[operations/puppet@production] varnish: add phabricator_abusers to traffic cloud project

https://gerrit.wikimedia.org/r/674922

Change 674923 had a related patch set uploaded (by Ema; author: Ema):
[operations/puppet@production] varnish: add text_abuse_nets to traffic cloud project

https://gerrit.wikimedia.org/r/674923

Change 674923 merged by Ema:
[operations/puppet@production] varnish: add text_abuse_nets to traffic cloud project

https://gerrit.wikimedia.org/r/674923