
Implement Varnish-level rough ratelimiting
Closed, Resolved · Public

Description

We've tried this before using libvmod-tbf during the Varnish 3 era, and ended up backing it out over some pragmatic issues. The key problems were:

  1. VCL reloads (which are routine and frequent) were leaking significant memory (a general problem, but in the libvmod_tbf case specifically the leaked allocations were the large TBF datasets)
  2. The TBF algorithm was structured such that the initial state on server start was "all buckets full" (no burst capacity), as opposed to assuming (at server start, or when first seeing a new IP) that buckets start out empty (in the virtual sense)

We need to resurrect this effort and get it deployed without such issues so that we can build other solutions on it.

Event Timeline

Restricted Application added a subscriber: Aklapper.

We should evaluate vmod_vsthrottle, available in the varnish-modules package, and see whether it's affected by the issues mentioned in this ticket (memory leaks, no burst capacity).

re: vsthrottle, my thoughts after a quick look this morning:

  1. The burst issue seems fine. It initializes fresh buckets with full capacity.
  2. Memory leaks: it seems to have a proper fini function that looks right. If it still leaks memory, that's because Varnish leaks vmod memory in general when it shouldn't, so that problem's out of scope here regardless (if we do see it, we can raise it as a Varnish-level issue and potentially block on that).
  3. Lock contention: vsthrottle looks a little heavy on the locking. Notably, it partitions the keyspace into 16 chunks to split up the mutex contention, and that chunk count is a compile-time constant. I think we can live with this, so long as we use it only for the case that really matters: ratelimiting FE cache misses+passes rather than hits. Worst case, we might need to tweak the compile-time constant to use more chunks.

Beyond all that though, I think the interface/functionality is a bit deficient. Going over some of those points:

  1. A bucket has only two parameters in this TBF implementation: period and limit, but no burst. Off the top of my head, I think we can still configure all the same scenarios as a ratelimiter with a separate burst parameter, but we might have to stretch the period to match the burst: e.g. instead of saying "50/s + 500 burst", we'd have to state it here as "500/10s". I should probably stare at the iptables implementation again and see whether there's a real difference we can't configure our way around (e.g. does iptables' 50/s+500b still expire the entries after one second of inactivity, or after ten? that's the sort of efficiency difference we might see here).
  2. The period and limit are specified on every call and are part of the bucket's key. So if we want to use the same bucket from multiple places, we have to be careful to keep the parameters in sync, or perhaps use a subroutine for a shared bucket (a DRY issue, but we have lots of those in VCL anyway).
  3. There's no ability to consume multiple tokens at once. Every is_denied() call checks the limit and consumes exactly one token; there's no way to ask it to consume, say, 10 tokens from the same shared bucket for certain requests because they're expensive.
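To make points 2 and 3 concrete, here's a rough VCL sketch of what using vsthrottle could look like (the subroutine name and the 600/60s numbers are illustrative, and the signature is assumed to be is_denied(key, limit, period) as in varnish-modules):

```
import vsthrottle;

# Hypothetical shared bucket: since limit and period are part of the
# bucket's key, wrapping the call in one subroutine keeps the parameters
# in sync across every call site that should share the same bucket.
# Note there is no way to charge more than one token per call.
sub rate_limit_client {
    if (vsthrottle.is_denied("rl:" + client.ip, 600, 60s)) {
        return (synth(429, "Too Many Requests"));
    }
}
```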

Stared at the vsthrottle hashtable implementation some more, as well as the Linux iptables hashlimit one (which I consider a sort of canonical baseline for an efficient implementation). The Linux one is definitely more configurable, but I think we can work with what vsthrottle gives us in terms of configurability of the bucket itself. The lack of a cost parameter is still an issue, but it's a long-term one, relevant if/when we get to the point of defining a header that applications can send in their responses to indicate heavier costs.
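For instance, the burst mapping discussed above might be expressed in vsthrottle terms roughly like this (hypothetical key and numbers):

```
# Approximating iptables-style "50/s with a burst of 500": widen the
# window so that limit/period keeps the same sustained rate, while the
# window size itself provides the burst headroom.
if (vsthrottle.is_denied("api:" + client.ip, 500, 10s)) {
    # 500 tokens per 10s window ~= 50/s sustained, up to 500 at once
    return (synth(429, "Too Many Requests"));
}
```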

So, let's put up some commits and see what using it would look like? Ideally just for miss/pass (e.g. from vcl_backend_fetch), and able to return a 429 (which is tricky from there, as return (abandon) results in a synth with a pre-set 503 status).
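As a sketch of the 429 side (the actually deployed VCL lives in the gerrit changes below; this just shows the simpler vcl_recv placement, where synth() can carry an arbitrary status, side-stepping the pre-set-503 problem with return (abandon) on the backend side):

```
sub vcl_recv {
    if (vsthrottle.is_denied("rl:" + client.ip, 600, 60s)) {
        return (synth(429, "Too Many Requests"));
    }
}

sub vcl_synth {
    if (resp.status == 429) {
        # Hint to well-behaved clients when it's safe to retry (seconds).
        set resp.http.Retry-After = "60";
    }
}
```

The trade-off versus the vcl_backend_fetch placement is that this counts hits as well as misses/passes, so the limits would need to be sized accordingly.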

Change 357995 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] VCL: rate limit wikiScrape with vsthrottle

https://gerrit.wikimedia.org/r/357995

Change 357995 merged by Ema:
[operations/puppet@production] VCL: rate limit wikiScrape with vsthrottle

https://gerrit.wikimedia.org/r/357995

Change 358583 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] VCL: rate limit API requests

https://gerrit.wikimedia.org/r/358583

Change 358583 merged by Ema:
[operations/puppet@production] VCL: rate limit text-frontend requests

https://gerrit.wikimedia.org/r/358583

Change 358774 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] VCL: increase rate limit to 600/60s

https://gerrit.wikimedia.org/r/358774

Change 358774 merged by Ema:
[operations/puppet@production] VCL: increase rate limit to 600/60s

https://gerrit.wikimedia.org/r/358774

Change 358779 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] VCL: do not rate limit requests from IPs in wikimedia_trust

https://gerrit.wikimedia.org/r/358779

Change 358779 merged by Ema:
[operations/puppet@production] VCL: do not rate limit requests from IPs in wikimedia_trust

https://gerrit.wikimedia.org/r/358779

Change 358860 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] VCL: raise ratelimit for RB, exclude labs from limits

https://gerrit.wikimedia.org/r/358860

Change 358860 merged by BBlack:
[operations/puppet@production] VCL: raise ratelimit for RB, exclude labs from limits

https://gerrit.wikimedia.org/r/358860

Change 358965 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] VCL: add Retry-After header to 429 responses

https://gerrit.wikimedia.org/r/358965

The fact that no requests have been throttled in PageviewAPI of late (see the 429 graph below) kind of tells me that PageviewAPI has been receiving too few "lawful" requests of late, though. Are we excluding restbase services from this rate limiting? So far it seems to have been eating lawful traffic.

https://grafana.wikimedia.org/dashboard/db/aqs-elukey?orgId=1&from=1495950536774&to=1496747990508&panelId=7&fullscreen

Change 358965 merged by Ema:
[operations/puppet@production] VCL: add Retry-After header to 429 responses

https://gerrit.wikimedia.org/r/358965

Change 359137 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] VCL: same rate-limit for api.php and restbase

https://gerrit.wikimedia.org/r/359137

Change 359137 merged by Ema:
[operations/puppet@production] VCL: same rate-limit for api.php and restbase

https://gerrit.wikimedia.org/r/359137

Change 359231 had a related patch set uploaded (by BBlack; owner: BBlack):
[operations/puppet@production] ratelimits: double the default anon limit

https://gerrit.wikimedia.org/r/359231

Change 359231 merged by BBlack:
[operations/puppet@production] ratelimits: double the default anon limit

https://gerrit.wikimedia.org/r/359231

Change 359401 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] VCL: apply API rate limits to wikidata too

https://gerrit.wikimedia.org/r/359401

Change 359401 merged by Ema:
[operations/puppet@production] VCL: apply API rate limits to wikidata too

https://gerrit.wikimedia.org/r/359401

ema claimed this task.

We've been using vsthrottle in prod for a while now, closing.