Last night and the night before, we were paged for network saturation due to what looks like a hotlinked image.
Ideally, we'd like to be able to serve that kind of traffic, but given our current resources and as an emergency measure, requestctl should be able to throttle it, in order to keep our bandwidth within limits. Unlike the use case where requestctl protects the app servers from a cache-busting attack, here the problematic traffic is all cache hits.
Currently, requestctl's generated VCL filters are only invoked in cluster_fe_ratelimit, which is only called from cluster_fe_miss or cluster_fe_pass. There's no entry point from vcl_hit (wikimedia-frontend.vcl.erb) so requestctl can't affect traffic on the hit path.
Note that the requestctl schema does contain a field cache_miss_only, which is documented to mean that cache hits are filtered when the field is set to false -- but it was added optimistically and currently doesn't do anything (and can't, until the control flows are changed as above). If we can't figure out a way to actually implement this behavior in the near term, we should remove that field or at least add a prominent warning label.
- Create placeholder-for-now cluster_fe_{ratelimit_,}hit subroutines in our VCL https://gerrit.wikimedia.org/r/c/operations/puppet/+/832268
- Patch requestctl to check the value of the internal cache disposition header req.http.X-CDIS against the action's cache_miss_only as part of evaluation. Within our VCL, that header is set to hit on a hit, miss on a miss, etc.
- Release requestctl.
- Add an optional stanza, inclusion controlled by the value of a new hiera variable, to *-frontend.inc.vcl.erb's cluster_fe_ratelimit_hit sub, that's just an include "requestctl-filters.inc.vcl"; as already happens in cluster_fe_ratelimit.
- Enable that hiera bit on a few upload cps in eqsin as an initial evaluation and some protection against that same image file hotlink.
- Enable that hiera bit on a few text cps, watching for any performance impact.
- Roll it out to the fleet.
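
For the requestctl patch step, the decision being asked for can be sketched roughly as below. This is only an illustration of the intended semantics, not requestctl's actual code: the function name action_applies is hypothetical, the real check would most likely be emitted into the generated VCL as a condition on req.http.X-CDIS, and treating a missing header or any value other than "hit" as "not a hit" is an assumption.

```python
from typing import Optional


def action_applies(cache_miss_only: bool, x_cdis: Optional[str]) -> bool:
    """Decide whether an action should be evaluated for a request.

    cache_miss_only: the action's schema field of the same name.
    x_cdis: value of the X-CDIS cache disposition header
            ("hit" on a hit, "miss" on a miss, etc.), or None if unset.
    """
    if not cache_miss_only:
        # The action is documented to filter cache hits as well,
        # so it applies regardless of cache disposition.
        return True
    # Assumption: anything other than an explicit "hit" (miss, pass,
    # missing header) counts as not-a-hit, which matches today's
    # behavior of filtering only on the miss and pass paths.
    return x_cdis != "hit"


if __name__ == "__main__":
    for field in (True, False):
        for cdis in ("hit", "miss", "pass", None):
            print(f"cache_miss_only={field} X-CDIS={cdis}: "
                  f"{action_applies(field, cdis)}")
```

With this shape, existing actions that have cache_miss_only set to true would keep behaving as they do now once the hit-path include is enabled; only actions that explicitly set it to false would start affecting cache hits.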