So far, we have primarily been using rate limiting to protect API endpoints from abuse. In practice, we have found that request rates are not the best metric to focus on, both from the user's perspective and from ours. In this task, I am making the case for focusing on request concurrency instead.
For API users, rate limits are difficult to understand and implement. Most clients are implemented as one or more basic loops iterating over URLs. The actual request rate such a loop achieves can change drastically with response times and network conditions, which is hard for clients to predict. Implementing true rate limiting and request pacing is non-trivial, and I believe very rare in practice. Because they are implemented as simple loops, the vast majority of our clients effectively already limit concurrency, not rates.
On the server side, the main thing we care about is limiting the resources a single client can tie up. The time (and thus CPU / memory / IO resources) needed to serve individual API requests can differ by several orders of magnitude. Some requests are very cheap when served from caches, but much more expensive on a cache miss. Even within a single API endpoint, like the one we expose for ad-hoc wikitext parsing, costs can differ wildly depending on inputs. However, to a first approximation, each concurrent request ties up a relatively similar amount of resources while it is being processed. This means that per-client request concurrency approximates the associated resource usage much more closely than request rates do.
Additionally, concurrency-limited clients automatically slow down during periods of temporarily elevated latency, which reduces load exactly when extra load is most expensive for our infrastructure.
Implementing concurrency limiting
Our nginx / varnish layer is critical for performance and reliability. For this reason, we would much prefer making limiting decisions using local nginx / varnish state only, avoiding dependencies on other services subject to failures and network latency.
- Nginx supports connection (concurrency) limiting on arbitrary, templated keys (IP, header, etc.).
- Problem: all connections from a given API user would need to be mapped to the same Nginx instance for effective concurrency limiting. We can't do this with LVS unless we restrict ourselves to limiting by IP.
- Varnish handles all analytics needs, so limiting in Varnish would make it easier to accurately capture limiting in analytics.
Idea: Balance nginx->varnish connections by app key hash in Nginx; concurrency limit in front-end Varnish.
- Load balance backend connections from Nginx to Varnish by app-level $client_id; don't use LVS
- ngx_http_upstream_consistent_hash module: consistent hash load balancing on arbitrary key
- https://github.com/weibocom/nginx-upsync-module: adds dynamic backends from etcd
- Could later use auth JWTs to verify client ids & derive a reliable key
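The Nginx side of this idea could look roughly like the following sketch. The backend addresses and the `X-Api-Key` header used to derive `$client_id` are placeholders; the `consistent_hash` directive is provided by the ngx_http_upstream_consistent_hash module, not stock Nginx.

```nginx
# Derive a per-client key; fall back to the client IP when no API key
# header is sent. Header name is a placeholder for illustration.
map $http_x_api_key $client_id {
    ""        $remote_addr;
    default   $http_x_api_key;
}

upstream varnish_frontends {
    # Consistent hashing on the app-level client id keeps all of a
    # client's connections on the same Varnish frontend, so that
    # frontend sees the client's full concurrency.
    consistent_hash $client_id;
    server 10.0.0.10:3128;   # placeholder backend addresses
    server 10.0.0.11:3128;
}

server {
    listen 443 ssl;
    location / {
        proxy_pass http://varnish_frontends;
    }
}
```

With dynamic backends from the upsync module, the `server` entries above would be managed from etcd rather than hard-coded.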
- Implement concurrency limiting in Varnish. Options:
- Extend the vsthrottle module with the ability to return tokens at the end of a request, as discussed in this task. This looks fairly straightforward to implement.
- Create a simplified counter module loosely modeled on vsthrottle, based on atomic counters & a periodic GC process. Should offer better performance, but might be YAGNI.
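The semantics of the simplified counter option can be sketched as follows. This is a Python illustration of the intended behavior only; the real implementation would be a Varnish vmod in C using atomic counters, with a lock standing in for them here. All names are hypothetical.

```python
import threading
import time
from collections import defaultdict


class ConcurrencyLimiter:
    """Sketch of per-client concurrency counting: increment when a
    request starts, decrement when it ends, deny above the limit,
    and periodically GC idle counters."""

    def __init__(self, limit):
        self.limit = limit
        self.lock = threading.Lock()
        self.counters = defaultdict(int)
        self.last_seen = {}

    def acquire(self, client_id):
        """Called at request start; True if the request may proceed."""
        with self.lock:
            if self.counters[client_id] >= self.limit:
                return False
            self.counters[client_id] += 1
            self.last_seen[client_id] = time.monotonic()
            return True

    def release(self, client_id):
        """Called at end of request to return the slot."""
        with self.lock:
            if self.counters[client_id] > 0:
                self.counters[client_id] -= 1

    def gc(self, max_idle=60.0):
        """Periodic GC: drop counters that are zero and long idle."""
        now = time.monotonic()
        with self.lock:
            for cid in list(self.counters):
                idle = now - self.last_seen.get(cid, 0)
                if self.counters[cid] == 0 and idle > max_idle:
                    del self.counters[cid]
                    del self.last_seen[cid]
```

Unlike a token-bucket rate limiter, no refill rate needs tuning: a slot is freed exactly when a request completes, which is what makes clients self-throttle under elevated latency.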
- Potential issues:
- Using app-level Nginx -> Varnish load balancing:
- Need to make sure that backend monitoring is at least on par with LVS.
- Would need integration with dynamic pooling / de-pooling workflow. This is well established for LVS.
- Keeping "client keys" of any sort secure in applications that run on an end user's machine is an unsolved problem. See details in T167906#3412015.