So far, we have primarily been using rate limiting to protect API endpoints from abuse. In practice, we have found that request rates are not the best metric to focus on, both from the user's perspective and from ours. In this task, I am making the case for focusing on request concurrency instead.
For API users, rate limits are difficult to understand and implement. Most clients are implemented as one or more basic loops iterating over URLs. The actual request rate such a loop achieves can change drastically with response times and network conditions, which is hard for clients to predict. Implementing true rate limiting and request pacing is non-trivial, and I believe very rare in practice. Because they are implemented as simple loops, the vast majority of our clients effectively already limit concurrency, not rates.
On the server side, the main thing we care about is limiting the resources a single client can tie up. The time (and thus CPU / memory / IO resources) needed to serve individual API requests can differ by several orders of magnitude. Some requests are very cheap when served from caches, but much more expensive on a cache miss. Even within a single API endpoint, like the one we expose for ad-hoc wikitext parsing, costs can differ wildly depending on inputs. However, to a first approximation, each concurrent request ties up a relatively similar amount of resources while it is being processed. This means that per-client request concurrency approximates the associated resource usage much more closely than request rates do.
Additionally, concurrency-limited clients automatically slow down during periods of temporarily elevated latency, which reduces load exactly when extra load is most expensive for our infrastructure.
Implementing concurrency limiting
Our nginx / varnish layer is critical for performance and reliability. For this reason, we would much prefer making limiting decisions using local nginx / varnish state only, avoiding dependencies on other services subject to failures and network latency.
- Nginx supports connection (concurrency) limiting on arbitrary, templated keys (IP, header, etc.).
- Problem: all connections from a given API user would need to be mapped to the same Nginx instance for effective concurrency limiting. We can't do this with LVS unless we restrict ourselves to limiting by IP.
- Varnish handles all analytics needs, so limiting in Varnish would make it easier to accurately capture limiting in analytics.
Idea: Balance nginx->varnish connections by app key hash in Nginx; concurrency limit in front-end Varnish.
- Load balance backend connections from Nginx to Varnish by app-level $client_id; don't use LVS
- ngx_http_upstream_consistent_hash module: consistent hash load balancing on arbitrary key
- https://github.com/weibocom/nginx-upsync-module: adds dynamic backends from etcd
- Could later use auth JWTs to verify client ids & derive a reliable key
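The Nginx side of this idea could look roughly like the following sketch. The backend addresses and the `X-Api-Key` header used to derive `$client_id` are placeholders; the `consistent_hash` directive is provided by the ngx_http_upstream_consistent_hash module, not stock Nginx.

```nginx
# Derive a per-client key; fall back to the client IP when no API key
# header is sent. Header name is a placeholder for illustration.
map $http_x_api_key $client_id {
    ""        $remote_addr;
    default   $http_x_api_key;
}

upstream varnish_frontends {
    # Consistent hashing on the app-level client id keeps all of a
    # client's connections on the same Varnish frontend, so that
    # frontend sees the client's full concurrency.
    consistent_hash $client_id;
    server 10.0.0.10:3128;   # placeholder backend addresses
    server 10.0.0.11:3128;
}

server {
    listen 443 ssl;
    location / {
        proxy_pass http://varnish_frontends;
    }
}
```

With dynamic backends from the upsync module, the `server` entries above would be managed from etcd rather than hard-coded.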
- Implement concurrency limiting in Varnish. Options:
- Extend the vsthrottle module with the ability to return tokens at the end of a request, as discussed in this task. This looks fairly straightforward to implement.
- Create a simplified counter module loosely modeled on vsthrottle, based on atomic counters & a periodic GC process. Should offer better performance, but might be YAGNI.
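The semantics of the simplified counter option can be sketched as follows. This is a Python illustration of the intended behavior only; the real implementation would be a Varnish vmod in C using atomic counters, with a lock standing in for them here. All names are hypothetical.

```python
import threading
import time
from collections import defaultdict


class ConcurrencyLimiter:
    """Sketch of per-client concurrency counting: increment when a
    request starts, decrement when it ends, deny above the limit,
    and periodically GC idle counters."""

    def __init__(self, limit):
        self.limit = limit
        self.lock = threading.Lock()
        self.counters = defaultdict(int)
        self.last_seen = {}

    def acquire(self, client_id):
        """Called at request start; True if the request may proceed."""
        with self.lock:
            if self.counters[client_id] >= self.limit:
                return False
            self.counters[client_id] += 1
            self.last_seen[client_id] = time.monotonic()
            return True

    def release(self, client_id):
        """Called at end of request to return the slot."""
        with self.lock:
            if self.counters[client_id] > 0:
                self.counters[client_id] -= 1

    def gc(self, max_idle=60.0):
        """Periodic GC: drop counters that are zero and long idle."""
        now = time.monotonic()
        with self.lock:
            for cid in list(self.counters):
                idle = now - self.last_seen.get(cid, 0)
                if self.counters[cid] == 0 and idle > max_idle:
                    del self.counters[cid]
                    del self.last_seen[cid]
```

Unlike a token-bucket rate limiter, no refill rate needs tuning: a slot is freed exactly when a request completes, which is what makes clients self-throttle under elevated latency.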
- Potential issues:
- Using app-level Nginx -> Varnish load balancing:
- Need to make sure that backend monitoring is at least on par with LVS.
- Would need integration with dynamic pooling / de-pooling workflow. This is well established for LVS.
- Keeping "client keys" of any sort secure in applications that run on an end user's machine is an unsolved problem. See details in T167906#3412015.