
rest gateway: use IP for "compliant bots" rate limits
Open, High, Public

Description

Currently, rate limits for unauthenticated requests from clients that follow the user-agent policy use the value of the x-ua-contact header as the rate limit counter key. This assumes that the user-agent provides the contact info of the bot's operator. However, it's quite common to provide the contact info of the code author instead, e.g. for OpenRefine (see issues/7731). In that case, every installation of the tool shares a single counter.

Instead, we should use the client's IP address as the counter key. This would also prevent spoofing of the user-agent string to circumvent rate limits.

The downside is that it becomes harder to get a list of the user-agents of the top clients. But this data can be extracted from logs.
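To illustrate the difference between the two counter keys, here is a minimal sketch in Python. It is not the gateway's actual configuration: the function names, the key prefixes, and the sample addresses are hypothetical, and only the x-ua-contact header name comes from the description above. It models three independent installations of one tool that all send the code author's contact info.

```python
from collections import Counter


def counter_key(headers: dict, client_ip: str, use_ip: bool) -> str:
    """Pick the counter key for an unauthenticated, policy-compliant request.

    Hypothetical helper: the real gateway derives its key in its own config.
    """
    if use_ip:
        # Proposed behaviour: one counter per client IP address.
        return f"ip:{client_ip}"
    # Current behaviour as described: one counter per x-ua-contact value.
    return f"contact:{headers.get('x-ua-contact', '')}"


def count_by_key(requests: list, use_ip: bool) -> Counter:
    """Count requests per counter key."""
    counts = Counter()
    for headers, client_ip in requests:
        counts[counter_key(headers, client_ip, use_ip)] += 1
    return counts


# Three separate operators running the same tool, each from its own IP,
# all sending the code author's contact info (made-up sample data).
requests = [
    ({"x-ua-contact": "author@example.org"}, "192.0.2.1"),
    ({"x-ua-contact": "author@example.org"}, "192.0.2.2"),
    ({"x-ua-contact": "author@example.org"}, "192.0.2.3"),
]

by_contact = count_by_key(requests, use_ip=False)
by_ip = count_by_key(requests, use_ip=True)
```

With the contact key, all three operators draw from the same counter; with the IP key, each gets its own.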

Event Timeline

Change #1268520 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[operations/deployment-charts@master] rest gateway: use IP as rate limit key for compliant bots

https://gerrit.wikimedia.org/r/1268520

daniel triaged this task as High priority. Wed, Apr 8, 8:14 AM

FWIW, I think this introduces both a discrepancy with the logic we adopt at the edge, and a matter of inequality: bots who control a large enough IP space would have a much larger limit than the bot of a community member, which is typically coming from a single source IP.

So you would be granting a bot from Oracle 1000x the limits you grant to a researcher.

That goes against both the spirit and the execution of our rate-limiting system.
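The arithmetic behind the 1000x figure can be sketched as follows, assuming each counter key gets the same fixed limit. The per-key limit used here is a made-up number for illustration, not the gateway's real setting, and the function name is hypothetical.

```python
def aggregate_limit(per_key_limit: int, distinct_ips: int) -> int:
    """Total requests allowed per window when every IP has its own counter."""
    return per_key_limit * distinct_ips


PER_KEY_LIMIT = 100  # hypothetical limit per counter key

researcher = aggregate_limit(PER_KEY_LIMIT, distinct_ips=1)
large_operator = aggregate_limit(PER_KEY_LIMIT, distinct_ips=1000)

# The ratio depends only on the number of IPs, not on the limit chosen.
ratio = large_operator // researcher
```

Under IP-based keys, the effective limit grows linearly with the number of source addresses a client controls, whatever the per-key limit is.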

Change #1268520 abandoned by Daniel Kinzler:

[operations/deployment-charts@master] rest gateway: use IP as rate limit key for compliant bots

Reason:

won't do, per discussion on the ticket

https://gerrit.wikimedia.org/r/1268520