Block non-browser requests that use generic user agent (UA) headers
Open, Needs Triage, Public

Description

When we want to modify or deprecate APIs, it is useful to know who is using them. Since we do not require any kind of authentication to use our APIs, often the only way to find out is to look at the User-Agent header. This only works, however, if the User-Agent header is set to a useful value rather than a generic library name.

We have required the User-Agent to be set to a useful value since 2010, but this was never really enforced. The only way to get clients to provide a useful UA string appears to be by blocking generic UAs.

Some examples:

  • "-": ~1300/sec
  • "Ruby": 100/sec
  • "curl/" prefix: 240/sec
  • "okhttp/" prefix: 240/sec
  • "MyApp/01": 1/sec (example value from the LWP manpage). This isn't a lot, but it seems to be the primary user of /api/rest_v1/page/pdf/, which we want to deprecate.

These requests should be blocked with a helpful error message pointing to the policy page.
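
As a rough illustration of what such a block could look like, here is a minimal Python sketch. It only covers the example UA values listed above. The policy URL, the 403 status code, the function names, and the decision to treat an empty or missing header like "-" are all assumptions made for this sketch, not part of any existing implementation.

```python
from typing import Optional, Tuple

# Hypothetical placeholder; the task only refers to "the policy page".
POLICY_URL = "https://example.org/user-agent-policy"

# Values taken from the examples in the task description.
# Treating an empty or missing header like "-" is an assumption.
GENERIC_EXACT = {"", "-", "Ruby", "MyApp/01"}
GENERIC_PREFIXES = ("curl/", "okhttp/")


def is_generic_ua(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent matches one of the generic examples."""
    ua = (user_agent or "").strip()
    return ua in GENERIC_EXACT or ua.startswith(GENERIC_PREFIXES)


def check_request(user_agent: Optional[str]) -> Optional[Tuple[int, str]]:
    """Return (status, message) if the request should be blocked, else None.

    The 403 status code is an assumption; the task does not specify one.
    """
    if is_generic_ua(user_agent):
        message = (
            "Requests with a generic User-Agent header are blocked. "
            "Please set a User-Agent that identifies your client and "
            "includes contact information. See " + POLICY_URL
        )
        return 403, message
    return None
```

With this sketch, `check_request("curl/7.88.1")` would yield a 403 with the explanatory message, while a descriptive UA such as `"MyBot/1.0 (bot@example.org)"` would pass through.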

NOTE: If we block the generic curl UA, we'll probably block our own manual debugging calls. The error message returned to the user should include instructions for setting up a .curlrc file to avoid this.

Event Timeline

daniel added a subscriber: Joe.
daniel updated the task description.
daniel updated the task description.
daniel renamed this task from "Block non-browser requests that use generic agents" to "Block non-browser requests that use generic user agent (UA) headers". Nov 9 2022, 5:39 PM

FWIW we're banning more generic UAs via dynamic requestctl rules; our rule of thumb is to start rate-limiting requests from a specific UA only when it starts causing problems for the infrastructure. In general, banning generic UAs will force people either to identify themselves or to use browser-like UAs instead.

I would assume that most of the requests are made in good faith, so a block would lead to more clients supplying contact info. The block message should of course link to the policy page.