
Rate-limit browsers without referers
Closed, Resolved · Public

Description

https://wikimediafoundation.org/wiki/Maps_Terms_of_Use requires a valid HTTP User-Agent or Referer which identifies the source of maps requests.

A website can technically suppress the Referer header to evade blocks.

There are some legitimate use cases for Refererless browsers, but these are low-traffic, such as loading a single tile in a browser, and shouldn't hit any rate limits.

Implementing delay pools for User-Agents claiming to be browsers but not sending a Referer is how the OSMF implemented this: https://github.com/openstreetmap/chef/pull/79
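
As a rough illustration of the idea (not the OSMF configuration, which is defined in the linked pull request), the sketch below classifies a request as a "browser without Referer" and pushes it through a token bucket. The `Mozilla/` prefix test, the function and class names, and the rate and capacity values are all assumptions made for the example.

```python
def is_refererless_browser(headers):
    """Return True when the User-Agent claims to be a browser but no Referer is sent."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    # Assumption: "claims to be a browser" is approximated by the Mozilla/ prefix.
    claims_browser = normalized.get("user-agent", "").startswith("Mozilla/")
    return claims_browser and not normalized.get("referer")


class TokenBucket:
    """Hypothetical throttle for one pool of requests (illustrative only)."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.updated = now

    def allow(self, now):
        # Refill according to the time elapsed, capped at the bucket capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Only requests for which `is_refererless_browser` returns true would be charged against the bucket; a single-tile load in a browser stays well within any reasonable capacity.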

It's better to implement this before it's needed, even if the rate limits are high, because when it's needed, it's needed urgently. There should also be fewer issues with setting false expectations if we have it set up this way from the start.

Event Timeline

Restricted Application added a subscriber: Aklapper.
debt triaged this task as High priority. Jan 6 2017, 10:42 PM
debt added a project: Maps-Sprint.
Gehel added subscribers: ema, BBlack, Gehel.

This is worth discussing with our Traffic team. @BBlack, @ema: what is your point of view on rate-limiting browsers without a Referer?

Varnish is probably where it would make sense to implement such a limit, as we expect most requests to be cached at the Varnish level. As far as I know, this is not something we currently do on Varnish, and probably not something we want to do, as it requires additional state on Varnish.

We do *some* rate limiting on WDQS at the nginx level, more to prevent unchecked abuse than to have a hard rate limit. But WDQS queries are not expected to be cached. We do not have a component able to do rate limiting in the maps flow as far as I know.

Significant work has already been done on T163233. @ema is aware of this task and will come back to us with an idea, a plan, or a cancellation.

Gehel moved this task from Stalled/Waiting to Done on the Maps-Sprint board.
Gehel added a subscriber: mpopov.

After some discussion with @ema and @BBlack:

TL;DR - there's lots of fancy thoughts to have about the long term, but pragmatically there's not much we can do at all except what we're doing in all the other apps' cases: set fairly high per-IP ratelimits, *maybe* set them lower for empty/super-short UA strings as a warning, and anything more complicated is a year+ away when something or other is better-resourced and we can make loftier plans.
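
To make the TL;DR concrete, here is a minimal sketch of a per-IP limit that drops to a lower value for empty or very short UA strings. It uses a simple fixed-window counter; the limit values, the length threshold, the window size, and all names are hypothetical and do not reflect any deployed configuration.

```python
from collections import defaultdict

# All values below are hypothetical placeholders, not measured or deployed limits.
DEFAULT_LIMIT = 1000   # requests per window per IP
SHORT_UA_LIMIT = 100   # lower limit for empty/super-short UA strings
MIN_UA_LENGTH = 8      # UA strings shorter than this count as "super-short"
WINDOW_SECONDS = 60


def limit_for(user_agent):
    """Pick the per-IP limit based on how long the User-Agent string is."""
    ua = (user_agent or "").strip()
    return SHORT_UA_LIMIT if len(ua) < MIN_UA_LENGTH else DEFAULT_LIMIT


class PerIpLimiter:
    """Fixed-window request counter keyed by (IP, window index)."""

    def __init__(self):
        # Old windows are never evicted here; a real deployment would expire them.
        self._counts = defaultdict(int)

    def allow(self, ip, user_agent, now):
        key = (ip, int(now // WINDOW_SECONDS))
        if self._counts[key] >= limit_for(user_agent):
            return False
        self._counts[key] += 1
        return True
```

The counter is shared per IP, so a client cannot reset its budget by switching UA strings; only the ceiling it is compared against changes.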

We probably want to do some analysis of what a reasonable per-IP ratelimit for maps is (@mpopov might be able to help), but we don't want to do anything fancier at this time. See the IRC logs for the details of the conversation (15:50 to 16:42).

This can be closed; the analysis of the per-IP rate limit is being done on T169175.