
Reverse proxy supporting XFF-based per-IP concurrency limit and request queueing
Closed, Declined (Public)

Description

In order to enforce a true per-IP limit for Thumbor with no locking side effects, we would need an additional piece of functionality that haproxy doesn't provide.

The desired properties are:

  • the ability to limit the amount of Thumbor workers/connections dedicated to a particular client, based on the X-Forwarded-For header
  • the ability to have a FIFO request queue per client for requests beyond the concurrency limit
  • a queue size limit, so that we error immediately when a client sends too many requests (possibly with a concurrency penalty too)

Most reverse proxies only support erroring when a client goes over a certain limit, not queueing requests for that client. While haproxy has a request queue for its general pool of incoming requests, that queue cannot be controlled on a per-client basis. Lua scripting in haproxy makes it possible to error on, but not to queue, requests based on the criteria defined above.

As simple as the requirements above sound, I have so far been unable to find an existing FLOSS service that can support this feature set.
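To make the desired semantics concrete, here is a minimal sketch of the admit/queue/reject decision in Python with asyncio. Everything here is illustrative, not part of haproxy or Envoy: `PerClientLimiter`, `handle`, `CONCURRENCY`, and `QUEUE_LIMIT` are hypothetical names, and the key would come from the X-Forwarded-For header in a real proxy.

```python
import asyncio
from collections import defaultdict

CONCURRENCY = 2   # max in-flight requests (Thumbor workers) per client IP
QUEUE_LIMIT = 2   # max queued requests per client IP before erroring

class PerClientLimiter:
    def __init__(self):
        # asyncio.Semaphore wakes waiters in FIFO order, which provides
        # the per-client FIFO queue for requests over the concurrency limit.
        self.semaphore = asyncio.Semaphore(CONCURRENCY)
        self.queued = 0

limiters = defaultdict(PerClientLimiter)

async def handle(xff_ip: str, work):
    """Admit, queue, or reject one request, keyed by its X-Forwarded-For IP."""
    limiter = limiters[xff_ip]
    if limiter.semaphore.locked() and limiter.queued >= QUEUE_LIMIT:
        return "429"                   # queue full: error immediately
    limiter.queued += 1
    async with limiter.semaphore:      # waits in FIFO order when over the limit
        limiter.queued -= 1
        await work()                   # proxy the request upstream
    return "200"
```

With these numbers, a client bursting six concurrent requests would get two served immediately, two queued, and two rejected with an error, which is exactly the behavior the bullet points above describe and which off-the-shelf proxies lack.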

Event Timeline

RLazarus triaged this task as Medium priority. May 19 2020, 7:08 PM
RLazarus added subscribers: RLazarus, Joe.

Agree we ought to do this, and I think it's something Envoy can do nearly out of the box. If there's an element Envoy doesn't support, I agree that it's the over-limit FIFO queue. (Out of all the reverse proxies we could use, I'm focusing on Envoy because we're actively deploying it in other parts of the infrastructure, and if it's feasible I think we'd like to use it here too for consistency.)

I'm marking this Medium priority for now, but I'm not sure when serviceops will have cycles to spend on it, especially if it requires extending Envoy (feasible, but we haven't done it at WMF before) and especially given that we're also looking at moving Thumbor to Kubernetes, which might affect what we need from the proxy.

akosiaris subscribed.

Removing SRE; this has already been triaged to a more specific SRE subteam.

Joe added a subscriber: hnowlan.

While this task is definitely too big for sprint week, I also think the chances of something like this happening are minimal. Thus declining the task. @hnowlan let me know if you think I'm being unfair.