Implement system-wide overall request rate / resource consumption limits in the orchestrator
Open, Needs Triage, Public

Event Timeline

@Jdforrester-WMF what is the rationale for having a system-wide request-rate limit?

Security and SRE were worried that out-of-control processes in Wikifunctions could add significant load to the production machines (especially if e.g. the calls from the orchestrator to Wikifunctions.org's MW API, or Wikidata.org's, got into a runaway loop) and imperil overall cluster stability.

Check with Security if it is OK to simply use the Docker rate limits, instead of implementing our own solution. Contingent on that, moving this to nice to have.

James's estimated effort: 10+ days? Mostly blocked by tracking data about usage in the orchestrator.

Are there any updates on this? Which resources do we want to track, and what strategies do we have for avoiding the runaway cases mentioned by James?

This is mostly moot until we have external resources we're requesting (i.e. Wikidata and Commons); I suppose we could build it as-is for requests to Wikifunctions's wiki API for now?

In my mind, we'd just have all requests go through a request proxy: either the existing network service (which I think we have to use anyway?) or, more smartly, some code inside the orchestrator that keeps a running track of request rates and rejects or slows down requests over some threshold?
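
To make the "code inside the orchestrator" option more concrete, here is a rough sketch of a sliding-window limiter that keeps a running track of recent requests and rejects any over a threshold. This is only an illustration under stated assumptions, not the orchestrator's actual code: the class name `SlidingWindowLimiter`, its methods, and the threshold values are all hypothetical, it is written in Python purely for brevity, and it is single-process and not thread-safe.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical limiter: allows at most max_requests per window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._timestamps: Deque[float] = deque()

    def _evict_expired(self, now: float) -> None:
        # Drop timestamps that have fallen out of the window.
        while (self._timestamps
               and now - self._timestamps[0] >= self.window_seconds):
            self._timestamps.popleft()

    def current_rate(self) -> int:
        """Number of requests recorded within the current window."""
        self._evict_expired(self._clock())
        return len(self._timestamps)

    def try_acquire(self) -> bool:
        """Record a request and return True, or return False if over the limit."""
        now = self._clock()
        self._evict_expired(now)
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True

    def retry_after(self) -> float:
        """Seconds until a slot frees up; 0.0 if a request is allowed now."""
        now = self._clock()
        self._evict_expired(now)
        if len(self._timestamps) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._timestamps[0])
```

A caller would check `try_acquire()` before each outgoing request (e.g. to the wiki API) and, when it returns False, either reject the request or wait `retry_after()` seconds, which covers the "rejects/slows down" choice. A system-wide limit across several orchestrator instances would need shared state somewhere, which this sketch does not attempt.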