
Separate ingress IPs and/or infrastructure for large content uploads
Open, Low, Public

Description

During the incident which led to T278943 and T278945 (TL;DR - excessive rate of high-res video uploads causing scaling issues at the jobq/videoscaler layer of things), one observation I had was that large content uploads are probably sufficiently different in nature from our standard edge traffic patterns that they warrant some separation, and perhaps even separate implementation tradeoffs, if possible. To expound on that a little:

Our standard traffic ingress via the L4LB (ipvs/pybal) to the cpNNNN hosts uses direct routing, so only client-to-server traffic passes through the LB, while responses go from the cp hosts straight back to clients. Its whole design and bandwidth model is therefore built on the assumption that our ingress traffic rate is typically much smaller than our egress. This holds for most traffic, which is typically small HTTP requests and larger HTTP responses. Even POSTs and uploads of various kinds are usually of reasonable size and/or add up to a small fraction of total requests, so they blend into the background noise just fine.
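To make that asymmetry concrete, here is a back-of-the-envelope sketch. It assumes the direct-routing behavior described above (only client-to-server bytes transit the L4LB), and the `Flow` class and every traffic number in it are hypothetical illustrations, not measurements from our edges.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    label: str
    rps: float       # requests per second
    req_bytes: int   # client -> server bytes per request (ingress)
    resp_bytes: int  # server -> client bytes per request (egress)


def lb_ingress_bps(flows: list[Flow]) -> float:
    """Bits/s the L4LB must forward. With direct routing, only the
    client->server direction passes through the load balancer."""
    return sum(f.rps * f.req_bytes * 8 for f in flows)


def backend_egress_bps(flows: list[Flow]) -> float:
    """Bits/s the backends send directly to clients, bypassing the LB."""
    return sum(f.rps * f.resp_bytes * 8 for f in flows)


# Hypothetical traffic mix, for illustration only.
normal = [
    Flow("page views", 50_000, 1_000, 50_000),
    Flow("small POSTs", 1_000, 10_000, 2_000),
]
# One hypothetical 6 GB video upload per minute (~800 Mbit/s sustained).
with_upload = normal + [Flow("large video upload", 1 / 60, 6_000_000_000, 1_000)]

if __name__ == "__main__":
    print(f"LB ingress, normal:      {lb_ingress_bps(normal) / 1e6:.0f} Mbit/s")
    print(f"LB ingress, with upload: {lb_ingress_bps(with_upload) / 1e6:.0f} Mbit/s")
    print(f"Backend egress:          {backend_egress_bps(normal) / 1e9:.0f} Gbit/s")
```

With these made-up numbers, LB ingress is only about 2.4% of backend egress, and a single sustained upload of roughly 800 Mbit/s more than doubles what the LB has to forward.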

However, there are also "users" which may have high bandwidth and low latency towards our edges (e.g. co-located in the same facilities as our core DCs) and which may attempt to legitimately upload very large content (think giant 4K videos) at a very high bitrate, possibly even saturating our interfaces at some layer(s). Rate-limiting the ingress side of a TCP connection is much harder than rate-limiting the egress side (we can drop on reception, but by the time we drop, the saturation damage is already done), and such saturations can harm a lot of other, unrelated traffic flows through the same infrastructure.
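The "drop on reception" problem can be sketched with a receive-side token-bucket policer. This is a simplified model, not our actual ratelimiting setup: `TokenBucket` and `simulate` are hypothetical, packets arrive at a constant rate, and TCP's reaction to drops (which would eventually slow a well-behaved sender) is deliberately ignored. What it shows is that the policer limits what gets accepted, not what crosses the inbound link.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Simple token-bucket policer (hypothetical, for illustration only)."""
    rate_bps: float    # sustained admitted rate, bits per second
    burst_bits: float  # bucket depth, bits
    tokens: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.burst_bits  # start with a full bucket

    def admit(self, bits: float, dt: float) -> bool:
        # Refill for the time since the previous packet, capped at the bucket depth.
        self.tokens = min(self.burst_bits, self.tokens + self.rate_bps * dt)
        if bits <= self.tokens:
            self.tokens -= bits
            return True
        return False  # dropped, but the packet has already crossed the link


def simulate(offered_bps: float, policer: TokenBucket,
             seconds: float = 0.1, packet_bits: int = 12_000) -> tuple[int, int]:
    """Return (bits that crossed the inbound link, bits the policer accepted)."""
    dt = packet_bits / offered_bps  # constant inter-arrival time
    packets = int(offered_bps * seconds / packet_bits)
    on_wire = accepted = 0
    for _ in range(packets):
        on_wire += packet_bits                  # link capacity is consumed on arrival...
        if policer.admit(packet_bits, dt):      # ...before the policer gets to decide
            accepted += packet_bits
    return on_wire, accepted


if __name__ == "__main__":
    # Hypothetical numbers: a 9 Gbit/s upload against a 1 Gbit/s receive-side policer.
    wire, ok = simulate(9e9, TokenBucket(rate_bps=1e9, burst_bits=1_000_000))
    print(f"crossed link: {wire / 1e6:.0f} Mbit, accepted: {ok / 1e6:.0f} Mbit")
```

In this model, about 900 Mbit crosses the link in 0.1 s while only about 101 Mbit is accepted; the remainder still consumed inbound link capacity before being dropped.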

Fixing this all the way out at the L4LB layer will take some work and probably needs to be a real "project" with some design thinking and resource allocation behind it; for lack of a better tool, this task serves as an idea placeholder for now! Some of the concerns here may feed back into the ongoing L4LB redesign work, though.

My initial thought is that it's going to be very difficult to address this effectively as long as the hostname, and thus the IP address, used for large-content uploads is the same as the wiki project hostname used for all other traffic. A good first step might therefore be to explore how difficult it would be, at the projects/mediawiki/etc level, to define a separate hostname for potentially high-bandwidth uploads and get users to use it routinely. The new hostname could initially alias the same IPs and infrastructure we have today, but it would at least give us the future flexibility to look for solutions that separate the traffic flows.
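A minimal sketch of that two-step idea, modelling name resolution as a plain dictionary. The hostnames are hypothetical placeholders (not proposed names), and the addresses come from the RFC 5737 documentation ranges. Step one aliases the dedicated upload hostname to today's service IP; step two repoints only that hostname at separate ingress infrastructure, which clients already using it would pick up via DNS once cached records expire.

```python
# Hypothetical hostnames; IPs are RFC 5737 documentation addresses, not real services.
SERVICE_POOLS: dict[str, list[str]] = {
    "text": ["192.0.2.10"],
    "bulk-upload": ["192.0.2.10"],  # step 1: alias the existing service IP
}

HOSTNAME_TO_SERVICE: dict[str, str] = {
    "wiki.example.org": "text",                # stand-in for a normal project hostname
    "bulk-upload.example.org": "bulk-upload",  # hypothetical dedicated upload hostname
}


def resolve(hostname: str) -> list[str]:
    """Return the ingress IPs a client would connect to for this hostname."""
    return SERVICE_POOLS[HOSTNAME_TO_SERVICE[hostname]]


def split_bulk_uploads(new_ips: list[str]) -> None:
    """Step 2: move only the upload hostname onto separate ingress IPs.

    Traffic to the normal project hostname is unaffected.
    """
    SERVICE_POOLS["bulk-upload"] = list(new_ips)


if __name__ == "__main__":
    print(resolve("bulk-upload.example.org") == resolve("wiki.example.org"))
    split_bulk_uploads(["198.51.100.20"])
    print(resolve("bulk-upload.example.org"), resolve("wiki.example.org"))
```

The point of the indirection is that the hard part (getting uploaders onto a distinct hostname) can happen before any decision about what separate infrastructure should sit behind it.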

Event Timeline

BBlack created this task.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all tickets that are neither part of our current planned work nor clearly a recent, higher-priority emergent issue. This is simply one step in a larger task cleanup effort. Further triage of these tickets (and especially, organizing future potential project ideas from them into a new medium) will occur afterwards! For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!