Generic strategy to deal with high volume / expensive traffic from cloud providers
Open, Needs Triage, Public

Assigned To

None

Authored By

	Gehel
	Jan 12 2023, 9:00 AM

Description

We regularly see large traffic increases from AWS (and, more rarely, from other cloud providers). This is particularly problematic for services that are somewhat expensive, for example full text search or SPARQL queries (though other services might be impacted in similar ways).

As an example, we've seen a doubling of the full text search traffic, almost overnight, at the end of December (see T326757). This traffic increase can be mostly attributed to traffic coming from AWS.

Cloud providers make it very easy to overwhelm our services, which impacts our ability to serve other requests. While our services are meant to be freely available for all purposes, we need to protect their stability and ensure equitable access for all.

So far, for both Search and WDQS, the Search Platform team has dealt with these traffic surges inside our applications (with dedicated pool counters for Search, or with temporary bans of traffic for WDQS). It seems that a more generic technical solution might be needed, along with a more generic policy on how we want to deal with these surges.
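
To make the application-level option concrete, here is a minimal sketch of a per-traffic-class concurrency cap, loosely in the spirit of a dedicated pool counter. It is an illustration under stated assumptions, not the actual PoolCounter implementation or our current configuration: `CLOUD_RANGES`, `classify`, and `PoolLimiter` are hypothetical names, and the CIDR blocks are RFC 5737 documentation ranges standing in for real provider ranges.

```python
import ipaddress
import threading
from contextlib import contextmanager

# Hypothetical placeholder ranges (RFC 5737 documentation blocks), NOT real
# cloud provider ranges; a real setup would load the providers' published lists.
CLOUD_RANGES = {
    "cloud-a": [ipaddress.ip_network("192.0.2.0/24")],
    "cloud-b": [ipaddress.ip_network("198.51.100.0/24")],
}


def classify(client_ip: str) -> str:
    """Return the traffic class for an IP: a provider name or 'default'."""
    addr = ipaddress.ip_address(client_ip)
    for provider, networks in CLOUD_RANGES.items():
        if any(addr in net for net in networks):
            return provider
    return "default"


class PoolLimiter:
    """Caps the number of concurrent expensive requests per traffic class."""

    def __init__(self, limits: dict[str, int]):
        # One bounded semaphore per limited class; unlisted classes are uncapped.
        self._slots = {
            name: threading.BoundedSemaphore(n) for name, n in limits.items()
        }

    @contextmanager
    def slot(self, traffic_class: str):
        """Yield True if the request may proceed, False if the pool is full."""
        sem = self._slots.get(traffic_class)
        if sem is None:
            yield True
            return
        acquired = sem.acquire(blocking=False)
        try:
            yield acquired
        finally:
            if acquired:
                sem.release()
```

A request handler would wrap the expensive call in `with limiter.slot(classify(ip)) as ok:` and reject or degrade the request when `ok` is false. Capping concurrency per class, rather than banning a provider outright, keeps the service available to cloud-hosted users while bounding the share of capacity they can consume.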

Acceptance criteria:

general guidance on how to deal with traffic surges from cloud providers
decision on whether we want a generic solution to manage traffic surges from cloud providers, or whether we want to deal with them at the application level

Related Objects

Mentioned Here: T326757: Investigate doubling of full_text search query rate since jan 1, 2023

Event Timeline

Gehel created this task. Jan 12 2023, 9:00 AM

Restricted Application added a subscriber: Aklapper. Jan 12 2023, 9:00 AM

Maintenance_bot added a project: SRE. Jan 12 2023, 9:29 AM

bking subscribed. Jan 12 2023, 2:07 PM

EBernhardson subscribed. Jan 12 2023, 6:38 PM

Maryana subscribed. Jan 13 2023, 4:10 PM

Gehel moved this task from needs triage to Current work on the Discovery-Search board. Jan 16 2023, 4:03 PM

Gehel edited projects, added Discovery-Search (Current work); removed Discovery-Search.

JArguello-WMF moved this task from Incoming to Radar on the API Platform board. Jan 17 2023, 3:50 PM

Gehel moved this task from Incoming to Blocked/Waiting on the Discovery-Search (Current work) board. Jan 23 2023, 4:43 PM

BCornwall moved this task from Backlog to Radar/Not for service by Traffic on the Traffic board. Mar 28 2023, 9:01 PM

VirginiaPoundstone moved this task from Radar to API Platform Roadmap on the API Platform board. Apr 26 2023, 1:24 PM

VirginiaPoundstone edited projects, added API Platform (API Platform Roadmap); removed API Platform.

Gehel edited projects, added Discovery-Search; removed Discovery-Search (Current work). May 1 2023, 3:22 PM

Gehel moved this task from needs triage to Ops / SRE on the Discovery-Search board.
