
Impending load test
Closed, Resolved (Public)

Description

Hi there, I'm the owner of the Wikipedia service on Amazon Alexa.

As a heads up, we'll be conducting load tests over the next couple of nights. We expect this to send dramatically increased traffic to your API for retrieving wiki summaries.

Feel free to reach out if you have any questions.

Event Timeline

Can you advise what API queries you're actually making?

And any suggestion of magnitude?

This will be the general format of the vast majority of calls:

https://en.wikipedia.org/w/api.php?action=query&format=json&redirects=&continue=&prop=extracts%7Cpageimages%7Cpageprops&pithumbsize=300&exlimit=1&exintro=&explaintext=&exsectionformat=plain&titles=Barack+Obama
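For reference, the query string above can be reproduced with the Python standard library. This is only a sketch of how such a URL might be assembled: the `build_query_url` helper name is made up for illustration, and the parameters are copied from the URL above rather than from any client code.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(title: str) -> str:
    """Build the Action API URL quoted in the thread for one page title.

    Hypothetical helper; parameter names and values mirror the example URL.
    """
    params = {
        "action": "query",
        "format": "json",
        "redirects": "",       # empty value: flag-style parameter
        "continue": "",
        "prop": "extracts|pageimages|pageprops",
        "pithumbsize": "300",  # thumbnail width in pixels
        "exlimit": "1",
        "exintro": "",         # only the intro section
        "explaintext": "",     # plain text instead of HTML
        "exsectionformat": "plain",
        "titles": title,
    }
    # urlencode percent-encodes "|" as %7C and turns spaces into "+"
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_query_url("Barack Obama"))
```

Dictionary order is preserved, so the printed URL has the same parameter order as the one in the thread.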

We haven't actually been given the numbers yet, but I expect us to handle a few million requests over the course of a couple of hours. We only call you on a cache miss, so not all of that traffic will reach you; from what we saw last night, we tend to cache miss on 3-9% of requests.

What kind of User-Agent will you be using? Please have a look at the API Etiquette and especially the User-Agent section. Do you know the IP range from which you will be hitting our API?
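As an illustration of the User-Agent request, a descriptive header usually names the client, its version, and a way to contact the operator. The sketch below only constructs a request object and sends nothing; the client name, URL, and e-mail address are placeholders, not the values the Alexa team actually uses.

```python
from urllib.request import Request

# Hypothetical values: substitute the real client name, version, and contact details.
USER_AGENT = (
    "ExampleAlexaWikiSkill/1.0 "
    "(https://example.com/wiki-skill; wiki-skill-ops@example.com)"
)


def make_request(url: str) -> Request:
    """Return a request carrying the descriptive User-Agent (no network I/O here)."""
    return Request(url, headers={"User-Agent": USER_AGENT})


req = make_request("https://en.wikipedia.org/w/api.php")
# urllib stores header names capitalized as "User-agent"
print(req.get_header("User-agent"))
```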

How many req/s are you planning to do at peak, and with what concurrency?

You could consider using https://en.wikipedia.org/api/rest_v1/#!/Page_content/get_page_summary_title instead, which is a fully cached version of the page summary information. This is significantly cheaper for us to serve.
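A minimal sketch of building a request URL for that summary endpoint follows. It assumes the path behind the documentation link is `/page/summary/{title}` and that titles use underscores in place of spaces; `summary_url` is a hypothetical helper name.

```python
from urllib.parse import quote

REST_BASE = "https://en.wikipedia.org/api/rest_v1"


def summary_url(title: str) -> str:
    """Return the REST summary URL for a page title (hypothetical helper).

    Assumption: the endpoint path is /page/summary/{title}.
    """
    normalized = title.replace(" ", "_")
    # safe="" so that characters such as "/" inside a title are percent-encoded
    return f"{REST_BASE}/page/summary/{quote(normalized, safe='')}"


print(summary_url("Barack Obama"))
```

Because the URL carries no per-client query parameters, identical titles map to identical URLs, which is what makes this kind of endpoint easy to serve from cache.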

> We haven't actually been given the numbers yet, but I expect us to handle a few million requests over the course of a couple of hours. We only call you on a cache miss, so not all of that traffic will reach you; from what we saw last night, we tend to cache miss on 3-9% of requests.

Plugging some math through this: if we assume 3M reqs over 2 hours and a 10% cache miss rate on your end, that works out to an average rate of roughly 41 reqs/sec to our servers, or about 2460 reqs/min, which is in the ballpark of 0.1% of our total daily average reqs/sec for text (as opposed to multimedia) traffic.
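The estimate can be checked with a few lines of arithmetic. The inputs are the assumptions stated above (3M requests, 2 hours, 10% miss rate); the unrounded result is about 41.7 reqs/sec, or 2500 reqs/min, so the figures of 41 and 2460 are the same estimate with the per-second rate truncated.

```python
def origin_rate(total_requests: int, window_hours: float, miss_rate: float):
    """Return (reqs/sec, reqs/min) reaching the origin after the client-side cache."""
    seconds = window_hours * 3600
    per_second = total_requests * miss_rate / seconds
    return per_second, per_second * 60


per_second, per_minute = origin_rate(3_000_000, 2, 0.10)
print(f"{per_second:.1f} reqs/sec, {per_minute:.0f} reqs/min")
# prints "41.7 reqs/sec, 2500 reqs/min"
```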

For the API you're using, we currently have a per-client-IP ratelimiter in place that will limit at 600 reqs/min/clientip, which is designed to help keep bulk API users from sucking up large fractions of the resources we intend for direct user agents. The guesstimated rate above is about 4x the limit, so unless you're sourcing this from 5 or more distinct IPs on your end, you'll probably run into 429 error codes from the ratelimiter.
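The "5 or more distinct IPs" figure follows from dividing the estimated rate by the per-IP limit and rounding up. The sketch below assumes traffic is spread evenly across source IPs, which real traffic may not be; `min_client_ips` is a hypothetical helper.

```python
import math


def min_client_ips(reqs_per_min: float, limit_per_min_per_ip: float = 600) -> int:
    """Smallest number of source IPs that keeps each one at or under the limit.

    Assumes requests are distributed evenly across the IPs.
    """
    return math.ceil(reqs_per_min / limit_per_min_per_ip)


estimated = 2460  # reqs/min, from the estimate above
print(f"{estimated / 600:.1f}x the limit, needs {min_client_ips(estimated)} IPs")
# prints "4.1x the limit, needs 5 IPs"
```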


It would also be helpful if you could provide the start and end times of your tests so we can identify them in our graphs.

Run time will be Thursday 1800 EST to Friday 0600 EST.

We're looking into updating our user agent now.

> You could consider using https://en.wikipedia.org/api/rest_v1/#!/Page_content/get_page_summary_title instead, which is a fully cached version of the page summary information. This is significantly cheaper for us to serve.

I'm investigating this.

> Run time will be Thursday 1800 EST to Friday 0600 EST.
>
> We're looking into updating our user agent now.

For the record, in UTC that is Thursday 11PM to Friday 11AM.
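The conversion can be double-checked with a fixed UTC-5 offset for EST. The calendar dates below are placeholders picked only because 2017-01-05 falls on a Thursday; the thread does not give the actual dates.

```python
from datetime import datetime, timedelta, timezone

# EST as a fixed offset of UTC-5 (no daylight saving adjustment)
EST = timezone(timedelta(hours=-5), "EST")

# Placeholder dates: a Thursday evening and the following Friday morning
start_est = datetime(2017, 1, 5, 18, 0, tzinfo=EST)
end_est = datetime(2017, 1, 6, 6, 0, tzinfo=EST)

start_utc = start_est.astimezone(timezone.utc)
end_utc = end_est.astimezone(timezone.utc)

print(start_utc.strftime("%A %H:%M"), "to", end_utc.strftime("%A %H:%M"), "UTC")
```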

Actually, you missed it. After looking at the APIs you referenced and a fair amount of discussion, we scaled back the majority of our load-testing obligations from last week. We'll be implementing your suggestions (using the preferred API and providing a more descriptive user agent) before our next load-testing event.

We've also been tweaking and tuning our ratelimits in general to try to find a happy medium. Both of the API endpoints should now be limiting at the same rate of 1000 reqs per 10s per client IP (as a burstable token bucket filter).
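For readers unfamiliar with the term, a burstable token bucket with these numbers behaves roughly like the sketch below: a client can spend up to 1000 requests at once, and the allowance refills at 100 per second. This is a generic illustration with made-up names, not the ratelimiter actually deployed; the clock is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Generic token bucket: `capacity` tokens, refilled evenly over `window_seconds`."""

    def __init__(self, capacity: int, window_seconds: float, now: float = 0.0):
        self.capacity = float(capacity)
        self.refill_per_second = capacity / window_seconds
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` (in seconds) is within the limit."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a real limiter would answer with HTTP 429 here


bucket = TokenBucket(capacity=1000, window_seconds=10)
burst = sum(bucket.allow(0.0) for _ in range(1200))      # 1200 requests at t=0
one_second_later = sum(bucket.allow(1.0) for _ in range(200))
print(burst, one_second_later)
# prints "1000 100"
```

Of the 1200 requests sent at once, 1000 pass and the rest are rejected; a second later, 100 more tokens have accumulated.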

> Actually, you missed it. After looking at the APIs you referenced and a fair amount of discussion, we scaled back the majority of our load-testing obligations from last week. We'll be implementing your suggestions (using the preferred API and providing a more descriptive user agent) before our next load-testing event.

Excellent - much appreciated!
Once you know your next date, if you want to open a new task to give us a heads up, that would be appreciated as well.

I will close this task for now, as I understand the testing has already been done.

Thanks again