
JSON queries fail with "Too Many Requests"
Closed, Invalid · Public

Description

In general, "Too Many Requests" isn't a valid error. If the servers are too busy, they should slow down individual queries so that every user can still get their data.
It also isn't clear what "too many" means. Too many from one IP address? Too many worldwide? Too many on one connection? Too many per second?


Event Timeline

Yurivict created this task. Jun 16 2017, 1:02 AM
Restricted Application added a subscriber: Aklapper. Jun 16 2017, 1:02 AM
Reedy added a subscriber: Reedy.

What are you querying? The MW API? Are you making a lot of requests simultaneously?

I use the MW API.
I run requests from several threads. How do you define "a lot"?

Reedy added a subscriber: BBlack. Jun 16 2017, 3:27 AM

For the API you're using, we currently have a per-client-IP rate limiter in place that limits at 600 reqs/min per client IP, which is designed to help keep bulk API users from sucking up large fractions of the resources we intend for direct user agents. The guesstimated rate above is about 4x the limit, so unless you're sourcing this from 5 or more distinct IPs on your end, you'll probably run into 429 error codes from the rate limiter.

That's one answer as to why you might be getting 429 error codes...

How many are you making? From where?
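(One way for a bulk client to stay under a per-IP cap like the 600 reqs/min quoted above is to pace its own requests. Below is a minimal, illustrative Python sketch; the 600/min figure comes from the comment above, and everything else is an assumption, not Wikimedia code:)

    import threading
    import time

    class Throttle:
        """Pace calls so the average rate stays at or below max_per_minute.
        Thread-safe: each caller reserves the next free time slot under the
        lock, then sleeps outside it."""

        def __init__(self, max_per_minute=600):  # 600 = the per-IP limit quoted above
            self.min_interval = 60.0 / max_per_minute  # 0.1 s between requests
            self.next_slot = time.monotonic()
            self.lock = threading.Lock()

        def wait(self):
            with self.lock:
                now = time.monotonic()
                slot = max(self.next_slot, now)  # earliest free slot
                self.next_slot = slot + self.min_interval
            time.sleep(max(0.0, slot - now))

Calling wait() from every thread before each API request keeps a single process at or below the cap; it cannot coordinate multiple processes or machines sharing one IP.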

Marostegui triaged this task as Medium priority. Jun 16 2017, 4:59 AM

@Yurivict: Please provide clear steps and information to reproduce the problem. See https://mediawiki.org/wiki/How_to_report_a_bug for more information. Thanks!

(General note: If it turns out that there is something to document here, https://www.mediawiki.org/wiki/API:Etiquette might be the place.)

The attached Python program runs many requests over many connections concurrently and saves all responses into the file result.txt.
After about 30 seconds, this file starts to contain lines like the following:

Request from NNN.NNN.NNN.NNN via cp4017 frontend, Varnish XID 449779982
Upstream caches: cp4017 int
Error: 429, Too Many Requests at Sat, 17 Jun 2017 16:40:33 GMT
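(The attachment itself is not reproduced here. A hypothetical sketch of a script matching the description above, with many threads, concurrent connections, and all responses appended to result.txt, might look like this; the endpoint and query parameters are assumptions:)

    import threading
    import requests  # third-party HTTP client

    API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint
    PARAMS = {"action": "query", "meta": "siteinfo", "format": "json"}  # assumed query
    WRITE_LOCK = threading.Lock()

    def worker(count):
        # Issue `count` unthrottled requests and append each response body
        # to result.txt, serializing writes across threads.
        for _ in range(count):
            resp = requests.get(API_URL, params=PARAMS)
            with WRITE_LOCK:
                with open("result.txt", "a") as f:
                    f.write(resp.text + "\n")

    threads = [threading.Thread(target=worker, args=(200,)) for _ in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

With 50 unthrottled threads, such a script exceeds 600 requests per minute almost immediately, so 429 responses like the one above start appearing in result.txt within seconds.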

Reedy added a comment. Jun 17 2017, 6:08 PM

Yeah, then almost certainly you're making too many requests within a one-minute period.

Failing requests isn't a valid response. If I ran this from some company behind a firewall, would nobody else be able to use Wikipedia from that IP address, because it is flagged as issuing too many requests? This doesn't make sense.

You should introduce delays instead of failing requests.

Reedy added a comment. Jun 17 2017, 6:48 PM

Failing requests isn't a valid response. If I ran this from some company behind a firewall, would nobody else be able to use Wikipedia from that IP address, because it is flagged as issuing too many requests? This doesn't make sense.

You should introduce delays instead of failing requests.

Be glad you're not being blocked. https://www.mediawiki.org/wiki/API:Etiquette#Request_limit

Reedy added a comment. Jun 17 2017, 7:06 PM

https://tools.ietf.org/html/rfc6585

4.  429 Too Many Requests

   The 429 status code indicates that the user has sent too many
   requests in a given amount of time ("rate limiting").

   The response representations SHOULD include details explaining the
   condition, and MAY include a Retry-After header indicating how long
   to wait before making a new request.

   For example:

   HTTP/1.1 429 Too Many Requests
   Content-Type: text/html
   Retry-After: 3600

   <html>
      <head>
         <title>Too Many Requests</title>
      </head>
      <body>
         <h1>Too Many Requests</h1>
         <p>I only allow 50 requests per hour to this Web site per
            logged in user.  Try again soon.</p>
      </body>
   </html>

   Note that this specification does not define how the origin server
   identifies the user, nor how it counts requests.  For example, an
   origin server that is limiting request rates can do so based upon
   counts of requests on a per-resource basis, across the entire server,
   or even among a set of servers.  Likewise, it might identify the user
   by its authentication credentials, or a stateful cookie.

   Responses with the 429 status code MUST NOT be stored by a cache.

The spec seems to suggest that failing requests is a valid response.

Aklapper closed this task as Invalid. Jun 17 2017, 7:37 PM

Closing task as invalid.

The spec seems to suggest that failing requests is a valid response.

This is absolutely a valid general response code.
But I don't see how it is reasonable to fail requests when some metric is exceeded, vs. delaying responses.

I don't see why this is an invalid task.

But I don't see how it is reasonable to fail requests when some metric is exceeded, vs. delaying responses.

If we implemented the delay on the server side within the context of the HTTP request (and various enclosing connection contexts) it would tie up server-side state and computing resources over many such client connections. And if the client was ignorant of these delays and continued trying to spam many parallel requests at our servers, the combined impact would be to effectively create huge delayed-request queues on the server side. Because the server end has many clients and the client end is typically only talking to a smaller count of servers, the most efficient way to implement a delay is to inform the client to delay and retry for themselves. That is exactly what the response sent to you indicates with its 429 response code combined with the Retry-After: 1 response header.
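(Concretely, a client that follows this advice treats a 429 as "slow down": it reads the Retry-After header and waits before retrying. A minimal sketch in Python, not a Wikimedia-provided client:)

    import time
    import requests

    def get_with_retry(url, params=None, max_attempts=5):
        """GET that honors 429 + Retry-After instead of hammering the server."""
        for attempt in range(max_attempts):
            resp = requests.get(url, params=params)
            if resp.status_code != 429:
                return resp
            retry_after = resp.headers.get("Retry-After")
            try:
                delay = float(retry_after)  # e.g. "1", as in the response described above
            except (TypeError, ValueError):
                delay = 2 ** attempt        # fallback if the header is absent or is an HTTP-date
            time.sleep(delay)
        raise RuntimeError("still rate-limited after %d attempts" % max_attempts)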

Yurivict added a comment (edited). Jun 19 2017, 3:26 PM

What you are referring to is HTTP pipelining. The server doesn't have to read pipelined headers once it has determined that there are too many requests; TCP flow control takes care of the rest.

Having said that, the problem is likely that Varnish doesn't support such behavior, so this isn't really a Wikimedia problem.