Image Rate Limiting Issues For Future Audiences Project
Open, Needs Triage, Public

Description

We've been hitting 429s when fetching images from our mobile app in development.

The app involves fetching a large number of images to present to the user. We currently follow these rules:

  • at most 5 requests per 1.2-second window
  • at most 5 concurrent requests
  • fetch only thumbnails
  • send a proper User-Agent
  • back off when given a 429
  • cache images client side
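
The first two rules above can be sketched roughly as follows. This is a minimal illustration in Python, not the app's actual Flutter code: `WindowLimiter`, `fetch_all`, and the `fetch` coroutine passed in are hypothetical names, and the defaults (5 requests, 1.2 seconds, 5 concurrent) are simply the numbers from the list.

```python
import asyncio
import time
from collections import deque


class WindowLimiter:
    """Allow at most `max_requests` request starts in any `window` seconds."""

    def __init__(self, max_requests=5, window=1.2):
        self.window = window
        # Start times of the most recent `max_requests` reservations.
        self._starts = deque(maxlen=max_requests)

    def reserve(self, now):
        """Reserve the next start slot; return how long to wait before sending."""
        if len(self._starts) == self._starts.maxlen:
            # The oldest tracked start must be a full window in the past.
            start = max(now, self._starts[0] + self.window)
        else:
            start = now
        self._starts.append(start)  # the oldest entry is dropped automatically
        return start - now


async def fetch_all(urls, fetch, limiter, max_concurrent=5):
    """Run the (hypothetical) coroutine `fetch(url)` for every URL under both limits."""
    slots = asyncio.Semaphore(max_concurrent)

    async def one(url):
        async with slots:  # caps the number of requests in flight
            delay = limiter.reserve(time.monotonic())
            if delay > 0:
                await asyncio.sleep(delay)  # wait for the reserved start time
            return await fetch(url)

    return await asyncio.gather(*(one(u) for u in urls))
```

In this sketch a request holds its concurrency slot while it waits for its start time, which errs on the conservative side. Client-side pacing like this only helps if the numbers match what the server actually enforces.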

No matter what I've done the user ends up 429'd. This is very detrimental to the user experience.

I'm wondering if there is anything that can be done to improve the situation. Looking at https://wikitech.wikimedia.org/wiki/Robot_policy it seems I should reduce to 2 concurrent requests. I was told there were limits on a per-image, per-second basis, but that isn't documented there. I could also reduce the thumb size further if that would help.

This ticket is an extension of the Slack conversation at https://wikimedia.slack.com/archives/CTFK3B423/p1770756834733819

Event Timeline

What is the exact and full error message? What is the exact User Agent string?

Here is the request in full:

method: GET
uri: https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Emanu-elNYjeh.JPG/250px-Emanu-elNYjeh.JPG
compressionState: HttpClientResponseCompressionState.notCompressed
isRedirect: false
persistentConnection: true
reasonPhrase: Your bot is making too many requests. Please reduce your request rate or contact bot-traffic@wikimedia.org (45c03d2)
redirects: []
statusCode: 429
queryParameters: {}
access-control-allow-origin: [*]
x-analytics: []
strict-transport-security: [max-age=106384710; includeSubDomains; preload]
access-control-expose-headers: [Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache]
report-to: [{ "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }]
server-timing: [cache;desc="int-front", host;desc="cp4050"]
content-type: [text/html; charset=utf-8]
server: [Varnish]
x-request-id: [3c6adeb3-f166-4050-aed2-3a7988c99d26]
timing-allow-origin: [*]
content-length: [2143]
x-client-ip: [174.62.80.234]
nel: [{ "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}]
retry-after: [11]
x-cache: [cp4050 int]
x-cache-status: [int-front]

User-Agent: Sunflower/1.0.0 (https://gitlab.wikimedia.org/repos/future-audiences/sunflower; future-audiences-private@wikimedia.org) Flutter
Host: upload.wikimedia.org
Accept-Encoding: gzip
Content-Length: 0
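
The 429 above carries `retry-after: [11]`, so one way to back off is to wait at least that long before retrying. The following is a rough Python sketch under stated assumptions, not the app's code: `retry_after_seconds` and `backoff_delay` are hypothetical helpers, and the `base` and `cap` values are arbitrary. It accepts both forms the HTTP specification allows for `Retry-After` (a number of seconds or an HTTP date).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Turn a Retry-After header value into a delay in seconds.

    Returns None when the header is missing or cannot be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delta-seconds form, e.g. "11"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Delay before retry number `attempt` (0-based).

    Exponential and capped, but never shorter than the server's hint.
    """
    delay = min(cap, base * 2 ** attempt)
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay
```

With the header from this response, `backoff_delay(0, retry_after_seconds("11"))` yields an 11-second wait, which matches the stall users see.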

I contacted the email address in question, then moved the conversation to Slack, where I was told to make this ticket.

Given the recently announced increase in rate limiting, I think this ticket is more urgent.

The app involves fetching a large number of images to present to the user.

I'd definitely like to unblock development, but I don't want to set us up for future trouble here.

What kind of per-user rates of being able to fetch thumbnails are you hoping for?

The ratelimits you're encountering are certainly too strict for your use case, but we need to make sure we aren't trying to build an app that will create media serving load that we'll never be able to support at scale.

Thanks for the reply. I think it would first be helpful to know what the current limits are, since I've been given conflicting information. Without knowing the limit we can't enforce it on the client, which means our users sometimes sit for 11 seconds with nothing happening.

Apologies for the conflicting information, that's partially my fault.

But all the limits we've been discussing so far have been the limits we apply to bot (non-human) traffic. There's no version of those limits that will yield an acceptable UX for this intended app, even if the app does respect them.

There are some temporary measures we can pursue to let you do higher rps from the app for now; we can probably pursue those next week (given the incident load today).

But for future planning it'd be a good idea to have an order of magnitude estimate of how many per-user fetches/minute this app "wants" to burst to. Does that make sense?

Hey Chris,

Perhaps we can talk live about this. I'm concerned about you mentioning that there will be no version of the limits that can facilitate an acceptable UX. I think we are hoping for 10 images/second if that's possible, but I don't want to make life difficult for all of you. I know you're dealing with a lot on the incident load front. We may end up just going live with the internal test as things stand, but would love to set up a time with you and my PM if that's possible.
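
For the per-minute figure asked about earlier in the thread, the numbers already mentioned convert as in the short Python sketch below. It is only arithmetic on the figures from this ticket (5 requests per 1.2-second window, and the hoped-for 10 images/second) and assumes both rates are sustained for a full minute; `per_minute` is a hypothetical helper.

```python
def per_minute(requests, window_seconds):
    """Convert 'requests per window' into requests per minute."""
    return requests * 60.0 / window_seconds


current = per_minute(5, 1.2)     # the app's present self-imposed rule, about 250/min
requested = per_minute(10, 1.0)  # 10 images/second, 600/min
ratio = requested / current      # roughly 2.4 times the current rate

print(f"current ~{current:.0f}/min, requested ~{requested:.0f}/min, x{ratio:.1f}")
```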

Sorry for being unclear -- there's no version of the bot ratelimits that could yield an acceptable UX for this. Which is somehow where we got stuck on the SRE side, instead of discussing the (still-in-flux) media limits we're tuning for what seem like real web browsers. (I'm not sure why we originally gave you the advice we give to bots; I guess we had totally misunderstood that this was a user-facing app, or something.)

For now, I've given Sunflower a much higher ratelimit. 10 rps ought to work fine for your initial trial.

Let's talk live soon as well, it would be especially valuable after we have some user experiences with the app.

Thank you so much! I will schedule something for us once we have some results. We really appreciate your help with this!