
Flickr blocking image requests from Toolforge k8s, breaking multiple tools
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • webservice shell
  • curl -v -o flickr.jpg "https://live.staticflickr.com/65535/54265555549_3e41dca85a_o_d.jpg"

What happens?:

<html><body><h1>403 Forbidden</h1>
Request forbidden by administrative rules.
</body></html>

originally, now

<html><body><h1>429 Too Many Requests</h1>
You have sent too many requests in a given amount of time.
</body></html>
tools.anticompositetest@shell-1737548902:~$ curl -v -o flickr.jpg "https://live.staticflickr.com/65535/54265555549_3e41dca85a_o_d.jpg" 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 3.171.59.65:443...
* Connected to live.staticflickr.com (3.171.59.65) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [3851 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [36 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [36 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=static.flickr.com
*  start date: Oct 12 00:00:00 2024 GMT
*  expire date: Nov  9 23:59:59 2025 GMT
*  subjectAltName: host "live.staticflickr.com" matched cert's "*.staticflickr.com"
*  issuer: C=US; O=Amazon; CN=Amazon RSA 2048 M02
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x557247e2d200)
} [5 bytes data]
> GET /65535/54265555549_3e41dca85a_o_d.jpg HTTP/2
> Host: live.staticflickr.com
> user-agent: curl/7.74.0
> accept: */*
> 
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [157 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
} [5 bytes data]
< HTTP/2 429 
< content-type: text/html
< content-length: 117
< date: Wed, 22 Jan 2025 12:29:10 GMT
< cache-control: no-cache
< x-cache: Error from cloudfront
< via: 1.1 308930dd559485ab2bf680b9ef6cf01c.cloudfront.net (CloudFront)
< x-amz-cf-pop: IAD61-P8
< x-amz-cf-id: Rm5gpz5_InSWGvf2CHJSXxKD8mWNgZuhYVIkkPBKKEiAam7Sf79qsA==
< 
{ [117 bytes data]
100   117  100   117    0     0   3342      0 --:--:-- --:--:-- --:--:--  3342
* Connection #0 to host live.staticflickr.com left intact

What should have happened instead?:
200 OK

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):
This is affecting flickr2commons (though it appears to be intermittent, with some files occasionally getting through), @Don-vip's https://commons.wikimedia.org/wiki/User:OptimusPrimeBot, and https://commons.wikimedia.org/wiki/User:FlickreviewR_2 . Other unidentified tools are also likely affected. Previous discussion has been on https://commons.wikimedia.org/wiki/User_talk:Zhuyifei1999#FlickreviewR_2_getting_403'd_by_Flickr where @Sannita was going to contact the person who was keeping in contact with the Flickypedia (Flickr Foundation) folks.

It seems like the primary contact method for Flickr.com is through https://www.flickrhelp.com/hc/en-us/requests/new. The Flickr Foundation, which does not run the site but might be able to advocate on our behalf, is at hello@flickr.org. I have not attempted either and feel that outreach from WMCS staff would be more useful since it's affecting multiple tools on the shared hosting.

Event Timeline

Andrew triaged this task as High priority.

Using the help request form:

Hello!

My name is Andrew Bogott, and I'm an SRE at the Wikimedia Foundation. I'm one of the maintainers of Wikimedia's private cloud and the associated Toolforge platform -- these run many of the tools and bots that maintain Wikipedia content.

I've recently received reports from my users that Flickr is blocking requests from within our network; this is breaking several important tools and workflows.

Traffic from our cloud will appear to originate from 185.15.56.1. I'd love it if you could just lift throttles for requests from that IP but failing that, please advise how to get things up and running again.

This issue is tracked in our public ticket system here: https://phabricator.wikimedia.org/T384468

Thank you!

-Andrew

It appears that as of 13:44 UTC, FlickreviewR 2 is now running normally, and the pending files have been promptly reviewed. Perhaps it's time to close the task?

Can we please wait a few days? FlickreviewR 2 was not the only one impacted; I completely stopped Flickr activities on my bot to avoid overloading the review category. I have just resumed my uploads and would like to wait at least 24 hours to see if I face errors.

OK, it's working fine for my tool as well.

Got a response from Flickr support today:

Hi 👋,

Thank you for reaching out. My name is Tara - happy to help you today.
 
I reached out to our engineers and I will get back to you with an update soon.
 
Warm regards,

Tara

Flickr Support

Hi,
 
Our engineers confirmed that this should be resolved. Can you let me know how it is looking on your end?
 
Warm regards,

Tara

Flickr Support

@AntiCompositeNumber shall I reply to Tara that things are resolved on our end? Or do you have remaining concerns?

Hello,
The problem occurs again for my tool:

don-vip@worker-2:~/spacemedia$ hostname -f
worker-2.spacemedia.eqiad1.wikimedia.cloud
2025-02-02T17:51:33.542Z  INFO 717395 --- [media-update-usgovernment.flickr] o.w.c.d.spacemedia.utils.MediaUtils      : Reading file https://live.staticflickr.com/65535/53728392139_f9b70e4ebc_o.jpg
2025-02-02T17:52:03.743Z  WARN 717395 --- [media-update-usgovernment.flickr] o.w.c.d.spacemedia.utils.MediaUtils      : HEAD /65535/53728392139_f9b70e4ebc_o.jpg => Too Many Requests -- [Content-Type:"text/html", Content-Length:"117", Connection:"keep-alive", Date:"Sun, 02 Feb 2025 17:52:03 GMT", Cache-Control:"no-cache", X-Cache:"Error from cloudfront", Via:"1.1 eacb5a468c39b7a6f5ea6363966ed0bc.cloudfront.net (CloudFront)", X-Amz-Cf-Pop:"IAD61-P8", X-Amz-Cf-Id:"BpndH2u-6WL9xxWataQj_qTasnQrNDxquDJt2_GRgciNGuek9z1lrw=="]

FlickreviewR 2 is still working fine though.

Is there a specific user-agent to use? I'm not using a particular one.

You should be using a User-Agent that identifies your particular bot.
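
A minimal sketch of what an identifying User-Agent could look like in Python. The tool name, version, and contact URL are hypothetical placeholders, and nothing in this task confirms that Flickr's CDN treats such requests differently; it only replaces a library default such as curl/7.74.0 with something an operator can trace back to a specific bot.

```python
import urllib.request


def build_request(url: str, tool_name: str, version: str, contact: str) -> urllib.request.Request:
    """Build a request whose User-Agent names the tool and a way to reach its operator."""
    # Format is an assumption: "<tool>/<version> (<contact>)".
    user_agent = f"{tool_name}/{version} ({contact})"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Hypothetical bot name and contact page, for illustration only.
    req = build_request(
        "https://live.staticflickr.com/65535/54265555549_3e41dca85a_o_d.jpg",
        "ExampleBot",
        "1.0",
        "https://commons.wikimedia.org/wiki/User:ExampleBot",
    )
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
    # urllib.request.urlopen(req) would send the request; omitted here.
```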

I would very much like to follow up with flickr support about this. Can I please get some feedback?

On my side I still face errors:

2025-02-18T20:28:14.842Z  INFO 1664367 --- [media-update-usdoi.flickr] o.w.c.d.spacemedia.utils.MediaUtils      : Reading file from https://live.staticflickr.com/759/32139984172_d4d88e1ee6_o.jpg
2025-02-18T20:28:45.080Z  WARN 1664367 --- [media-update-usdoi.flickr] o.w.c.d.spacemedia.utils.MediaUtils      : HEAD /759/32139984172_d4d88e1ee6_o.jpg => Too Many Requests -- [Content-Type:"text/html", Content-Length:"117", Connection:"keep-alive", Date:"Tue, 18 Feb 2025 20:28:45 GMT", Cache-Control:"no-cache", X-Cache:"Error from cloudfront", Via:"1.1 9b283d80d8ea57cdfccedd6e3b45608c.cloudfront.net (CloudFront)", X-Amz-Cf-Pop:"IAD61-P8", X-Amz-Cf-Id:"orploDHUE7LFJx-GV1JQeXQcoV3EBeN1lkTJkdxOvzpTUXb60ftvhg=="]
2025-02-18T20:29:15.306Z  WARN 1664367 --- [media-update-usdoi.flickr] o.w.c.d.spacemedia.service.MediaService  : Ignored file reading error of https://live.staticflickr.com/759/32139984172_d4d88e1ee6_o.jpg => GET /759/32139984172_d4d88e1ee6_o.jpg => Too Many Requests -- [Content-Type:"text/html", Content-Length:"117", Connection:"keep-alive", Date:"Tue, 18 Feb 2025 20:29:15 GMT", Cache-Control:"no-cache", X-Cache:"Error from cloudfront", Via:"1.1 9b283d80d8ea57cdfccedd6e3b45608c.cloudfront.net (CloudFront)", X-Amz-Cf-Pop:"IAD61-P8", X-Amz-Cf-Id:"gOs8SwFrEj1geLL2l1hq4l27ZVbGxgyvgFUbuYQfeG1zxYEKVDsGBQ=="] -- <html><body><h1>429 Too Many Requests</h1>

Thank you @Don-vip, I've relayed that back to Flickr support.

Sent:

Reports from my users vary. One, at least, is still getting throttled; see the log below.

Can you tell me how your rate-limit is implemented? I believe the tool in question is already using a distinctive user agent string but he can surely change it as needed.

2025-02-18T20:28:14.842Z INFO 1664367 --- [media-update-usdoi.flickr] o.w.c.d.spacemedia.utils.MediaUtils : Reading file from https://live.staticflickr.com/759/32139984172_d4d88e1ee6_o.jpg
2025-02-18T20:28:45.080Z WARN 1664367 --- [media-update-usdoi.flickr] o.w.c.d.spacemedia.utils.MediaUtils : HEAD /759/32139984172_d4d88e1ee6_o.jpg => Too Many Requests -- [Content-Type:"text/html", Content-Length:"117", Connection:"keep-alive", Date:"Tue, 18 Feb 2025 20:28:45 GMT", Cache-Control:"no-cache", X-Cache:"Error from cloudfront", Via:"1.1 9b283d80d8ea57cdfccedd6e3b45608c.cloudfront.net (CloudFront)", X-Amz-Cf-Pop:"IAD61-P8", X-Amz-Cf-Id:"orploDHUE7LFJx-GV1JQeXQcoV3EBeN1lkTJkdxOvzpTUXb60ftvhg=="]
2025-02-18T20:29:15.306Z WARN 1664367 --- [media-update-usdoi.flickr] o.w.c.d.spacemedia.service.MediaService : Ignored file reading error of https://live.staticflickr.com/759/32139984172_d4d88e1ee6_o.jpg => GET /759/32139984172_d4d88e1ee6_o.jpg => Too Many Requests -- [Content-Type:"text/html", Content-Length:"117", Connection:"keep-alive", Date:"Tue, 18 Feb 2025 20:29:15 GMT", Cache-Control:"no-cache", X-Cache:"Error from cloudfront", Via:"1.1 9b283d80d8ea57cdfccedd6e3b45608c.cloudfront.net (CloudFront)", X-Amz-Cf-Pop:"IAD61-P8", X-Amz-Cf-Id:"gOs8SwFrEj1geLL2l1hq4l27ZVbGxgyvgFUbuYQfeG1zxYEKVDsGBQ=="] -- <html><body><h1>429 Too Many Requests</h1>

Response was:

Hi there,

I apologize: this is information I am unable to provide. You may want to reach out to the developer of the app to make sure this is working correctly on their side.

Warm regards,
Doug
Flickr Support

I'm not sure what that last response means but this feels like a dead end to me. I'm open to suggestions!

Is there a specific user-agent to use? I'm not using a particular one.

Anything that is not the default, I would say. Also, I'd seriously consider switching to some sort of flow where traffic is authenticated. There is just so much anonymous bad-actor traffic that services have to deal with in the modern age that anonymous clients are very likely to run into rate limiting.
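
Independent of the User-Agent, a client that keeps receiving 429 can at least slow down instead of retrying at a fixed interval. The sketch below is one possible approach, not what any of the affected tools actually do. The 429 responses logged in this task list no Retry-After header, so it falls back to exponential backoff when that header is absent or not a plain number of seconds. The base delay, cap, and attempt count are arbitrary assumptions, and Flickr's real limits are unknown.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 2.0, cap: float = 300.0) -> float:
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based)."""
    # Honor a numeric Retry-After if the server sends one (HTTP-date form is ignored here).
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after.strip()), cap)
    # Otherwise double the delay on each attempt, up to the cap.
    return min(base * (2 ** attempt), cap)


def fetch_with_backoff(req, max_attempts: int = 5,
                       opener: Callable = urllib.request.urlopen,
                       sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Fetch `req`, retrying only on HTTP 429 and re-raising any other error."""
    for attempt in range(max_attempts):
        try:
            with opener(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Give up on non-429 errors or once the attempts are exhausted.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```

The `opener` and `sleep` parameters exist only so the retry logic can be exercised without network access or real waiting.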

Andrew changed the task status from Open to Stalled. Apr 10 2025, 7:40 PM
Andrew lowered the priority of this task from High to Low.

Flickr apparently now blocks all requests with a 403 error (API calls, web page retrievals, everything). Can you please check with Flickr whether the Wikimedia IP address has been blocked?

Hello again!

Our tools are now getting a 403 from Flickr for every request, which suggests that you are actively blocking access. Likely you have added new protection measures in response to the recent flood of AI scrapers (which are slamming us as well).

Can you double-check that access is open for the Wikimedia cloud? Traffic is likely to originate from 185.15.56.1 but we can also add a custom user agent if that helps.

Thank you!

-Andrew

It works again! Thank you so much Andrew!

Hi Andrew,
 
Becki from Flickr Support here again.
 
Our engineers have made some adjustments on our end - can you let me know if things are working for you again?
 
If not, let me know and I'll follow up with our engineers for further investigation.
 
I'll be standing by.
Becki

I responded and said things were fixed. So we are good for now, until they mess with us again!