
Denial of Service due to repeated hits from a particular IP
Closed, Declined (Public)

Description

We are trying to figure out what is causing some of the errors we see with PhotoDNA and the Media Moderation tool.
from https://phabricator.wikimedia.org/T287511

For #2, System receiving 403 for some images while we are able to access those URLs from postman, one possibility is that their server has some DoS protection and they have seen thousands or millions of requests from Content Moderator’s IP address so their system thinks we are an attacker.

Wikimedia team – can you please check to see if you are somehow rejecting Content Moderator’s requests because your system is seeing too many requests coming from the same IP address?

May have tagged the wrong group, if so please inform me of the best group to use (thanks in advance) - erayfield

Event Timeline

RLazarus subscribed.

Routing to Traffic to see if this is a VCL rule we're hitting.

@ERayfield Can you provide some example requests, with headers and source IP?

I'm going to preemptively make this an NDA-protected task, given it's about to have IP addresses in it.

RLazarus changed the visibility from "Public (No Login Required)" to "Custom Policy".
RLazarus changed the edit policy from "All Users" to "Custom Policy".

@RLazarus Not really. At this point we are just trying to figure out what the issues could be with the Media Moderation tool/script. We have been in contact with PhotoDNA to try to understand some of the errors received when running the script. We were asked the following (shown in the original request):

Wikimedia team – can you please check to see if you are somehow rejecting Content Moderator’s requests because your system is seeing too many requests coming from the same IP address?

So that is what I am trying to do, and hope you can help me with this.

@ERayfield - We have a bunch of different pieces of code that can reject requests with 403s or 429s for various protective reasons, but it's hard for us to make any progress on identifying which such rule might have been tripped in this case without some identifier to look for, like the IP address or a unique user-agent string, etc (something that would tell us which of the ~127K reqs/sec flowing through our infra are the ones in question).
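For anyone trying to gather the identifiers requested above, here is a minimal sketch of one way to do it: send the failing requests with a distinctive User-Agent and record the status and response headers so they can be attached to the task. The User-Agent string, the function names, and the report layout are all hypothetical; nothing here reflects how the Media Moderation script or PhotoDNA actually issue requests.

```python
import urllib.error
import urllib.request

# Hypothetical identifier; any string unique to the client would work.
USER_AGENT = "MediaModerationDebug/0.1 (contact: ops@example.org)"

# Status codes mentioned in this thread as possible protective rejections.
REJECTION_CODES = (403, 429)


def build_request(url, user_agent=USER_AGENT):
    """Build a GET request tagged with a distinctive User-Agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="GET"
    )


def summarize(url, status, header_items, user_agent=USER_AGENT):
    """Collect the details worth pasting into a bug report."""
    return {
        "url": url,
        "status": status,
        "user_agent": user_agent,
        "blocked": status in REJECTION_CODES,
        "headers": dict(header_items),
    }


def fetch_report(url):
    """Fetch the URL and return a report for both success and HTTP errors."""
    request = build_request(url)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return summarize(url, response.status, response.headers.items())
    except urllib.error.HTTPError as err:
        # 403 and 429 responses arrive here; the headers are still available.
        return summarize(url, err.code, err.headers.items())
```

Calling `fetch_report(...)` on one of the failing image URLs from the machine that sees the 403s, and posting the result together with that machine's public IP address, would give a concrete request to search for.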

Removing comment as @BBlack is more authoritative and basically said the same.

So sorry for the delay. I hope the following will help somewhat; here is additional information from a forwarded email:

PhotoDNAQuestions

I have been trying to condense the shared file linked above but am not sure I have done a good enough job. I would encourage you to follow the PhotoDNAQuestions link to see the email progression.

With the subjected IcM, Wikimedia reported they are seeing too many errors with the below exception.

Exception : “The given file could not be verified as an image”
Looking at the backend logs, we could see two main issues that are causing these errors.

403 Forbidden error while downloading the images.

Incomplete or partial URL being sent via the request.body.

Wrong response body : {"DataRepresentation":"URL","Value":"/w/resources/assets/file-type-icons/fileicon-ogg.png"}

Correct response body : {"DataRepresentation":"URL","Value":"https://upload.wikimedia.org/wikipedia/commons/thumb/6/62/2021_01_20_comicio_braga_%28176%29_%28Esquerda.Net_50869033693%29.jpg/160px-2021_01_20_comicio_braga_%28176%29_%28Esquerda.Net_50869033693%29.jpg"}

Due to incomplete URL, we are unable to download the image.
Looking at the logs, we are getting the Wrong response body multiple times


AND

Hi Wikimedia team,

Below is an update from our engineering team, who have investigated the errors cited in this thread:
Regarding the subject issue for Wikimedia, we see two major causes of the error “3206: The given file could not be verified as an image”:

1. Could not find a part of the path, i.e. we are getting an incomplete URL, hence we are unable to download the images. This is expected, as we are not getting the complete URL from the request.

2. The remote server returned an error: (403) Forbidden. It looks like there is an issue while we are downloading the images.
LoadBytes error System.Net.WebException: The remote server returned an error: (403) Forbidden.

at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)

For #2, System receiving 403 for some images while we are able to access those URLs from postman, one possibility is that their server has some DoS protection and they have seen thousands or millions of requests from Content Moderator’s IP address so their system thinks we are an attacker.
Wikimedia team – can you please check to see if you are somehow rejecting Content Moderator’s requests because your system is seeing too many requests coming from the same IP address?

Thank you,
PhotoDNA Support

fgiunchedi triaged this task as Medium priority. Apr 26 2022, 9:15 AM

file-type-icons/fileicon-ogg.png

That's the (relative path) icon for an ogg audio file, which I'm trusting is not something that they handle. Possibly audio samples should be filtered out before submission? Or at least the path should be made absolute?
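The two suggestions above (skip file-type placeholder icons, and make relative paths absolute) could be sketched roughly as follows. This is only an illustration: the base URL, the function name, and the idea of matching on the `file-type-icons` path segment are assumptions, not a description of what the Media Moderation script does.

```python
from urllib.parse import urljoin, urlsplit

# Assumed base for resolving relative paths; the real wiki host may differ.
BASE_URL = "https://commons.wikimedia.org/"

# Path segment seen in the "wrong" request body quoted in this task.
PLACEHOLDER_MARKER = "/resources/assets/file-type-icons/"


def to_submittable_url(value, base_url=BASE_URL):
    """Return an absolute image URL, or None if the value should be skipped."""
    absolute = urljoin(base_url, value)
    parts = urlsplit(absolute)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    # Placeholder icons stand in for non-image media such as ogg audio.
    if PLACEHOLDER_MARKER in parts.path:
        return None
    return absolute
```

With this check, the `fileicon-ogg.png` value from the logs would be dropped before submission, while a full `https://upload.wikimedia.org/...` thumbnail URL would pass through unchanged.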

Can you find out about this, please:
Wikimedia team – can you please check to see if you are somehow rejecting Content Moderator’s requests because your system is seeing too many requests coming from the same IP address?

Thank you,
PhotoDNA Support

BCornwall changed the task status from Open to Stalled. Mar 15 2023, 10:27 PM
BCornwall subscribed.

Is this still an issue? If so, we still need more identifying information, such as IP addresses or unique user agents.

There are some links listed above - they may be hidden due to the age of the request.

Closing this as I'm still not seeing any provided identifying information that we could utilize for debugging the issue. If I'm mistaken, or if that information is provided, please do re-open!

Thanks!

Closing this as I'm still not seeing any provided identifying information that we could utilize for debugging the issue. If I'm mistaken, or if that information is provided, please do re-open!

I see @RLazarus tagged this as PermanentlyPrivate. Glancing at the task, I'm not seeing anything explicit here where it would need to be marked as such now? Or are there some sensitive portions here we'd rather not disclose?

No, I tagged it private when we asked for PII, so that it would already be private when that stuff was posted. Since it never appeared, I'm fine with opening it up.

sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".
sbassett changed the edit policy from "Custom Policy" to "All Users".
sbassett moved this task from Frozen to Our Part Is Done on the Security-Team board.
sbassett removed a project: WMF-NDA.