Throttle requests to PhotoDNA API to no more than 5 requests per second
Closed, Declined | Public | 9 Estimated Story Points

Description

PhotoDNA has a rate limit of 5 requests per second (docs). We need to update the ModerateExistingFiles.php maintenance script to ensure it does not exceed this limit.

The way we've discussed doing this is:

  • Place the existing job queue implementation behind a command line flag, like --use-job-queue, in case we need it again someday.
  • The new implementation should iterate over files synchronously rather than dispatching them to the job queue for asynchronous processing
  • The new implementation could use Guzzle's ability to batch multiple HTTP requests (documentation) if we can do so with MediaWiki's HttpRequestFactory. If we can't use HttpRequestFactory for this, then let's just send one item at a time. Either way, we need some timers to track that we don't send more than 5 items per second.

    T341221 and T336576 have been folded into this ticket
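
A rough sketch of the "timers" idea from the last bullet, written in Python for brevity rather than the extension's PHP. Everything named here (`RateLimiter`, `process_files`, the `send` callback) is hypothetical and not taken from ModerateExistingFiles.php; it only illustrates sending one item at a time while never allowing more than 5 sends in any one-second window.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: at most max_calls within any `period` seconds."""

    def __init__(self, max_calls=5, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.sent = deque()     # timestamps of the most recent calls

    def wait(self):
        """Block until one more call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have already left the window.
        while self.sent and now - self.sent[0] >= self.period:
            self.sent.popleft()
        if len(self.sent) >= self.max_calls:
            # Sleep until the oldest recorded call is one full period old.
            self.sleep(self.period - (now - self.sent[0]))
            now = self.clock()
            self.sent.popleft()
        self.sent.append(now)


def process_files(files, send, limiter):
    """Send files synchronously, pausing whenever the limit is reached."""
    for f in files:
        limiter.wait()
        send(f)
```

With the defaults, `process_files(files, send, RateLimiter())` would send the first five items immediately and then pause before the sixth. The same structure would work for a batch of 5 if batching through HttpRequestFactory turns out to be possible: call `wait()` once per item in the batch before dispatching it.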

Event Timeline

kostajh renamed this task from Throttle PhotoDNA API to Throttle requests to PhotoDNA API to no more than 5 requests per second. Jun 26 2023, 12:21 PM
kostajh updated the task description.

This ticket should cover the following items:

  • Verify image size is not greater than 4 MB; need to confirm this is already being done
  • Verify image height x width; not being done now, the code just makes everything a set-size thumbnail, regardless of the original size
  • Update code to use a new system of submitting files in batches; in retrospect, may try Guzzle to submit a batch of 5, but for now it would be good to just get moving with one at a time
  • Submit files using Guzzle and make sure the timing is correct: up to 10 million transactions per month, 5 requests per second (governor)
  • Attempt to tie into the current code base for evaluation and tracking
  • A table for tracking has been suggested and may be a better solution than the one currently available
  • If unable to tie in, there will be more steps/ticket
    • Steps Could Be
      • Tracking of files
      • Update database with current information
      • Make a way to notify the correct folks
  • Verify email works when there is an image found that is in violation
  • Need to update tests on each section
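
A quick check of how the two numbers quoted above relate, assuming both limits apply at once and assuming a 30-day month (neither assumption is confirmed by the PhotoDNA docs quoted in this ticket). The function names are made up for this calculation.

```python
SECONDS_PER_DAY = 24 * 60 * 60
RATE_LIMIT_PER_SECOND = 5        # from the ticket
MONTHLY_QUOTA = 10_000_000       # from the ticket


def requests_per_month(rate_per_second, days=30):
    """Requests sent in a month when running nonstop at the given rate."""
    return rate_per_second * SECONDS_PER_DAY * days


def sustainable_rate(monthly_quota, days=30):
    """Average requests per second that exactly uses up the monthly quota."""
    return monthly_quota / (SECONDS_PER_DAY * days)


def days_to_exhaust(monthly_quota, rate_per_second):
    """Days of nonstop sending at the given rate before the quota runs out."""
    return monthly_quota / (rate_per_second * SECONDS_PER_DAY)


print(requests_per_month(RATE_LIMIT_PER_SECOND))                   # 12960000
print(round(sustainable_rate(MONTHLY_QUOTA), 2))                    # 3.86
print(round(days_to_exhaust(MONTHLY_QUOTA, RATE_LIMIT_PER_SECOND), 1))  # 23.1
```

So a script that runs nonstop at the full 5 requests per second would pass 10 million requests after roughly 23 days. If the monthly figure is a hard cap, a long backfill would need to average a little under 4 requests per second.
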
JKieserman set the point value for this task to 3. Jul 5 2023, 1:16 PM

ticket?
  • Verify image size is not greater than 4 MB; need to confirm this is already being done
  • Verify image height x width; not being done now, the code just makes everything a set-size thumbnail, regardless of the original size
ticket?
  • Update code to use a new system of submitting files in batches; in retrospect, may try Guzzle to submit a batch of 5, but for now it would be good to just get moving with one at a time
  • Submit files using Guzzle and make sure the timing is correct: up to 10 million transactions per month, 5 requests per second (governor)
ticket?
  • Attempt to tie into the current code base for evaluation and tracking
should be new ticket?
  • Verify email works when there is an image found that is in violation
should be new ticket?
  • Need to update tests on each section

not applicable to ticket
  • A table for tracking has been suggested and may be a better solution than the one currently available
  • If unable to tie in, there will be more steps/ticket
    • Steps Could Be
      • Tracking of files
      • Update database with current information
      • Make a way to notify the correct folks

File: includes/http/GuzzleHttpRequest.php -> this is where the API request starts being put together.

Method: execute(), around line 120

Guzzle options output:

Array
(
    [http_errors] =>
    [timeout] => 25
    [connect_timeout] => 5
    [version] => 1.1
    [allow_redirects] =>
    [body] => {"DataRepresentation":"URL","Value":"/w/images/thumb/6/69/Img_65.jpg/160px-Img_65.jpg"}
    [expect] =>
)

POST DATA
{"DataRepresentation":"URL","Value":"/w/images/thumb/6/69/Img_65.jpg/160px-Img_65.jpg"}
string(4) "POST"

$postData = null: null
Declared in:
MWHttpRequest
Source:
.../includes/http/MWHttpRequest.php

Issues which may necessitate pausing this ticket (T339988):
  1. DataRepresentation value does not contain a full URL
  2. ImageSize does not maintain consistent ratios when resized

Both of these will lead to an unsuccessful outcome.
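
To make the two issues concrete, here is a small Python sketch of one possible fix for each. It is not what MediaModerationHandler.php does: `expand_url`, `scale_to_min_side`, and the `https://wiki.example.org` server are placeholders for illustration, and the 160-pixel minimum is taken from the PhotoDNA status code list quoted later in this ticket.

```python
from urllib.parse import urljoin


def expand_url(server, path):
    """Turn a server-relative path into a full URL that a remote API can fetch."""
    return urljoin(server, path)


def scale_to_min_side(width, height, min_side=160):
    """Resize so the shorter side equals min_side, keeping the aspect ratio."""
    shorter = min(width, height)
    return (round(width * min_side / shorter),
            round(height * min_side / shorter))


thumb = "/w/images/thumb/6/69/Img_65.jpg/160px-Img_65.jpg"
print(expand_url("https://wiki.example.org", thumb))
# https://wiki.example.org/w/images/thumb/6/69/Img_65.jpg/160px-Img_65.jpg
print(scale_to_min_side(1600, 1200))  # (213, 160)
```

Scaling by the shorter side is just one choice; it keeps both dimensions at or above 160 pixels, which a fixed 160-pixel width would not do for landscape images.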

extensions/MediaModeration/src/MediaModerationHandler.php
method handleMedia (~line 81) is where image resizing happens
$stack = HandlerStack::create( $this->handler ); at this point $this->handler is null

extensions/MediaModeration/src/MediaModerationHandler.php
line ~93
$thumbUrl = $this->thumbnailProvider->getThumbnailUrl( $file );

extensions/MediaModeration/src/ProcessModerationCheckResult.php
function processResult(CheckResultValue $result, File $file) is where results are checked and EMAIL is sent, called from

Have emailed PhotoDNA requesting clarification of limits on image size. What I had from the website is no longer there, so am keeping my fingers crossed that they have dropped those requirements.

Response 200

Name: Description

Status: Status codes and corresponding descriptions:
  • 3000: OK
  • 3002: Invalid or missing request parameter(s)
  • 3004: Unknown scenario or unhandled error occurred while processing request
  • 3206: The given file could not be verified as an image
  • 3208: Image size in pixels is not within allowed range (minimum size is 160x160 pixels; maximum size is 4MB)
TrackingId: Unique ID that identifies this individual request.
IsMatch: Boolean value indicating whether the submitted image matched a known image

MatchDetails
Collection of MatchFlags.

MatchFlag: Specifies the source of known image which the submitted image matched.

Note: Submitted images may match images from multiple sources.

EvaluateResponse
Collection of image evaluation flags:

AdultClassificationScore: Numeric score representing the likelihood of adult content
IsImageAdultClassified: Boolean representing whether or not adult content was found
RacyClassificationScore: Numeric score representing the likelihood of racy content
IsImageRacyClassified: Boolean representing whether or not racy content was found
AdvancedInfo: reserved for future use
Result: Boolean representing whether or not adult and/or racy content was found
Note: this object is null unless the header 'Enable-Evaluation' is present and a valid Content Moderator key has been provided in the PhotoDNA portal.
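
For reference, the status codes above could be handled along these lines. This is a Python sketch only; the helper names are hypothetical, and how the code is pulled out of the response body is deliberately left out because the JSON layout is not shown in this ticket.

```python
# Codes and descriptions copied from the PhotoDNA response documentation above.
STATUS_DESCRIPTIONS = {
    3000: "OK",
    3002: "Invalid or missing request parameter(s)",
    3004: "Unknown scenario or unhandled error occurred while processing request",
    3206: "The given file could not be verified as an image",
    3208: "Image size in pixels is not within allowed range",
}


def describe_status(code):
    """Return the documented description, or a fallback for unknown codes."""
    return STATUS_DESCRIPTIONS.get(code, "Unrecognized status code")


def is_ok(code):
    """True only for the documented success code."""
    return code == 3000


def is_size_rejection(code):
    """True when the image was rejected for its dimensions or file size."""
    return code == 3208
```

Treating 3208 separately is useful here because it is the one failure the script could plausibly fix on its own by resizing and resubmitting.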

Change 945941 had a related patch set uploaded (by EllenR; author: EllenR):

[mediawiki/extensions/MediaModeration@master] Throttle requests to PhotoDNA API to 5 per second

https://gerrit.wikimedia.org/r/945941

Change 945942 had a related patch set uploaded (by EllenR; author: EllenR):

[mediawiki/extensions/MediaModeration@master] Modify script to load max 5 x 1 sec to PhotoDNA

https://gerrit.wikimedia.org/r/945942

ERayfield changed the point value for this task from 3 to 9.

Change 949093 had a related patch set uploaded (by Jsn.sherman; author: Jsn.sherman):

[mediawiki/extensions/MediaModeration@master] [DNM] gerrit dependent patch example - child

https://gerrit.wikimedia.org/r/949093

Just noticed on wiki/Special:Upload that we allow png, gif, jpg, jpeg and webp. However, PhotoDNA does not check webp files. Also, Special:Upload lists "Maximum file size: 100 MB", but PhotoDNA will require files of that size to be shrunk.
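
One way to act on this would be a pre-check before any request is sent. The Python sketch below is an illustration, not code from the extension: `precheck` and its return values are invented, the 4 MB limit comes from the status code list earlier in this ticket, and treating 4 MB as 4 × 1024 × 1024 bytes is an assumption.

```python
import os

# Assumption: only webp is excluded, because that is all the comment above states.
UNSUPPORTED_EXTENSIONS = {"webp"}
# Assumption: "4MB" is read as 4 * 1024 * 1024 bytes.
MAX_BYTES = 4 * 1024 * 1024


def precheck(filename, size_bytes):
    """Decide what to do with a file before calling the API.

    Returns "skip" for unsupported types, "resize" for oversized files,
    and "send" otherwise.
    """
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension in UNSUPPORTED_EXTENSIONS:
        return "skip"
    if size_bytes > MAX_BYTES:
        return "resize"
    return "send"


print(precheck("Img_65.jpg", 45_000))            # send
print(precheck("Img_65.webp", 45_000))           # skip
print(precheck("Big.png", 100 * 1024 * 1024))    # resize
```

Skipped and resized files do not need to count against the 5 requests per second, since nothing is sent for them until after the check.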

ON HOLD (unless someone wants to pick it up) UNTIL 28 AUG 2023
Because I don't yet understand the PHPUnit tests well enough, this ticket is on hold until I return from PTO, unless someone wants to play with the code. Please drop me a line if you decide to do that.

Change 949093 abandoned by Jsn.sherman:

[mediawiki/extensions/MediaModeration@master] [DNM] gerrit dependent patch example - child

Reason:

https://gerrit.wikimedia.org/r/949093

kostajh added a subscriber: ERayfield.

Change 965925 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/MediaModeration@master] ModerateExistingFilesHelper: Don't use job queue by default

https://gerrit.wikimedia.org/r/965925

Change 965926 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/MediaModeration@master] ModerateExistingFilesHelper: Avoid making more than 5 requests per second

https://gerrit.wikimedia.org/r/965926

Change 945942 abandoned by Kosta Harlan:

[mediawiki/extensions/MediaModeration@master] WIP Throttle requests to PhotoDNA API to 5 per sec function requestModeration in RequestModerationCheck had 5 return points, reduced to one return to cut down on complexity

Reason:

https://gerrit.wikimedia.org/r/945942

Change 945941 abandoned by Kosta Harlan:

[mediawiki/extensions/MediaModeration@master] Throttle requests to PhotoDNA API to 5 per second

Reason:

https://gerrit.wikimedia.org/r/945941

Change 965925 merged by jenkins-bot:

[mediawiki/extensions/MediaModeration@master] ModerateExistingFilesHelper: Don't use job queue by default

https://gerrit.wikimedia.org/r/965925

Change 965926 abandoned by Kosta Harlan:

[mediawiki/extensions/MediaModeration@master] ModerateExistingFilesHelper: Avoid making more than 5 requests per second

Reason:

See https://phabricator.wikimedia.org/project/view/6841/

https://gerrit.wikimedia.org/r/965926