Debug Error: The given file could not be verified as an image
Closed, ResolvedPublic

Description

  • A file-type check already runs before files are sent to PhotoDNA; this should take care of the 'not verified as image' error.
Error response from PhotoDNA service:  {
  "Status": {
    "Code": 3206,
    "Description": "The given file could not be verified as an image",
    "Exception": null
  },
  "ContentId": null,
  "IsMatch": false,
  "MatchDetails": null,
  "XPartnerCustomerId": null,
  "TrackingId": "WUS_8832767f55c743fbb0b73619d78ed0f1_57c7457ae3a97812ecf8bde9_b63aacd35bf24e08a8856a07269ab496",
  "EvaluateResponse": null
}
  • build response/reply with PhotoDNA - probably guzzle or curl
  • build a moderator which will limit sending requests to 5 per second
NOTE: At this point we will be sending images, so we will not implement hashing in this ticket
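
To count how often this error shows up in a script run, something like the following could classify each response body. This is only a sketch: the helper name and constant are hypothetical and not part of MediaModeration, and the code 3206 is taken from the response quoted above.

```php
<?php
// Hypothetical helper, not part of the MediaModeration extension.
// 3206 is the status code from the error response quoted in this task.
const PHOTODNA_NOT_AN_IMAGE = 3206;

/**
 * Returns true when a raw PhotoDNA response body carries the
 * "could not be verified as an image" status code.
 */
function isNotAnImageError( string $responseBody ): bool {
	$data = json_decode( $responseBody, true );
	if ( !is_array( $data ) || !isset( $data['Status']['Code'] ) ) {
		// Malformed or unexpected body: not the error we are looking for.
		return false;
	}
	return (int)$data['Status']['Code'] === PHOTODNA_NOT_AN_IMAGE;
}

$body = '{"Status":{"Code":3206,"Description":"The given file could not be verified as an image","Exception":null},"IsMatch":false}';
var_dump( isNotAnImageError( $body ) );
```

Counting the responses for which this returns true, against the total number of items processed, would give the error rate for a run.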

Event Timeline

A good next step would be to identify a few examples of files that the script sent to the service, to verify that they are indeed image files. It would also be good to document how many of these errors we've seen (out of total items processed) in the most recent script run.

ERayfield changed the task status from Open to In Progress.May 18 2023, 3:16 PM
ERayfield claimed this task.

The check should probably be done around:

/var/www/html/w/extensions/MediaModeration/src/MediaModerationHandler.php 92

check for media type is done, but we are adding our own query to PhotoDNA so that we are able to control the timing of submissions

build response/reply with PhotoDNA - probably guzzle or curl

I think we should be able to use the existing code in RequestModerationCheck::createModerationRequest(), which uses the HttpRequestFactory service from MediaWiki's services.

For batching the requests in groups of 5, I took a look to see if anyone's using Guzzle's sendAsync or requestAsync methods, but I don't see it. For that reason, I think we could start with sending requests serially to PhotoDNA, then in a follow-up we could see about implementing parallel requests in batches of up to 5 per second. How does that sound?
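
For the serial approach, a rough sketch of a throttle that keeps us at or below 5 requests per second could look like this. The class name is hypothetical and nothing here exists in the extension; it simply spaces request starts at least 1/5 of a second apart. The clock and sleep callbacks are injectable only so the logic can be exercised without waiting.

```php
<?php
/**
 * Hypothetical throttle, not part of MediaModeration.
 * Spaces calls so that consecutive requests start at least
 * 1 / $maxPerSecond seconds apart.
 */
final class RequestThrottle {
	private float $minInterval;
	private ?float $lastStart = null;
	/** @var callable returns the current time in seconds as a float */
	private $now;
	/** @var callable sleeps for the given number of seconds */
	private $sleep;

	public function __construct( int $maxPerSecond = 5, ?callable $now = null, ?callable $sleep = null ) {
		$this->minInterval = 1.0 / $maxPerSecond;
		$this->now = $now ?? static fn (): float => microtime( true );
		$this->sleep = $sleep ?? static function ( float $seconds ): void {
			usleep( (int)round( $seconds * 1e6 ) );
		};
	}

	/** Blocks until the next request may be sent. */
	public function wait(): void {
		$now = ( $this->now )();
		if ( $this->lastStart !== null ) {
			$remaining = $this->minInterval - ( $now - $this->lastStart );
			if ( $remaining > 0 ) {
				( $this->sleep )( $remaining );
				// Re-read the clock so drift in the sleep call is accounted for.
				$now = ( $this->now )();
			}
		}
		$this->lastStart = $now;
	}
}

$throttle = new RequestThrottle( 5 );
foreach ( [ 'a.png', 'b.jpg' ] as $file ) {
	$throttle->wait();
	// Placeholder for the real call that submits the file to PhotoDNA.
	echo "sending $file\n";
}
```

Calling wait() before each submission keeps the loop serial, which matches the "start serially, parallelize later" plan.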

here is what i was thinking -
use RequestModerationCheck::createModerationRequest(), but changing the $options -
$options = [
	'method' => 'POST',
	'postData' => Utils::jsonEncode( [
		'DataRepresentation' => 'URL', // url
		'Value' => $url, // image as string
	] ),
];

not sure about HttpRequestFactory, will have to spend some time there to see if it is useful

However, since the code already has a loop that checks for the correct image type, I figured that after checking and getting the thumbnail we could just send the thumbnail and then deal with the response. I'm not sure Guzzle is the way to go, but that was recommended rather than installing http2.

here is what i was thinking -
use RequestModerationCheck::createModerationRequest(), but changing the $options -
$options = [
	'method' => 'POST',
	'postData' => Utils::jsonEncode( [
		'DataRepresentation' => 'URL', // should be image
		'Value' => $url, // payload
	] ),
];

not sure about HttpRequestFactory, will have to spend some time there to see if it is useful

Yes, changing $options as you've indicated sounds like the right way.

That code is already using HttpRequestFactory:

$annotationRequest = $this->httpRequestFactory->create(
	$this->photoDNAUrl,
	$options
);

However, since the code already has a loop that checks for the correct image type, I figured that after checking and getting the thumbnail we could just send the thumbnail and then deal with the response. I'm not sure Guzzle is the way to go, but that was recommended rather than installing http2.

I'm not sure, but I imagine we are most likely to get accurate results if we send the original image, and not any resized version of it. I don't see a clear answer about that in the PhotoDNA docs. We're somewhat limited in sending an original image, though, in that PhotoDNA accepts a maximum size of 4MB. I guess in that case we could use a resized version of the image.
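
As a sketch of that fallback, the choice between the original and a resized version could be made on byte size alone. The function and constant names below are hypothetical, and treating "4MB" as 4 * 1024 * 1024 bytes with an inclusive limit is an assumption that should be checked against the PhotoDNA docs.

```php
<?php
// Assumption: "4MB" means 4 * 1024 * 1024 bytes and the limit is inclusive.
const PHOTODNA_MAX_BYTES = 4 * 1024 * 1024;

/**
 * Hypothetical helper: picks which rendition to send.
 * Returns 'original' when the original fits, 'thumbnail' when only the
 * resized copy fits, or null when neither fits (or no thumbnail exists).
 */
function chooseRendition( int $originalBytes, ?int $thumbnailBytes ): ?string {
	if ( $originalBytes <= PHOTODNA_MAX_BYTES ) {
		return 'original';
	}
	if ( $thumbnailBytes !== null && $thumbnailBytes <= PHOTODNA_MAX_BYTES ) {
		return 'thumbnail';
	}
	return null;
}

var_dump( chooseRendition( 6 * 1024 * 1024, 300000 ) );
```

Preferring the original whenever it fits follows the guess above that unresized images are most likely to give accurate results.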

Change 923692 had a related patch set uploaded (by EllenR; author: EllenR):

[mediawiki/extensions/MediaModeration@master] WIP file type working on submitting to PhotoDNA

https://gerrit.wikimedia.org/r/923692

Digging through Phabricator history, I see T246915#6297470, in particular

I think the bottom line though is that I think we absolutely should not do ad-hoc image processing on app servers and jobrunners (we moved Thumbor out for a reason)

IIUC, though, what we are proposing to do is not the ad-hoc image processing that @Krinkle expressed concern about. What we're proposing to do is:

  1. Fetch the existing thumbnail contents for an image, to stay under the 4 MB limit for PhotoDNA
  2. Send the thumbnail contents to PhotoDNA

Change 923692 abandoned by Jkieserman:

[mediawiki/extensions/MediaModeration@master] Allows single file to be verified via PhotoDNA

Reason:

closing in favor of newer patch: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/930163

https://gerrit.wikimedia.org/r/923692

Change 933132 had a related patch set uploaded (by EllenR; author: Daimona Eaytoy):

[mediawiki/extensions/MediaModeration@master] Enable one image to be tested against PhotoDNA

https://gerrit.wikimedia.org/r/933132

Updated the query to include just the file extensions PhotoDNA is interested in. Earlier versions were pulling anything listed as img_media_type = "BITMAP", but BITMAP is not granular enough.
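
To illustrate why the extension matters more than the media type, here is a small sketch that filters file names against an allow-list. The list is an assumption for illustration only, since this task does not say which extensions PhotoDNA accepts, and the real change was made in the database query rather than in PHP like this.

```php
<?php
// Assumed allow-list for illustration only; this task does not list the
// extensions PhotoDNA accepts, so verify against the PhotoDNA docs.
const ASSUMED_EXTENSIONS = [ 'gif', 'jpg', 'jpeg', 'png' ];

/** Hypothetical helper: true when the file name ends in an allowed extension. */
function hasAllowedExtension( string $fileName, array $allowed = ASSUMED_EXTENSIONS ): bool {
	$ext = strtolower( pathinfo( $fileName, PATHINFO_EXTENSION ) );
	return $ext !== '' && in_array( $ext, $allowed, true );
}

// Files like these can all be classed as BITMAP, yet only some are sendable.
$names = [ 'Cat.JPG', 'scan.tiff', 'logo.png', 'drawing.xcf', 'README' ];
$toSend = array_values( array_filter( $names, 'hasAllowedExtension' ) );
print_r( $toSend );
```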

We have debugged this; I propose we close in favor of the actionable task, which is T336576: Modify script to take image from database and upload images directly to PhotoDNA. cc @Madalina @JKieserman

Hi, just wanted to bump this comment for feedback. AFAICT the task that specifies something to build to fix the problem is T336576: Modify script to take image from database and upload images directly to PhotoDNA.

I agree, we should close this.

Change 933132 abandoned by Kosta Harlan:

[mediawiki/extensions/MediaModeration@master] File could not be verified as an image

Reason:

https://gerrit.wikimedia.org/r/933132