
[ DEBUG ] fix or remove file hashing
Closed, Invalid (Public)

Description

Right now MediaModeration appears to run into an error when hashing images. Those errors look like this:

2023-05-05 15:51:53.829913 [e0fd7bba5ba543b04bd7df62] mw1438 commonswiki 1.41.0-wmf.7 mediamoderation DEBUG: Hash check request failed for file Mikołaj_Łahodyński_(-1911).jpg.

This error is thrown here, which implies it is a problem with the extension's hashing (and not on the PhotoDNA side).

We need to do the following:

  • validate that we are sending PhotoDNA a bad hash. We are receiving 3206 errors from PhotoDNA ("Given file could not be verified as an image"), so we should confirm the hash is the culprit. One way to do this would be to disable hashing, re-run the script, and see what happens.
  • investigate why the hashing is failing. Things to verify are (1) that we are getting the thumbnail correctly and (2) that the moderation check request is working.
  • determine how to fix it. We should consider removing hashing altogether if the problem will take a lot of time to fix, as the API does not require a hash.
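
To see how many files are affected before and after re-running the script, the log lines can be tallied. The following is a minimal sketch, assuming every failure line follows the format of the single sample shown above; the function name `failed_files` and the regular expression are illustrative, not part of the extension.

```python
import re
from collections import Counter

# Pattern derived from the one sample log line in this task; the real
# log format may vary (assumption).
FAILED_RE = re.compile(
    r"mediamoderation DEBUG: Hash check request failed for file (?P<name>.+)\.$"
)


def failed_files(lines):
    """Count how often each file name appears in failure log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line.strip())
        if match:
            # The trailing period belongs to the log sentence, not the file name.
            counts[match.group("name")] += 1
    return counts
```

Comparing the counts from two runs (hashing enabled versus disabled) would show whether the same files keep failing.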

Event Timeline

JKieserman updated the task description.

FWIW, since T336205 (https://phabricator.wikimedia.org/T336205) will not be using hashing, would that take care of this ticket?

> Right now MediaModeration appears to run into an error when hashing images. Those errors look like this:

My understanding is that the MediaModeration extension does not do any hashing of images. Instead, it sends a URL to PhotoDNA, and the PhotoDNA service visits that URL. The PhotoDNA service then generates a hash of the image, which it compares against its internal database of hashes. The documentation for the endpoint also indicates that you send either the image content or a publicly accessible URL.
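
To illustrate the flow described above, here is a rough sketch of a URL-based request in which the client computes no hash at all. The endpoint URL, the header name, and the JSON field names are assumptions for illustration and should be checked against the PhotoDNA documentation; `build_match_request` is a hypothetical helper, not the extension's code.

```python
import json
import urllib.request

# Placeholder endpoint (assumption); the real service URL differs.
ASSUMED_ENDPOINT = "https://photodna.example/v1.0/Match"


def build_match_request(image_url, api_key, endpoint=ASSUMED_ENDPOINT):
    """Build (but do not send) a request that asks the service to fetch image_url."""
    # Field names are assumed; the point is that only a URL is sent, no hash.
    body = json.dumps({"DataRepresentation": "URL", "Value": image_url})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Header name assumed for illustration.
            "Ocp-Apim-Subscription-Key": api_key,
        },
        method="POST",
    )
```

If this matches what the extension does, then a failure at this step is an HTTP or service-side problem rather than a local hashing problem.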

> Hash check request failed for file Mikołaj_Łahodyński_(-1911).jpg.

I think this is just a poorly worded error message. What it is saying is "The HTTP request to the PhotoDNA service failed", not "Attempting to generate a hash of an image failed".
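
As a sketch of what a less ambiguous message could look like, the hypothetical helper below names the step that failed. The function and its parameters are illustrative only and do not reflect the extension's actual logging code.

```python
def describe_failure(file_name, http_ok, service_status=None):
    """Return a log message that states which step of the check failed."""
    if not http_ok:
        # The request never completed; nothing can be said about the image itself.
        return f"HTTP request to PhotoDNA failed for file {file_name}."
    if service_status is not None:
        # The request completed, but the service reported a problem.
        return f"PhotoDNA returned status {service_status} for file {file_name}."
    return f"PhotoDNA check completed for file {file_name}."
```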

So overall, I think we can mark this task as Invalid.