Essex and I tried to run the MediaModeration script today, but Logstash showed a large number of errors. Petr Pchelko confirmed that the error rate looked high, so we paused the run to investigate further.
To investigate, we need to roll back the change made here https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/606239/9/wmf-config/InitialiseSettings.php#b6736 and set the log level back to 'warning' so that we get more detailed logs.
###Findings from looking through the source###
**MediaModeration does not currently check the file type of thumbnails**
- We should check the file type in https://gerrit.wikimedia.org/g/mediawiki/extensions/MediaModeration/+/64eee7e258a8d6bf5356ed81c01e5c23db1d34a4/src/ThumbnailProvider.php against the types PhotoDNA accepts:
  - Content-Type: image/gif
  - Content-Type: image/jpeg
  - Content-Type: image/png
  - Content-Type: image/bmp
  - Content-Type: image/tiff
- PhotoDNA lists accepted file types here https://developer.microsoftmoderator.com/docs/services/57c7426e2703740ec4c9f4c3/operations/57c7426f27037407c8cc69e6/console?ref=mktg
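A minimal sketch of what the check could look like, written in Python for illustration (ThumbnailProvider.php would need the PHP equivalent). It assumes the thumbnail's MIME type is already known, for example from file metadata or a Content-Type header; the names `PHOTODNA_ACCEPTED_TYPES` and `is_photodna_supported` are hypothetical.

```python
# Content types PhotoDNA accepts, per the list above.
PHOTODNA_ACCEPTED_TYPES = frozenset({
    "image/gif",
    "image/jpeg",
    "image/png",
    "image/bmp",
    "image/tiff",
})


def is_photodna_supported(content_type: str) -> bool:
    """Return True if a Content-Type value is one PhotoDNA accepts.

    Hypothetical helper: drops parameters such as "; charset=..." and
    compares case-insensitively, since media types are case-insensitive.
    """
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in PHOTODNA_ACCEPTED_TYPES
```

With a check like this, a thumbnail in an unsupported format (e.g. `image/webp`) could be skipped and logged up front instead of surfacing later as a PhotoDNA error.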
**We don't have visibility into load balancer/CDN behavior when PhotoDNA requests an image**
- We could send PhotoDNA the file content instead of URLs
- This would take the load balancer/CDN out of the picture when troubleshooting and allow for local testing
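To make the difference between the two approaches concrete, here is a rough Python sketch that only builds the two kinds of request without sending anything. The endpoint is a placeholder, and the `Ocp-Apim-Subscription-Key` header and the `DataRepresentation`/`Value` JSON body are assumptions to verify against the PhotoDNA documentation linked above; the function names are hypothetical.

```python
import json
import urllib.request

# Placeholder endpoint: substitute the real PhotoDNA Match URL from the docs.
MATCH_ENDPOINT = "https://photodna.example/Match"


def build_url_request(image_url: str, api_key: str) -> urllib.request.Request:
    """Current approach: PhotoDNA fetches the thumbnail from our servers.

    The JSON body shape (DataRepresentation/Value) is an assumption.
    """
    body = json.dumps({"DataRepresentation": "URL", "Value": image_url}).encode("utf-8")
    return urllib.request.Request(
        MATCH_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,  # assumed auth header
        },
    )


def build_content_request(
    image_bytes: bytes, content_type: str, api_key: str
) -> urllib.request.Request:
    """Alternative: upload the image bytes directly, so PhotoDNA never has
    to reach our load balancer/CDN to fetch the file."""
    return urllib.request.Request(
        MATCH_ENDPOINT,
        data=image_bytes,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Ocp-Apim-Subscription-Key": api_key,  # assumed auth header
        },
    )
```

The trade-off is that the content variant costs our outbound bandwidth per image, but any failure is then between us and PhotoDNA only, and the request can be built and inspected in a local environment.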
**We are not sure that the script actually identifies problematic images**
- We could upload the PhotoDNA sample images to a local environment for testing: https://pdnasampleimages.blob.core.windows.net/matchedimages/SampleImages.zip
- We could add those files to the test suite if licensing allows
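A possible shape for such a test, sketched in Python: collect every image in the unzipped sample set and report any that the checker fails to flag, since all of them are expected to match. The extension-to-type mapping and the `is_match` hook (which would wrap the real PhotoDNA call, or a stub in unit tests) are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# File extensions corresponding to the content types PhotoDNA accepts (assumed mapping).
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def sample_images(root: Path) -> List[Path]:
    """Return all image files under root (e.g. the unzipped SampleImages.zip), sorted."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def find_missed_matches(
    images: Iterable[Path], is_match: Callable[[Path], bool]
) -> List[Path]:
    """Every sample image should match; return the ones that did not.

    `is_match` is a hypothetical hook around whatever sends the image to PhotoDNA.
    """
    return [p for p in images if not is_match(p)]
```

An empty result from `find_missed_matches` would give some confidence that the script identifies known problematic images end to end.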
**Additional info**
It looks like the original implementation sent file content instead of URLs:
```
commit e5c6ee716b0230de5a0deec8f4a344cd02ffdc90
Author: Peter Ovchyn <peter.ovchyn@speedandfunction.com>
Date: Tue Mar 3 15:45:27 2020 +0200
Implement PhotoDNA integration using MWHttpRequest
Bug: T246206
Change-Id: I5a202c949436b9962e48dd52833aa12e37d129fa
```
but it was then changed to send URLs as part of the thumbnail implementation:
```
commit 9028494fa014a423319090091592ee67994b1b44
Author: Peter Ovchyn <peter.ovchyn@speedandfunction.com>
Date: Thu Mar 12 22:17:59 2020 +0200
Send 160x160 thumbnails to photo DNA instead of real files
Bug: T246915
Change-Id: I6424f256bb4ba1cba6b115390b9f6e34f728cc5c
```
In T308451, I found that only 0.05% of images are making it to PhotoDNA. We still don't know the exact cause of these errors, though.