Essex and I tried to run the MediaModeration script today, but when we looked in Logstash there were a ton of errors. Petr Pchelko confirmed that the error rate seemed high, so we paused to investigate further.
To do this, we need to roll back the change made here https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/606239/9/wmf-config/InitialiseSettings.php#b6736 and set it to 'warning' so that we have more robust logs.
Findings from looking through the source
MediaModeration is not currently checking the file type of the thumbnail
- we should check for file types in https://gerrit.wikimedia.org/g/mediawiki/extensions/MediaModeration/+/64eee7e258a8d6bf5356ed81c01e5c23db1d34a4/src/ThumbnailProvider.php
- PhotoDNA lists accepted file types here https://developer.microsoftmoderator.com/docs/services/57c7426e2703740ec4c9f4c3/operations/57c7426f27037407c8cc69e6/console?ref=mktg which are
Content-Type: image/gif
Content-Type: image/jpeg
Content-Type: image/png
Content-Type: image/bmp
Content-Type: image/tiff
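A minimal sketch of the missing check, in Python for illustration only (the extension itself is PHP). The function name `is_accepted_content_type` is hypothetical and is not part of MediaModeration; the set of types is copied from the list above.

```python
# Content types PhotoDNA accepts, copied from the list above.
ACCEPTED_CONTENT_TYPES = frozenset({
    "image/gif",
    "image/jpeg",
    "image/png",
    "image/bmp",
    "image/tiff",
})


def is_accepted_content_type(header_value: str) -> bool:
    """Return True if a Content-Type header value names an accepted type.

    Hypothetical helper for illustration; not part of the extension.
    """
    # Drop parameters such as "; charset=binary" and normalise case.
    mime = header_value.split(";", 1)[0].strip().lower()
    return mime in ACCEPTED_CONTENT_TYPES
```

A check like this would let the script skip unsupported thumbnails before making a request, instead of logging them as errors afterwards.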
We don't have visibility into load balancer/CDN behavior when PhotoDNA requests an image
- We could send PhotoDNA the file content instead of URLs
- this would eliminate network troubleshooting and allow for local testing
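To make the difference between the two approaches concrete, here is a rough sketch in Python (illustration only, the extension is PHP). Everything in it is an assumption: `ENDPOINT` is a placeholder rather than the real PhotoDNA URL, the JSON field names are a guess at the request shape and should be checked against the PhotoDNA documentation, and authentication headers are omitted. The requests are only built, never sent.

```python
import json
from urllib.request import Request

# Placeholder endpoint (hypothetical); the real URL and auth headers
# must come from the PhotoDNA documentation.
ENDPOINT = "https://photodna.example.invalid/match"


def build_url_request(image_url: str) -> Request:
    """Ask the service to fetch the image itself (URL-based approach)."""
    # Field names are an assumption about the request shape.
    body = json.dumps(
        {"DataRepresentation": "URL", "Value": image_url}
    ).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_content_request(image_bytes: bytes, content_type: str) -> Request:
    """Send the image bytes directly, so no fetch through the CDN occurs."""
    return Request(
        ENDPOINT,
        data=image_bytes,
        method="POST",
        headers={"Content-Type": content_type},
    )
```

With the content-based variant the only network hop is our own request, so a failure can be reproduced locally with a file on disk.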
We are not sure that the script actually identifies problematic images
- we could upload the test files to local environment for testing
- we could add those files to the test suite if licensing allows
Additional info
It looks like the original implementation sent file content instead of URLs
commit e5c6ee716b0230de5a0deec8f4a344cd02ffdc90
Author: Peter Ovchyn <peter.ovchyn@speedandfunction.com>
Date: Tue Mar 3 15:45:27 2020 +0200

    Implement PhotoDNA integration using MWHttpRequest

    Bug: T246206
    Change-Id: I5a202c949436b9962e48dd52833aa12e37d129fa
but then it changed to sending URLs as part of the thumbnail implementation:
commit 9028494fa014a423319090091592ee67994b1b44
Author: Peter Ovchyn <peter.ovchyn@speedandfunction.com>
Date: Thu Mar 12 22:17:59 2020 +0200

    Send 160x160 thumbnails to photo DNA instead of real files

    Bug: T246915
    Change-Id: I6424f256bb4ba1cba6b115390b9f6e34f728cc5c
In T308451, I found that only 0.05% of images are making it to PhotoDNA. We still don't know the exact reason for these errors, though.
(Update 5/15/23 - reopening as a parent task for all current MediaModeration debugging tasks)