Page MenuHomePhabricator

MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed
Closed, ResolvedPublic1 Estimated Story PointsBUG REPORT

Description

When the TimedMediaHandler extension is installed, calls to File::canRender for audio and video results in a value of true. Without the extension installed this results in false. The documentation for File::canRender indicates that:

Currently, this checks if the file is an image format
that can be converted to a format
supported by all browsers (namely GIF, PNG and JPEG),
or if it is an SVG image and SVG conversion is enabled.

Because audio and video are not strictly images, this documentation is wrong when the TimedMediaHandler extension is installed. Because of this, the MediaModeration extension should ignore content that could be rendered (as determined by File::canRender) but such a render doesn't produce a supported image that fully conveys the meaning of the file. For example, video should not be included because such a picture would only be a frame from it. Also, audio files should be ignored as no image could be made that indicates the content in a way understood by PhotoDNA.

Steps to replicate the issue
  1. Install TimedMediaHandler
  2. Upload a midi file to the wiki (example at https://commons.wikimedia.org/wiki/File:2_Star_Spangled_Banner.mid)
  3. Run the following SQL replacing YYYY with the image of the uploaded file name without the File: prefix, with the file extension and with spaces replaced with underscores:
Select query for step 3
SELECT img_sha1 FROM image WHERE img_name = 'YYYYY';
  1. Replacing YYYY with the value returned by the query above, run the following:
Select query for step 4
SELECT * FROM mediamoderation_scan WHERE mms_sha1 = 'YYYY';

What happens
A row is returned by the query in step 4

What should have happened instead?
No rows are returned by the query

Software version

MediaWiki 1.42.0-alpha (4752fa6) 16:02, 28 November 2023. MediaModeration 0.1.0 (c16d8b0) 15:55, 28 November 2023

Related Objects

StatusSubtypeAssignedTask
OpenNone
OpenNone
Resolvedkostajh
ResolvedNone
ResolvedTchanders
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedBUG REPORTDreamy_Jazz
OpenNone
ResolvedBUG REPORTDreamy_Jazz
ResolvedDreamy_Jazz
Resolvedkostajh
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz

Event Timeline

I propose the following to fix this:

  • Make MediaModeration only accept files that have File::getMediaType return MEDIATYPE_BITMAP or MEDIATYPE_DRAWING
  • Ensure that MediaModerationFileProcessor::canScanFile is always called before attempting to scan file (even if the DB has an entry for it)
  • (potentially) Write a maintenance script to remove files from the scan table that do not have File::getMediaType return MEDIATYPE_BITMAP or MEDIATYPE_DRAWING.

I am unsure if the last is needed because there may be things that change whether a file can be scanned. For example, if SVG rendering is turned off the scanning of an image that is in svg form will not work. As such, the code should check on an attempt to scan the result of MediaModerationFileProcessor::canScanFile. As such, entries that were previously scannable may become unscannable. What to do in this case probably needs some further thought (for example SVG rendering could be turned back on at a later date).

Change 978155 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/MediaModeration@master] Only allow MEDIATYPE_DRAWING and MEDIATYPE_BITMAP files to be scanned

https://gerrit.wikimedia.org/r/978155

Dreamy_Jazz changed the point value for this task from 2 to 1.Nov 29 2023, 12:11 AM
Dreamy_Jazz renamed this task from MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as renderable when TimedMediaHandler extension is installed to MediaModerationFileProcessor::canScanFile incorrectly registers audio and video files as scannable when TimedMediaHandler extension is installed.Nov 29 2023, 12:14 AM

The documentation for File::canRender indicates that:

Currently, this checks if the file is an image format
that can be converted to a format
supported by all browsers (namely GIF, PNG and JPEG),
or if it is an SVG image and SVG conversion is enabled.

Yeah this is incorrect, we really should update this. It actually means "Can provide a thumbnail", where thumbnail means 'a result for [[File:..|thumb]]
The alternative of it being the 'blank' file icon.

Screenshot 2023-11-29 at 12.08.09.png (520×614 px, 35 KB)

There is also 'mustRender' which means: have to transform this in order to be able to show it in the browser (For instance: Native SVG could disable that and any transforms would be completely skipped before the toHtml is triggered)

Change 978155 merged by jenkins-bot:

[mediawiki/extensions/MediaModeration@master] Only allow drawing and bitmap media types to be scanned

https://gerrit.wikimedia.org/r/978155

Change 979141 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/core@master] Remove incorrect information from doc of File::canRender

https://gerrit.wikimedia.org/r/979141

QA should not be needed for the change being made to core (as this only affects method documentation).

When running QA you can use the steps listed in the task description.

One addendum to the QA steps is that you should use a midi file that hasn't been uploaded to the wiki before. This means choosing a new one from https://commons.wikimedia.org/wiki/Category:MIDI_files that you have not used for testing before.

Djackson-ctr subscribed.

I have verified the fix for this issue has been implemented and is working as expected per the Ticket Description...
Thank you @Dreamy_Jazz!!!

image.png (342×754 px, 26 KB)

Change 979701 had a related patch set uploaded (by Kosta Harlan; author: Dreamy Jazz):

[mediawiki/extensions/MediaModeration@wmf/1.42.0-wmf.7] Only allow drawing and bitmap media types to be scanned

https://gerrit.wikimedia.org/r/979701

Change 979701 merged by jenkins-bot:

[mediawiki/extensions/MediaModeration@wmf/1.42.0-wmf.7] Only allow drawing and bitmap media types to be scanned

https://gerrit.wikimedia.org/r/979701

Mentioned in SAL (#wikimedia-operations) [2023-12-05T14:23:51Z] <urbanecm@deploy2002> Started scap: Backport for [[gerrit:979698|User impact: update quantizeViews to process small series of view data (T352349)]], [[gerrit:979700|Add maintenance script to import existing files to scan table (T350863)]], [[gerrit:979701|Only allow drawing and bitmap media types to be scanned (T352234)]]

Mentioned in SAL (#wikimedia-operations) [2023-12-05T14:25:08Z] <urbanecm@deploy2002> kharlan and urbanecm: Backport for [[gerrit:979698|User impact: update quantizeViews to process small series of view data (T352349)]], [[gerrit:979700|Add maintenance script to import existing files to scan table (T350863)]], [[gerrit:979701|Only allow drawing and bitmap media types to be scanned (T352234)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-12-05T14:32:47Z] <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:979698|User impact: update quantizeViews to process small series of view data (T352349)]], [[gerrit:979700|Add maintenance script to import existing files to scan table (T350863)]], [[gerrit:979701|Only allow drawing and bitmap media types to be scanned (T352234)]] (duration: 08m 55s)

Change 979141 merged by jenkins-bot:

[mediawiki/core@master] Remove incorrect information from doc of File::canRender

https://gerrit.wikimedia.org/r/979141