
MimeAnalyzer: Document that built-in guesses override PHP/OS detection
Open, Medium · Public

Description

The guessMimeType method in the MimeAnalyzer class calls the doGuessMimeType method first to detect the mime type of files (https://github.com/wikimedia/mediawiki/blob/master/includes/libs/mime/MimeAnalyzer.php#L520). This method is MediaWiki's custom mime detection code. The external API ("detectMimeType") is used only if this method fails to detect the mime type. Unfortunately, MediaWiki's custom mime detection code might contain bugs that could be avoided if the external API were used by default, for example T291750. The current behavior is also inconsistent with the documentation at https://www.mediawiki.org/wiki/Manual:MIME_type_detection ("If installed, MediaWiki uses PHP's FileInfo module, or the older MimeMagic module.").

The code should probably be changed to call detectMimeType first instead of doGuessMimeType.

  • Update the wiki page with correct high-level behaviour.
  • Write a MimeAnalyzer class comment that describes the library's primary and secondary objectives with regard to security and why it is the way it is.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Krinkle renamed this task from MediaWiki uses custom mime type detection by default to MediaWiki uses built-in mime type detection by default. · Sep 27 2021, 7:16 PM
Krinkle triaged this task as Medium priority. · Sep 27 2021, 7:21 PM
Krinkle removed a project: MediaWiki-Uploading.
Krinkle subscribed.

If a MediaWiki extension or a site admin's local configuration customises the mime map, then that is expected to take precedence. I believe, however, that in your case there are likely no custom mime maps registered, and that you're referring to the built-in mime map instead. Both the built-in map and any custom overrides are expected to be checked first. This ensures a consistent cross-platform experience and also allows certain security requirements to be met.

The external mime check serves as the default base layer beyond that, used if the built-in map and any custom overrides did not find a satisfying match. I'm re-classifying this as an issue for the documentation to be reviewed and updated as needed.
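For illustration, the order of layers described above can be sketched roughly as follows. This is a minimal Python sketch, not MediaWiki's PHP code: the function names, the toy signature checks, and the "unknown/unknown" fallback value are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type for one detection layer: it inspects the first bytes of a
# file and returns a mime type, or None if it has no opinion.
Detector = Callable[[bytes], Optional[str]]


def custom_overrides(head: bytes) -> Optional[str]:
    """Stand-in for extension or site-admin overrides (none registered here)."""
    return None


def builtin_guess(head: bytes) -> Optional[str]:
    """Stand-in for the built-in guesses: reports any ZIP container as ZIP."""
    if head.startswith(b"PK\x03\x04"):
        return "application/zip"
    return None


def external_detect(head: bytes) -> Optional[str]:
    """Stand-in for the PHP/OS detection layer (toy rules, not real finfo)."""
    if head.startswith(b"PK\x03\x04"):
        return "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    if head.startswith(b"%PDF-"):
        return "application/pdf"
    return None


def guess_mime_type(head: bytes, layers: Sequence[Detector],
                    default: str = "unknown/unknown") -> str:
    """Return the answer of the first layer that finds a match."""
    for detect in layers:
        mime = detect(head)
        if mime is not None:
            return mime
    return default


# Overrides and built-in guesses are consulted before the external check.
LAYERS = [custom_overrides, builtin_guess, external_detect]
```

With this ordering, a file that starts with the ZIP signature is reported as "application/zip" even though the external layer would have said something more specific, while a PDF is only identified once the first two layers decline to answer.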

You are right, there are no custom mime maps registered (as far as we know). We also agree that MediaWiki does provide a consistent cross-platform experience by using a built-in mime map. However, it is quite unfortunate that this code might contain bugs such as the one linked. We personally feel the benefits of using a specialized library, such as PHP finfo, could outweigh its disadvantages. Still, it is up to the MediaWiki team to balance these arguments and decide which is best.

If you still decide to use the built-in map by default, the documentation should indeed be reviewed and updated.

I chatted with @tstarling, who knows the code and its historical context better than I do. As I understand it now, the built-in "guess overrides" that come with MediaWiki exist primarily for security, and not so much for correctness. When hosting a public site that permits anyone to upload files that are hosted and served publicly on the web, it's important not to spread potentially dangerous executables, e.g. those hidden in ambiguous file types, or those that may otherwise fool finfo tooling, which is generally not developed with security in mind. For example, on one's desktop computer a corrupted file should arguably still be reported as what it most likely is or was, leaving it to the associated handlers (e.g. video players or other software) or anti-virus software to decide what to do about the corruption.

If security isn't a concern, e.g. on wikis that permit only trusted individuals to edit pages and upload files, or if the wiki is otherwise non-public, one could, as a (temporary) workaround, permit "zip" files and accept that the reported file type may be incorrect.

The order of operations is indeed as intended, and I'll re-purpose this as a documentation task. Office files are unfortunately caught in this but there isn't an obvious alternative that I see here given that they are also true ZIP files and expandable as such.
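To make the ZIP point concrete, here is a small Python sketch (standard library only). It builds a made-up, Office-style archive in memory and shows that generic ZIP tooling accepts it. The entry names mimic the layout of a .docx file but the contents are placeholders, so this is not a valid Word document.

```python
import io
import zipfile


def make_office_like_archive() -> bytes:
    """Build a tiny ZIP whose entry names mimic a .docx (placeholder contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


data = make_office_like_archive()

# The archive begins with the ZIP local file header signature.
starts_with_zip_magic = data.startswith(b"PK\x03\x04")

# Generic ZIP tooling recognises it and can list or expand its entries.
is_zip = zipfile.is_zipfile(io.BytesIO(data))
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    names = zf.namelist()
```

A detector that looks only at the leading bytes cannot tell such a file apart from any other ZIP archive. Telling them apart requires looking at the entries inside.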

Krinkle renamed this task from MediaWiki uses built-in mime type detection by default to MimeAnalyzer: Document that built-in guesses override PHP/OS detection. · Oct 6 2021, 4:39 AM

Proposal:

  • Update the wiki page with correct high-level behaviour.
  • Write a MimeAnalyzer class comment that describes the library's primary and secondary objectives with regard to security and why it is the way it is.