Page MenuHomePhabricator

files with a bad extension are not rejected if their true mime type can not be determined (PATCH)
Closed, ResolvedPublic

Description

Currently, files that have a file extension that does not match their real mime
type are rejected ONLY if the real type can be determined. You could upload a
bunch of random bytes as a JPG. Note that the real mime type is determined by
$wgMimeDetectorCommand, finfo_open or mime_content_tye, depending on setup.

I suggest to change this behavior, see the patch attached. The patch does the
following:

IF the mime type can not be determined BUT the file extension is listed in
mime.info ($wgMimeInfoFile), reject the file. IF the mime type can not be
determined AND the file extension is NOT listed in mime.info, allow the file.

The only possible problem I see is this:

IF an "obscure" file type is allowed BUT that type is not recognized by the mime
detection AND the file extension is listed in mime.info, such files will be
rejected. The solution would be to remove that file type from mime.info (or
change the method of mime detection).


Version: 1.6.x
Severity: normal
OS: Linux
Platform: PC

Details

Reference
bz3641

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 8:50 PM
bzimport set Reference to bz3641.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 946
patch against head

Attached:

This bug is a regression in 1.5; it doesn't always seem to trigger, but it can
reproduce it on my Linux test box with $wgMimeDetectorCommand = 'file -bi'.

Applied fix to REL1_5 branch, will appear in 1.5.1.

Patch is bad -- has broken uploads of all filetypes other than built-in images.

Reopening.

Created attachment 950
Fix to the fix

Additional patch to fix the breakage of unrecognized types. Adds a separate
function MimeMagic::isRecognizableExtension() which just matches on extensions
which are known to be recognizable from their content (currently, the list of
types for getimagesize()). Extensions on that list will be rejected when
detection comes back with 'unknown'; extensions not on that list can't be
reliably detected from content so will be passed on as before.

Attached: