File extensions should be automatically decided by MIME type at upload
Open, Public

Description

Author: johnnymrninja

Description:
Breaking this off of related bug 32660, which was broken off of bug 4421. This would also solve bug 29284.

As MW detects the MIME type of the file as it is being uploaded, it should not rely on the uploader to provide a file extension. Rather, the file extension should be set automatically by the software. Any extension detected in the name should be automatically removed.

For example, if Cheese.JPEG is uploaded but the MIME type is PNG, the file should be named Cheese.png, not Cheese.JPEG.png. If the MIME type really is JPEG, it should simply be named Cheese.jpg. This should also create a notice for the uploader, so they don't lose track of their uploaded file.

Obviously this will not fix existing issues mentioned in the first two bugs, but it will prevent future issues.
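A minimal sketch of the proposed renaming, assuming a hypothetical MIME-to-extension table and a hypothetical set of recognized extensions (neither is MediaWiki's actual data or API). It strips only a final segment that is a recognized extension, then appends the preferred extension for the detected MIME type; the returned flag says whether the uploader should get a notice.

```python
# Hypothetical table: detected MIME type -> preferred extension (not MediaWiki's real data).
PREFERRED_EXT = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/gif": "gif",
    "image/tiff": "tif",
}

# Hypothetical set of recognized extensions, stored in lowercase.
KNOWN_EXTS = {"jpg", "jpeg", "jpe", "png", "gif", "tif", "tiff"}


def auto_name(filename: str, mime: str) -> tuple[str, bool]:
    """Return (new_name, changed) for an upload whose detected MIME type is `mime`."""
    ext = PREFERRED_EXT[mime]
    base, dot, tail = filename.rpartition(".")
    # Strip only the final segment, and only if it is a recognized extension.
    stem = base if dot and tail.lower() in KNOWN_EXTS else filename
    new_name = f"{stem}.{ext}"
    # `changed` tells the caller whether to show the uploader a notice.
    return new_name, new_name != filename


print(auto_name("Cheese.JPEG", "image/png"))   # ('Cheese.png', True)
print(auto_name("Cheese.JPEG", "image/jpeg"))  # ('Cheese.jpg', True)
```

A name that already carries the preferred extension, such as Cheese.jpg for a JPEG, comes back unchanged with the flag set to False, so no notice is needed.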


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=32660
https://bugzilla.wikimedia.org/show_bug.cgi?id=29284

bzimport added a project: MediaWiki-Uploading. Via Conduit, Nov 22 2014, 1:08 AM
bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz40479.
bzimport created this task. Via Legacy, Sep 24 2012, 4:37 PM
bzimport added a comment. Via Conduit, Sep 24 2012, 4:58 PM

johnnymrninja wrote:

Hopefully this would also prevent files with unknown or unsupported MIME types from being uploaded with a supported extension, so Trojan.XXX couldn't be uploaded as Trojan.gif. This would require a list of extensions against which uploads with unknown MIME types are checked.
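A sketch of that guard, assuming hypothetical sets of supported extensions and identifiable MIME types (illustrative only; "application/octet-stream" stands in for content the detector could not identify). An upload is refused when its content cannot be identified but its name ends in a supported extension; unknown content with an unknown extension passes this particular check.

```python
# Hypothetical list of extensions that imply a supported, known file type.
SUPPORTED_EXTS = {"jpg", "jpeg", "png", "gif", "tif", "tiff", "svg", "ogg", "ogv", "oga"}

# Hypothetical set of MIME types the software can positively identify.
KNOWN_MIMES = {
    "image/jpeg", "image/png", "image/gif", "image/tiff",
    "image/svg+xml", "application/ogg", "video/ogg", "audio/ogg",
}


def reject_disguised_upload(filename: str, detected_mime: str) -> bool:
    """Return True if unidentified content is wearing a supported extension."""
    _, dot, tail = filename.rpartition(".")
    has_supported_ext = bool(dot) and tail.lower() in SUPPORTED_EXTS
    return detected_mime not in KNOWN_MIMES and has_supported_ext


print(reject_disguised_upload("Trojan.gif", "application/octet-stream"))  # True
print(reject_disguised_upload("Trojan.XXX", "application/octet-stream"))  # False
```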

bzimport added a comment. Via Conduit, Sep 24 2012, 5:11 PM

svenmanguard wrote:

This is fantastic. I recommend that you use the shortest form in all lowercase as the chosen extension (i.e. ".jpg" instead of ".jpeg" or ".JPG"). This is because .jpg is the most common variant for JPEGs by a great deal, and .tif is the most common variant for TIFFs by something of a large-ish margin.

My one concern is the handling of .ogg and .ogv. These two can /occasionally/ but not always be used interchangeably, or at the very least, have been. We can't eliminate either, but we might (I lack the technical knowledge to tell for certain) run into problems with this.

Thanks for doing this,
Sven
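One way to express the "shortest form, in lowercase" rule, as a sketch. Shortest alone can tie (for example, .jpg and .jpe are both three characters), so this version assumes ties go to whichever candidate is listed first; the candidate lists are hypothetical and real MIME-to-extension data may differ in content and order.

```python
def preferred_extension(candidates: list[str]) -> str:
    """Pick the shortest extension, lowercased; ties go to the earliest candidate."""
    # min() returns the first of several equally short items, so list order breaks ties.
    return min((c.lower() for c in candidates), key=len)


print(preferred_extension(["jpeg", "JPG", "jpe"]))  # jpg
print(preferred_extension(["tiff", "TIF"]))         # tif
```

Because the tie-break depends on list order, whoever maintains the list effectively chooses the winner among equally short forms.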

Bawolff added a comment. Via Conduit, Sep 24 2012, 5:26 PM

Say someone uploads a file named: "esp. cute dogs.jpg"

Ignoring the fact that Commons probably doesn't need yet another pic of someone's puppies, the period denotes that "esp" is an abbreviation for "especially". Under this proposal, would you like us to
A) prevent the file being uploaded
B) Auto rename it to esp.jpg
C) Magically recognize that ". cute dogs" is not an extension, and let it through.

bzimport added a comment. Via Conduit, Sep 24 2012, 5:32 PM

johnnymrninja wrote:

(In reply to comment #2)

> This is fantastic. I recommend that you use the shortest form in all lowercase
> as the chosen extension (i.e. ".jpg" instead of ".jpeg" or ".JPG"). This is
> because .jpg is the most common variant for JPEGs by a great deal, and .tif is
> the most common variant for TIFFs by something of a large-ish margin.
>
> My one concern is the handling of .ogg and .ogv. These two can /occasionally/
> but not always be used interchangeably, or at the very least, have been. We
> can't eliminate either, but we might (I lack the technical knowledge to tell
> for certain) run into problems with this.
>
> Thanks for doing this,
> Sven

.ogg is used generically for the container format, but .ogv is designed solely
for OGG video, and .oga is solely for OGG audio. As they have separate MIME
types, there shouldn't be an issue.

The main source of conflation is that the OGG audio codec is called "OGG Vorbis",
so some people assume that the extension .ogv is for that (I know I did).

Worst case, if there is some issue with OGG, or people are super-attached to
the generic extension, the OGG MIME types can be left alone for now.

The vast majority of uploads are pictures, and I'd rather see only those issues
resolved than none at all.

bzimport added a comment. Via Conduit, Sep 24 2012, 5:36 PM

johnnymrninja wrote:

(In reply to comment #3)

> Say someone uploads a file named: "esp. cute dogs.jpg"
>
> Ignoring the fact that Commons probably doesn't need yet another pic of someone's
> puppies, the period denotes that "esp" is an abbreviation for "especially". Under
> this proposal, would you like us to
> A) prevent the file being uploaded
> B) Auto rename it to esp.jpg
> C) Magically recognize that ". cute dogs" is not an extension, and let it
> through.

The software already knows which extensions belong to which MIME types; it's not magic. As ". cute dogs" is not an extension, there would be no issue. There is no reason to act on every period, only on known extensions.

Even unknown extensions should be safe, as long as their MIME type is equally unknown. If the MIME type is known, its extension is appended. So if a JPEG is uploaded as "esp. cute dogs.dog", it would become "esp. cute dogs.dog.jpg", and the uploader is asked if they wish to continue.

bzimport added a comment. Via Conduit, Sep 24 2012, 5:42 PM

johnnymrninja wrote:

To be absolutely clear, this should only relate to extensions at the end of the
file name. So "exe.gif.png.jpg" would be a fine name for a JPEG, if bizarre.

Jarekt added a comment. Via Conduit, Sep 26 2012, 1:48 PM

Two Comments:

  1. A Commons source of extension/MIME type mismatches is the reupload feature. For example, http://commons.wikimedia.org/wiki/File:Grb-Pozarevac.jpg was uploaded as a JPEG and then someone reuploaded a GIF over it. I guess reupload should not allow other MIME types and should instead offer to upload the file under a new name.
  2. See http://commons.wikimedia.org/wiki/User:Dispenser/sandbox for examples of 1,625 other files with extension mismatch found on Commons.
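Jarekt's first point (refuse a reupload that changes the MIME type and offer a new name instead) could look roughly like this. The function name and table are hypothetical, and a real implementation would also have to check that the suggested name is not already taken.

```python
from typing import Optional

# Hypothetical MIME-to-extension table, for illustration only.
PREFERRED_EXT = {"image/jpeg": "jpg", "image/gif": "gif", "image/png": "png"}


def check_reupload(existing_name: str, existing_mime: str, new_mime: str) -> Optional[str]:
    """Return None if the reupload may overwrite the file, else a suggested new name."""
    if new_mime == existing_mime:
        return None
    # The MIME type changed: refuse the overwrite and suggest the same stem
    # with the extension of the new type.
    stem, dot, _ = existing_name.rpartition(".")
    if not dot:
        stem = existing_name
    return f"{stem}.{PREFERRED_EXT[new_mime]}"


print(check_reupload("Grb-Pozarevac.jpg", "image/jpeg", "image/gif"))  # Grb-Pozarevac.gif
```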

Platonides added a comment. Via Conduit, Sep 26 2012, 1:54 PM

Jarek, Commons currently blocks you from uploading most files with a wrong mime type.

Jarekt added a comment. Via Conduit, Sep 26 2012, 2:02 PM

But it does not block me from uploading (or reuploading) a file with MIME type JPEG and a .PNG extension, like http://commons.wikimedia.org/wiki/File:TPR2011.png, uploaded this March.

McZusatz added a comment. Via Conduit, Sep 26 2012, 2:38 PM

(In reply to comment #9)
I can't reproduce this behavior.

Jarekt added a comment. Via Conduit, Sep 26 2012, 2:55 PM

I just tried and I cannot reproduce it either. I tried both a new upload with an extension mismatch and a reupload. I guess someone has fixed it since March, when http://commons.wikimedia.org/wiki/File:TPR2011.png was uploaded. Status: Fixed?

waldyrious added a comment. Via Conduit, Sep 26 2012, 3:01 PM

(In reply to comment #11)

> Status: Fixed?

I think what is fixed is the reupload conflicts, not this bug which deals with first-time upload.

bzimport added a comment. Via Conduit, Apr 1 2013, 8:19 AM

johnnymrninja wrote:

Just to summarize (got a bit off-track up there):

1. We would maintain a list of accepted MIME types and their preferred file extension.
2. Files would automatically receive an extension based on their MIME type.
3. Files that are uploaded with known extensions that do not match would be renamed after a prompt ("Renaming to 'Dog.gif'. Do you wish to continue?")
4. File names would not be otherwise modified. If a file is named "dog.gif.png" and it is a JPEG, it would be renamed "dog.gif.jpg". If it was named "dog.gif.cat", it would be uploaded as "dog.gif.cat.jpg".

For the purposes of this bug, the only things that would have to be modified are the file uploader, and file renaming/moving. This would not change how files are displayed or used, or even the nature of the filename. File redirects could still be manually created at these other extensions. It would just reduce the options at the time of upload, and potentially make other bugs easier to fix in the future.

Is there anyone willing to theorize on how doable this is as a bug?
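The four steps in the summary above, sketched as a single naming decision under stated assumptions: the step 1 table is hypothetical and deliberately tiny, an unaccepted MIME type raises an error, and, following the summary, only step 3 (a recognized extension that belongs to a different type) sets the prompt flag. The earlier reply about "esp. cute dogs.dog" suggested asking the uploader in the append case too; that would be a one-line change.

```python
from dataclasses import dataclass

# Step 1 (hypothetical, deliberately tiny): accepted MIME types and their preferred extension.
PREFERRED_EXT = {"image/jpeg": "jpg", "image/png": "png", "image/gif": "gif"}

# Recognized extensions and the MIME type each one implies (also hypothetical).
EXT_TO_MIME = {
    "jpg": "image/jpeg", "jpeg": "image/jpeg",
    "png": "image/png", "gif": "image/gif",
}


@dataclass(frozen=True)
class Proposal:
    name: str           # file name the upload would receive
    needs_prompt: bool  # True when the uploader should confirm the rename (step 3)


def propose_name(filename: str, mime: str) -> Proposal:
    if mime not in PREFERRED_EXT:
        raise ValueError(f"MIME type not accepted: {mime}")
    ext = PREFERRED_EXT[mime]
    stem, dot, tail = filename.rpartition(".")
    implied_mime = EXT_TO_MIME.get(tail.lower()) if dot else None
    if implied_mime is not None:
        # Step 3: replace the recognized final extension; prompt only if it names a different type.
        return Proposal(f"{stem}.{ext}", needs_prompt=implied_mime != mime)
    # Steps 2 and 4: no recognized final extension, so keep the whole name and append one.
    return Proposal(f"{filename}.{ext}", needs_prompt=False)


print(propose_name("dog.gif.png", "image/jpeg"))  # Proposal(name='dog.gif.jpg', needs_prompt=True)
print(propose_name("dog.gif.cat", "image/jpeg"))  # Proposal(name='dog.gif.cat.jpg', needs_prompt=False)
```

Only the final segment is examined, so inner periods ("exe.gif.png.jpg", "esp. cute dogs.jpg") are never touched, which matches the clarification earlier in the thread.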

McZusatz added a comment. Via Conduit, Aug 13 2013, 6:09 PM

I think this can be closed as RESOLVED-DUPLICATE of bug 40326

Platonides added a comment. Via Conduit, May 11 2014, 6:33 PM

Bug 40326 seems to be a different bug. The summary in comment 13 seems correct, but I would only change the uploader. Renaming is a more manual process, and I am sure there will be cases where there's a desire to override that detection.

bzimport added a comment. Via Conduit, Jul 22 2014, 12:53 AM

cuerden wrote:

My only suggestion here is that, if a file's MIME type does not match the extension in its name, surely that should be enough to trigger an "are you sure?" prompt first, as the uploader might well be uploading the wrong file.

Gilles added a project: Multimedia. Via Web, Nov 24 2014, 3:41 PM
