
Normalize file extension while uploading file in UploadWizard
Closed, Resolved, Public

Description

UploadWizard, unlike Special:Upload, does not let you enter the full file name by hand (only the part before the extension). Many cameras produce files with an uppercase .JPG extension, while many editors prefer lowercase .jpg and would like to upload them that way, so we should normalize this automatically.

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 12:14 AM)
bzimport added a project: UploadWizard.
bzimport set Reference to bz34703.
bzimport added a subscriber: Unknown Object (MLST).

Something tells me this would be more easily, and more reliably, accomplished on the client's side before the upload process is initiated. Why would a user need to change the file extension only after choosing the file?

Well, not really.

I recently uploaded IMG_6152.JPG, which became

https://commons.wikimedia.org/wiki/File:Lower_St_Francesco_Basilica_in_Assisi_-_Interior.JPG

I would prefer to be able to change

IMG_6152.JPG to "Lower St Francesco Basilica in Assisi - Interior.jpeg" in the process.

OK, that makes some sense. So maybe we could keep lists of file extensions that are compatible with each other, then offer to switch between them (a dropdown after the filename field?) during the details step. I'll see what I can do.
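Here is a minimal sketch of the "compatible extensions" idea, written in Python for illustration (UploadWizard itself is JavaScript). The `COMPATIBLE_EXTENSIONS` groups and the `extension_choices` helper are hypothetical. The sketch only computes the options such a dropdown could offer for the selected file.

```python
from __future__ import annotations

# Hypothetical groups of extensions that denote the same file type.
COMPATIBLE_EXTENSIONS = [
    ["jpg", "jpeg", "jpe"],
    ["tif", "tiff"],
]


def extension_choices(filename: str) -> list[str]:
    """Return the extensions the user could switch to for this file."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        return []  # no extension at all, nothing to offer
    for group in COMPATIBLE_EXTENSIONS:
        if ext.lower() in group:
            # Offer the original spelling first, then the other members of the group.
            return [ext] + [e for e in group if e != ext]
    return [ext]  # unknown type: only the original extension is offered


print(extension_choices("IMG_6152.JPG"))
```

For a camera file like IMG_6152.JPG this offers the original JPG first, followed by the lowercase spellings of the same type.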

Thanks for taking this up.

I would say the dropdown is overkill.

Maybe just let the user type the extension at the end of the filename, after a dot. However, it might be difficult to determine whether the intention was to add an extension or just a dotted segment (some.file.name: do we accept this?).

Maybe if the filename ends with . + one of the allowed extensions, it is left as-is?
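The check suggested above could look roughly like this. `ALLOWED_EXTENSIONS` is a hypothetical stand-in for the wiki's `$wgFileExtensions` list, and `final_filename` is an illustrative name. A real implementation would also need to confirm that the typed extension is compatible with the file's actual type.

```python
# Hypothetical allowed list; on a real wiki this would come from $wgFileExtensions.
ALLOWED_EXTENSIONS = {"png", "gif", "jpg", "jpeg", "svg", "webm", "ogg"}


def final_filename(title: str, file_extension: str) -> str:
    """Keep the title as typed if it already ends with '.' plus an allowed
    extension; otherwise append the uploaded file's own extension."""
    stem, dot, ext = title.rpartition(".")
    if dot and stem and ext.lower() in ALLOWED_EXTENSIONS:
        return title  # the user supplied an extension; leave it as-is
    # No recognized extension: the dotted segment is treated as part of the name.
    return f"{title}.{file_extension}"


print(final_filename("Lower St Francesco Basilica in Assisi - Interior.jpeg", "JPG"))
print(final_filename("some.file.name", "JPG"))
```

This also answers the some.file.name question: a trailing segment that is not an allowed extension is simply kept as part of the name.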

Hm. A big concern of mine is that less-experienced users might see an error message like "incompatible filetype" and not know what to do. For those users, it would almost certainly be nicer to not even allow the mistakes that might cause the error. We already have plenty of confusing error messages in UW; I'm not about to add another one!

The more I think about this, the more I think it might be better to solve this problem with a server-side configuration setting: let the site admins specify that they want all file extensions in (upper|lower)case, and then perform that operation on the client side before upload. Maybe that's the way to solve this, because it places no burden on the user whatsoever, and it would make things look a lot more uniform.

Thoughts?

(also, I'm not currently working on this, but I might during the next UW sprint, which is coming up soonish)

We already have http://www.mediawiki.org/wiki/Manual:$wgFileExtensions
plus we have also a pretty complex MIME/content type detection mechanism (see
also http://www.mediawiki.org/wiki/File_upload)

some.file -> uploaded as (autodetected), add default extension
file.jpg -> uploaded as the JPEG file, keep JPG extension
file.jpeg -> uploaded as the JPEG file, keep JPEG extension
file.svg -> uploaded as the SVG file, keep SVG extension
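The rules above can be sketched as follows. The sketch assumes the MIME type has already been detected; MediaWiki's own detection mechanism is not modeled here. The `MIME_EXTENSIONS` table, which lists the default extension first, and the `name_for_upload` function are hypothetical.

```python
# Hypothetical table: for each detected MIME type, the accepted extensions,
# with the default extension listed first.
MIME_EXTENSIONS = {
    "image/jpeg": ["jpg", "jpeg"],
    "image/svg+xml": ["svg"],
    "video/webm": ["webm"],
}


def name_for_upload(filename: str, detected_mime: str) -> str:
    """Keep a recognized extension that matches the detected type; otherwise
    treat the whole name as the title and add the default extension.

    Assumes detected_mime is a key of MIME_EXTENSIONS."""
    extensions = MIME_EXTENSIONS[detected_mime]
    _, dot, ext = filename.rpartition(".")
    if dot and ext.lower() in extensions:
        return filename  # e.g. file.jpeg stays file.jpeg
    return f"{filename}.{extensions[0]}"  # e.g. some.file -> some.file.jpg


print(name_for_upload("some.file", "image/jpeg"))
```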

But there is a problem too. Suppose I want to upload, today:

some.file.webm -> upload as (autodetected), I will probably get an error

but once bug 30653 is fixed, it will become:

some.file.webm -> upload as WebM file, use .webm extension

do you think this is too confusing?

(However, to add some confusion, there is bug 38927 as well; did bug 30653 get
fixed?)

I don't understand what any of that has to do with this.

The solution I just suggested would not affect anything like what files you could/couldn't upload, it would just be "if the file extension (the string after the last '.' in the filename) is upper case, make it lower case". That's the only thing I think we need, plus another parallel solution for making it upper case. Would that be enough?
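Here is a sketch of that rule. The `case` parameter stands in for the hypothetical site-wide (upper|lower) setting, and the function name is illustrative. The sketch lowercases unconditionally rather than only when the extension is entirely uppercase, so mixed-case extensions such as .JpG are normalized too.

```python
def normalize_extension_case(filename: str, case: str = "lower") -> str:
    """Change the case of the extension, i.e. the string after the last '.'.

    `case` models a hypothetical site setting: "lower" or "upper".
    The part of the name before the extension is never touched."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        return filename  # no extension to normalize
    if case == "lower":
        ext = ext.lower()
    elif case == "upper":
        ext = ext.upper()
    else:
        raise ValueError(f"unknown case setting: {case!r}")
    return f"{stem}.{ext}"


print(normalize_extension_case("P1550467.JPG"))
```

Because only the text after the last dot changes, a name like some.file.JPG keeps its dotted segments intact.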

We could also add a config option for funneling, e.g., 'jpeg' -> 'jpg'.
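Funneling could be layered on top of case normalization, as in this sketch. The `EXTENSION_FUNNEL` mapping is a hypothetical example of such a config option, not an existing setting.

```python
# Hypothetical configuration: variant spellings funneled into a canonical extension.
EXTENSION_FUNNEL = {"jpeg": "jpg", "jpe": "jpg"}


def canonical_extension(filename: str) -> str:
    """Lowercase the extension, then map variant spellings to the canonical one."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        return filename
    ext = ext.lower()                     # case normalization first
    ext = EXTENSION_FUNNEL.get(ext, ext)  # then funnel variant spellings
    return f"{stem}.{ext}"


print(canonical_extension("Interior.JPEG"))
```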

I pointed out a related problem, similar to Marcin Cieślak's (my camera generates .JPG files and UW does not allow me to normalize this to .jpg), in bug 40326. That bug, however, is being handled somewhat differently from what I originally wished.

I am fine with Mark Holmquist's solution, because I see no reason why anyone would want a file with an extension like .JpG, .Jpg or even .JPG, if we all agree that .jpg is the normalized letter case (and perhaps even spelling; cf. .jpeg and its variants).

Furthermore, the discussion at bug 40326 reveals that there already exists a practice of promoting a "normalized" extension, which is .jpg in this case. Therefore, the solution proposed by Mark Holmquist would resolve what bug 40326 was originally about.

I am glad that somebody else is concerned with this as well and wants to do something about it, and I hope that things will keep rolling in the right direction.

2.5 years since the last post, and I am still forced to upload images with names such as https://commons.wikimedia.org/wiki/File:Brno,_Dominik%C3%A1nsk%C3%A1,_odlo%C5%BEen%C3%A1_uli%C4%8Dn%C3%AD_cedule.JPG (which confuses users of those images, who need to pay attention to the extension's letter case) only because my camera names them "P1550467.JPG" and Upload Wizard does not allow me to change the extension :-(

I do not want to touch the files in my archives (or make an extra copy, also because archives can be on a read-only file system) just for the purpose of uploading them to Wikimedia Commons. As @saper pointed out, splitting the renaming process into two places does not make things easier. And knowing that with each upload I spoil others' work by unwillingly producing an imperfect result makes me less keen on contributing multimedia to Wikimedia Commons.

The following shows the distribution of JPEG file extensions on Commons as of today:
jpg 16940592
JPG 5740630
jpeg 169102
JPEG 7031
Jpeg 654
Jpg 566
jPG 42
JPg 24
jpG 17
JPeG 6
jpe 6
JpEg 4
jPeG 4
jPg 3
JpG 3
Does supporting this mishmash do any good to content reusers, e.g. Wikipedia editors who use our files?
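To put these numbers in perspective, this short script totals the counts exactly as listed. It computes the totals twice: under lowercasing alone, and under lowercasing plus a hypothetical jpeg/jpe -> jpg funnel.

```python
from collections import Counter

# JPEG extension counts on Commons, copied from the list above.
COUNTS = {
    "jpg": 16940592, "JPG": 5740630, "jpeg": 169102, "JPEG": 7031,
    "Jpeg": 654, "Jpg": 566, "jPG": 42, "JPg": 24, "jpG": 17,
    "JPeG": 6, "jpe": 6, "JpEg": 4, "jPeG": 4, "jPg": 3, "JpG": 3,
}

# Hypothetical funneling rule (not an existing setting).
FUNNEL = {"jpeg": "jpg", "jpe": "jpg"}

lowercased = Counter()  # totals after lowercasing only
funneled = Counter()    # totals after lowercasing and funneling
for ext, n in COUNTS.items():
    lower = ext.lower()
    lowercased[lower] += n
    funneled[FUNNEL.get(lower, lower)] += n

print(dict(lowercased))
print(dict(funneled))
```

Lowercasing alone collapses the fifteen spellings into three (jpg: 22,681,877; jpeg: 176,801; jpe: 6). Adding the funnel leaves a single jpg bucket of 22,858,684 files.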

So what you're saying is that we should just normalize the extension to ".jpg", and generally to lowercase?


That makes sense, IMHO.

matmarex renamed this task from "Unable to change file extension while uploading file" to "Normalize file extension while uploading file in UploadWizard". (Sep 11 2015, 6:08 PM)
matmarex updated the task description.
matmarex set Security to None.
matmarex removed a subscriber: wikibugs-l-list.

Low priority, but this annoys me too and I'll look into getting it fixed.

Change 289136 had a related patch set uploaded (by Bartosz Dziewoński):
Normalize file extension for uploaded files

https://gerrit.wikimedia.org/r/289136

Change 289136 merged by jenkins-bot:
Normalize file extension for uploaded files

https://gerrit.wikimedia.org/r/289136

matmarex removed a project: Patch-For-Review.

This should be deployed to Commons this week. Please report if you notice any uppercase file extensions on files uploaded with UploadWizard afterwards.