
IE content analyzer is executed on non-first chunks during chunk uploading, preventing legitimate uploads
Open, Needs Triage, Public

Description

When trying to upload an image file ("The Maze - temporary reflective labyrinth in front of Chapelle (brusselslights).JPG", which I have uploaded on https://drive.google.com/open?id=1ipYfTKUXF3xREe3XbDU0SxN3Vdqacq1- ) to Wikimedia Commons, I get the following error:

Cannot upload this file because Internet Explorer would detect it as "application/x-msdownload", which is a disallowed and potentially dangerous file type.

This has not yet happened with any other picture taken with this camera.

edit:
I get the following error when I try to upload the file to phabricator:

Exception: No configured storage engine can store this file. See "Configuring File Storage" in the documentation for information on configuring storage engines.

edit:
Upload was successful after stripping off the embedded thumbnail using "exiftool -ifd1:all= -ext jpg FILENAME". (This was not necessary with other files, although the embedded data bot would often reupload my files without the embedded thumbnail.)

Event Timeline

zhuyifei1999 subscribed.
Cannot upload this file because Internet Explorer would detect it as "application/x-msdownload", which is a disallowed and potentially dangerous file type.

If my understanding of IEContentAnalyzer::checkBinaryHeaders is correct, this should only trigger when the first two bytes are 'MZ', but here they are \xFF\xD8. Weird. Is the sha1 checksum of your file 170bb587605d926667aedcfc7cb82fd82e72c207?
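For illustration, the rule described above can be approximated in a few lines. This is a simplified Python sketch, not MediaWiki's actual PHP code: the function name `sniff_binary_header` is hypothetical, and only the 'MZ' rule mentioned in this comment is modelled.

```python
from typing import Optional


def sniff_binary_header(data: bytes) -> Optional[str]:
    """Return the MIME type an 'MZ'-prefixed buffer would be sniffed as, else None.

    Hypothetical helper: models only the two-byte 'MZ' rule discussed here.
    """
    if data[:2] == b"MZ":
        return "application/x-msdownload"
    return None


# A JPEG starts with FF D8, so the rule should not fire on the file header.
print(sniff_binary_header(b"\xff\xd8\xff\xe1"))
# A buffer that starts with 'MZ' is flagged.
print(sniff_binary_header(b"MZ\x90\x00"))
```

Under this model the file as a whole passes, so the error has to come from a buffer that does not begin at byte 0 of the file.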

I get the following error when I try to upload the file to phabricator:

Exception: No configured storage engine can store this file. See "Configuring File Storage" in the documentation for information on configuring storage engines.

The file is 5.6MiB. This is because of T155130, and the maximum size that can be uploaded is 4MiB.

Is the sha1 checksum of your file 170bb587605d926667aedcfc7cb82fd82e72c207?

Yes

Is this uploaded via UploadWizard? It may be because of chunked uploading and 'MZ' happens to be at the start of one chunk.

It is uploaded using the UploadWizard

UW uploads the file in two chunks: the first chunk starts at position 0 with length 5242880 (5MiB), and the second chunk starts at position 5242880 with length 616894 (602KiB). The second chunk indeed starts with 'MZ':

$ xxd The\ Maze\ -\ temporary\ reflective\ labyrinth\ in\ front\ of\ Chapelle\ \(brusselslights\).JPG | grep 500000 -A 5 -B 5
004fffb0: 039a fcad a4f7 3f75 4fab 2acb 2226 ddcf  ......?uO.*."&..
004fffc0: 3872 7a22 fcb8 ab76 e434 61fe 62cc 7e42  8rz"...v.4a.b.~B
004fffd0: e79a 12bb 1391 1889 e466 672b cf34 8225  .........fg+.4.%
004fffe0: 8599 e366 cf7c 1ce7 fce6 ad2e a4df 410c  ...f.|........A.
004ffff0: 9b8e ec80 4f5c 1a8d cba0 2415 033c ee6e  ....O\....$..<.n
00500000: 4d5a 5a99 c9b6 7fff d4fe 2592 7ba7 9196  MZZ.......%.{...
00500010: 5425 377f 7ba0 ad23 6e80 0204 818f 45eb  T%7.{..#n.....E.
00500020: 5f9a 5ac7 ed0d df73 37cd 9525 62c5 ca06  _.Z....s7..%b...
00500030: e14f 7a8d b578 ae18 2895 2491 4e31 d715  .Oz..x..(.$.N1..
00500040: 566d 5d89 c944 89f5 e86d 1de3 36ca d2ab  Vm]..D...m..6...
00500050: 6376 7e5a 49bc 4714 b0a3 794d 148a bdbe  cv~ZI.G...yM....
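The same check can be scripted instead of read off a hex dump. The following is a minimal sketch, assuming fixed-size chunks as described in this comment; the helper names `chunk_starts` and `chunks_starting_with` are hypothetical, and the demo buffer is synthetic rather than the actual JPEG.

```python
from typing import List


def chunk_starts(total_size: int, chunk_size: int) -> List[int]:
    """Byte offsets at which each fixed-size chunk begins."""
    return list(range(0, total_size, chunk_size))


def chunks_starting_with(data: bytes, chunk_size: int, magic: bytes = b"MZ") -> List[int]:
    """Offsets of chunks whose first bytes equal `magic`."""
    return [
        off
        for off in chunk_starts(len(data), chunk_size)
        if data[off:off + len(magic)] == magic
    ]


# Sizes reported in this task: 5242880 + 616894 bytes, 5 MiB chunks.
total = 5242880 + 616894
print(chunk_starts(total, 5242880))  # [0, 5242880]
print(hex(5242880))                  # 0x500000, the offset shown in the xxd output

# Synthetic stand-in for the file: JPEG magic at 0, 'MZ' exactly at the chunk boundary.
demo = b"\xff\xd8" + bytes(14) + b"MZ" + bytes(6)
print(chunks_starting_with(demo, 16))  # [16]
```

To run it against a real file, read the file into `data` with `open(path, "rb").read()` and pass the chunk size the uploader uses.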
zhuyifei1999 renamed this task from Image upload failure: Cannot upload this file because Internet Explorer would detect it as "application/x-msdownload"... to IE content analyzer is executed on non-first chunks during chunk uploading, preventing legitimate uploads. Dec 7 2017, 5:46 AM
zhuyifei1999 removed zhuyifei1999 as the assignee of this task.

T143610: UploadBase::detectScript() is executed for partially uploaded files (verifyPartialFile()), not only complete ones (verifyFile()) which it expects, causing false positives is related.

@Trougnouf As a workaround, you should be able to upload the original version using a non-chunked method such as the old bare upload form (reupload). If you really must use chunked uploading (such as for files > 100 MiB), you can change the chunk size using an upload script such as Rillke's bigChunkedUpload.js, so that 'MZ' does not appear at the start of a chunk.
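Picking a chunk size for this workaround can be done offline before uploading. Below is a rough sketch, assuming fixed-size chunks and modelling only the 'MZ' rule; `first_safe_chunk_size` is a hypothetical helper, and how a given upload script lets you set the chunk size is not covered here.

```python
from typing import Iterable, Optional


def first_safe_chunk_size(
    data: bytes, candidates: Iterable[int], magic: bytes = b"MZ"
) -> Optional[int]:
    """Return the first candidate size for which no non-first chunk starts with `magic`.

    Offset 0 is skipped on purpose: if the file itself starts with 'MZ',
    no chunk size will help.
    """
    for size in candidates:
        if size <= 0:
            continue
        offsets = range(size, len(data), size)
        if not any(data[o:o + len(magic)] == magic for o in offsets):
            return size
    return None


# Synthetic buffer with 'MZ' at offset 16 (not the real file).
demo = b"\xff\xd8" + bytes(14) + b"MZ" + bytes(14)
print(first_safe_chunk_size(demo, [16, 8, 12]))  # 12
```

Sizes 16 and 8 both place a chunk boundary at offset 16, so they are rejected; 12 puts boundaries at 12 and 24, which avoids the 'MZ' bytes.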

Thank you for your impressive work!

I've already uploaded a different version of the file, with the thumbnails stripped, and that upload succeeded. (I was trying to satisfy the Embedded Data Bot, but I don't know of a command that removes the non-standard Exif data / binary blob, so I will let the Embedded Data Bot keep doing its job.)

Maybe it's time we had a discussion about IEContentAnalyzer. IE6 is very rarely used nowadays. You can't even connect to Wikimedia sites with it without a proxy (due to lack of TLS1.0).

The question that would have to be researched is whether other browsers content-sniff. Versions of IE newer than 6? Safari?

This indeed looks like the same issue as T143610, except the offending function is UploadBase::verifyMimeType() rather than detectScript(). This should be very easy to fix, but a similar fix for T143610 was rejected because folks are not sufficiently certain that the partially uploaded files are only accessible to their uploader, rather than everyone.
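The shape of such a fix can be sketched as running the header sniff only on the chunk at offset 0. This is an illustrative Python model, not MediaWiki's PHP code and not the content of any submitted patch; `header_error` and `verify_chunk` are hypothetical names, and the sketch does nothing to address the concern about who can access partially uploaded files.

```python
from typing import Optional

DISALLOWED = "application/x-msdownload"


def header_error(data: bytes) -> Optional[str]:
    """Model of the 'MZ' header rule: return an error message or None."""
    if data[:2] == b"MZ":
        return f'would be detected as "{DISALLOWED}"'
    return None


def verify_chunk(chunk: bytes, offset: int) -> Optional[str]:
    """Apply the header rule only to the chunk that contains the real file header."""
    if offset != 0:
        # Later chunks hold arbitrary mid-file bytes, so their first
        # two bytes say nothing about the file type.
        return None
    return header_error(chunk)


print(verify_chunk(b"\xff\xd8\xff\xe1", 0))    # None: JPEG header passes
print(verify_chunk(b"MZZ\x99", 5242880))       # None: mid-file 'MZ' is ignored
print(verify_chunk(b"MZ\x90\x00", 0))          # error: file really starts with 'MZ'
```

The assembled file would still need the full check once all chunks have arrived, since this model deliberately skips everything after the first chunk.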

Change 998655 had a related patch set uploaded (by Maddog; author: Maddog):

[mediawiki/core@master] upload: Allow disabling of per-chunk file verification during chunked upload

https://gerrit.wikimedia.org/r/998655