
Upload of MIDI file erroneously rejected
Closed, ResolvedPublic

Description

Uploading the MIDI file at http://mbednarek.com/temp/maidensprayer.mid results in the error message: "This file contains HTML or script code that may be erroneously interpreted by a web browser."

According to user Splarka at http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=315675310#Upload_MIDI this happened because this particular file has the string "<A" in its first 1024 characters. That string is part of a valid MIDI instruction.

Binary files, like MIDI files, should not be subject to screening for text strings; fragments like "<A" in this case can obviously occur outside the context of "HTML or script code."
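To illustrate how such a collision can arise, here is a minimal Python sketch. It assumes a simplified, hypothetical check (`naive_html_sniff`) that looks for "<a" in the first 1024 bytes; it is not the actual MediaWiki code. The helper `tiny_midi` is also made up for illustration: it builds a small but valid MIDI file whose first event is a note-on for middle C (note 60, byte 0x3C, ASCII "<") with velocity 65 (byte 0x41, ASCII "A").

```python
import struct


def naive_html_sniff(data: bytes, window: int = 1024) -> bool:
    """Hypothetical text-style check: flag '<a' (any case) in the first `window` bytes."""
    return b"<a" in data[:window].lower()


def tiny_midi() -> bytes:
    """Build a minimal format-0 MIDI file containing a single note."""
    # Header chunk: length 6, format 0, one track, 96 ticks per quarter note.
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, 96)
    events = bytes([
        0x00, 0x90, 0x3C, 0x41,  # delta 0, note-on channel 1, note 60, velocity 65
        0x60, 0x80, 0x3C, 0x00,  # delta 96, note-off channel 1, note 60
        0x00, 0xFF, 0x2F, 0x00,  # delta 0, end-of-track meta event
    ])
    track = b"MTrk" + struct.pack(">I", len(events)) + events
    return header + track


def looks_like_midi(data: bytes) -> bool:
    """Standard MIDI files start with the 4-byte chunk ID 'MThd'."""
    return data[:4] == b"MThd"


if __name__ == "__main__":
    midi = tiny_midi()
    print("contains '<A':", b"<A" in midi)
    print("flagged by naive sniff:", naive_html_sniff(midi))
    print("has MIDI magic bytes:", looks_like_midi(midi))
```

A check along the lines of `looks_like_midi`, run before any text scan, is one possible way to exempt recognised binary formats; whether that is acceptable from a security standpoint is a separate question.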

User Splarka mentioned that "UploadBase.php" might be at the core of the problem.


Version: unspecified
Severity: major

Details

Reference
bz20780

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:46 PM
bzimport set Reference to bz20780.

Bryan.TongMinh wrote:

Something went wrong when I tried to merge r43627 in the new-upload branch with r45378; it should detect '<a href='.

Shouldn't that be '<a'*'href=', because <a title="woot i haxored you" href= would still do the trick of course.

herd wrote:

> Shouldn't that be '<a'*'href=', because <a title="woot i haxored you" href=
> would still do the trick of course.

IE specifically checks for '<A HREF' and since the only reason to have this detection is to pre-guess IE, it just needs to find '<A HREF'. I think.

(In reply to comment #3)

> Shouldn't that be '<a'*'href=', because <a title="woot i haxored you" href=
> would still do the trick of course.

Like the comments in IEContentAnalyzer.php say, the objective is to emulate IE's broken MIME detection code, not to be correct. You've just pointed out an obvious flaw in IE's algorithm, which IEContentAnalyzer duplicates because its purpose is to predict how IE will react.
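The difference between the two checks discussed here can be sketched in Python. Both functions are hypothetical illustrations, not the code in IEContentAnalyzer.php or UploadBase.php: `literal_check` looks only for the fixed string '<a href' (case-insensitive), while `wildcard_check` approximates the proposed '<a'*'href=' pattern with a regular expression.

```python
import re


def literal_check(chunk: bytes) -> bool:
    """Flag only the fixed sequence '<a href', ignoring case."""
    return b"<a href" in chunk.lower()


# Approximation of '<a'*'href=': an <a tag with any attributes before href.
WILDCARD = re.compile(rb"<a\b[^>]*\bhref\s*=", re.IGNORECASE)


def wildcard_check(chunk: bytes) -> bool:
    """Flag an <a ...> tag that has an href attribute anywhere inside it."""
    return WILDCARD.search(chunk) is not None


if __name__ == "__main__":
    samples = [
        b'<A HREF="x">',
        b'<a title="woot i haxored you" href="x">',
        b"\x00\x90<A\x60\x80<\x00",  # MIDI-like bytes that merely contain '<A'
    ]
    for sample in samples:
        print(sample, literal_check(sample), wildcard_check(sample))
```

The literal check misses the second sample, which is the flaw pointed out in comment #3. Neither check flags the MIDI-like bytes, since a bare "<A" is not followed by "href".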

Bryan.TongMinh wrote:

(In reply to comment #5)

> (In reply to comment #3)
> > Shouldn't that be '<a'*'href=', because <a title="woot i haxored you" href=
> > would still do the trick of course.
>
> Like the comments in IEContentAnalyzer.php say, the objective is to emulate
> IE's broken MIME detection code, not to be correct. You've just pointed out an
> obvious flaw in IE's algorithm, which IEContentAnalyzer duplicates because its
> purpose is to predict how IE will react.

Note that this is UploadBase::detectScript, which is separate from IEContentAnalyzer. I don't fully understand whether and how the two are related, or whether UploadBase::detectScript duplicates functionality of IEContentAnalyzer.