Add support for KML/KMZ filetype
Open, Public

Description

When I try to upload a valid KML file I get "File extension does not match MIME type."


Version: 1.17.x
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=53023
https://bugzilla.wikimedia.org/show_bug.cgi?id=55549

bzimport added a project: MediaWiki-Uploading. Via Conduit, Nov 21 2014, 11:16 PM
bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz26059.
JeroenDeDauw created this task. Via Legacy, Nov 22 2010, 5:46 PM
MarkAHershberger added a comment. Via Conduit, Feb 12 2011, 4:28 AM

I imagine this is because the KML MIME types aren't included in includes/mime.info or includes/mime.types.

See http://code.google.com/apis/kml/documentation/kml_tut.html#kml_server

Care to add them?
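
For illustration, the "File extension does not match MIME type" error can be pictured as a lookup in an extension table. The following is a minimal Python sketch, not MediaWiki's code: the table text is a simplified stand-in for includes/mime.types, the function names are hypothetical, and the two vnd.google-earth entries are the MIME types registered for KML and KMZ.

```python
# Simplified stand-in for a mime.types-style table (an assumption, not the real file).
MIME_TYPES_TEXT = """\
# format: <MIME type> <extension> [<extension> ...]
application/zip zip
application/vnd.google-earth.kml+xml kml
application/vnd.google-earth.kmz kmz
"""


def parse_mime_types(text):
    """Build a {mime_type: set_of_extensions} table from mime.types-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime, *extensions = line.split()
        table.setdefault(mime, set()).update(ext.lower() for ext in extensions)
    return table


def extension_matches(filename, detected_mime, table):
    """Return True if the file's extension is listed for the detected MIME type."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return ext in table.get(detected_mime, set())


table = parse_mime_types(MIME_TYPES_TEXT)
print(extension_matches("route.kml", "application/vnd.google-earth.kml+xml", table))
print(extension_matches("route.kml", "application/xml", table))
```

With no kml entry in the table, or with the file detected as some other type, the second kind of lookup fails and the upload is rejected.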

TheDJ added a comment. Via Conduit, Feb 12 2011, 2:30 PM

Created attachment 8132
patch for kml support

I figured adding kml support would be a breeze, but I had not counted on the brain dead browser that is IE6.

Unfortunately, KML contains the element <heading>, which triggers the protection in detectScript() that guards against uploads that IE6 might mistake for HTML. The check triggers on "<head". I'm not sure if we can work around this, but Tim will know.

Attached: kmlsupport.patch

TheDJ added a comment. Via Conduit, Feb 14 2011, 10:59 PM

Note to self, mimetype sniffing of Safari:

oldest: http://trac.webkit.org/browser/trunk/WebKit/Misc.subproj/WebNSDataExtras.m?rev=9259
newest: http://trac.webkit.org/browser/trunk/Source/WebKit/mac/Misc/WebNSDataExtras.m?rev=75909

Apparently there is no issue with <head for Safari, but I'm guessing IEContentAnalyser would still block the file as well.

Another note: a website documenting the MIME signature detection of other browsers: http://webblaze.cs.berkeley.edu/2009/content-sniffing/

MarkAHershberger added a comment. Via Conduit, Feb 19 2011, 3:28 AM

Applied at r82436.

I just checked and realized you have commit access: is there a reason you didn't commit this yourself?

TheDJ added a comment. Via Conduit, Feb 19 2011, 7:04 PM

@mark yeah, I didn't commit it yet, because it functionally doesn't work :D

MarkAHershberger added a comment. Via Conduit, Feb 19 2011, 8:32 PM

Understood. Just trying to make sure the work that is done so far gets included.

Peachey88 added a comment. Via Conduit, Apr 30 2011, 12:09 AM

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

JeroenDeDauw added a comment. Via Conduit, May 15 2011, 11:59 AM

Any progress on this?

bzimport added a comment. Via Conduit, May 15 2011, 12:27 PM

Bryan.TongMinh wrote:

No.

bzimport added a comment. Via Conduit, Nov 1 2011, 2:13 AM

wikiadmin wrote:

There are still times when you want to serve a KMZ from a MW install but it never gets near a browser. Who cares what IE thinks? For instance, we use:

http://maps.google.com/maps?q=http%3A%2F%2Fwiki.bogus.com%2Fimages%2F8%2F82%2FBogus.kmz

It's Google that ends up uploading the KMZ.

All you need to do is add kmz to the application/zip line in includes/mime.types

bzimport added a comment. Via Conduit, Nov 9 2011, 3:54 AM

sumanah wrote:

r91109 was a partial revert. A finished patch would be appreciated, if someone has time and interest -- Renate?

TheDJ added a comment. Via Conduit, Nov 19 2011, 12:41 PM

I don't think many of the folks here understand the complexity of this problem. The IE6 filter is part of our security system. This is simply one file format which we cannot 'just add'. If it were as simple as that, I would have made my commit.

Fixing this requires extensive knowledge about the way browsers do content sniffing and touches on the security aspects of MediaWiki. This makes the pool of developers who can actually fix this rather small. And that is BEFORE we start touching on the subject of whether supporting this file type is even possible at all while keeping the same security context as we have at the moment.

TheDJ added a comment. Via Conduit, Nov 19 2011, 12:45 PM

Actually, with our new improved zip parser, kmz might actually be possible to add at this point in time.

kml will still have the same issue as before. The files will trip the IE6 content sniffing filters that are in place to protect IE6 users.
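
As a rough illustration of what handling KMZ involves: a KMZ file is a ZIP archive with the KML inside, so it can be inspected with an ordinary zip reader. The following is a minimal Python sketch under that assumption; the helper name is hypothetical and this is not MediaWiki's zip parser.

```python
import io
import zipfile


def kml_members(kmz_bytes):
    """Return the names of .kml entries in a KMZ upload, or [] if it is not a sound ZIP."""
    try:
        with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
            # testzip() returns the first corrupt member name, or None if all CRCs check out.
            if archive.testzip() is not None:
                return []
            return [name for name in archive.namelist()
                    if name.lower().endswith(".kml")]
    except zipfile.BadZipFile:
        return []


# Build a tiny KMZ in memory to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("doc.kml", "<kml xmlns='http://www.opengis.net/kml/2.2'/>")
    archive.writestr("files/icon.png", b"\x89PNG")
print(kml_members(buffer.getvalue()))
```

The KML entries found this way would still need the same content checks as a standalone KML file.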

TheDJ added a comment. Via Conduit, Nov 19 2011, 1:13 PM

Another P.S.: in my patch I added a new mediatype called DATA. If anyone wants to work on this again, we need to update the table creation scripts to be able to recognize that new constant value.

bzimport added a comment. Via Conduit, Feb 23 2012, 2:03 PM

xen.project wrote:

I'm guessing no additional attempts have gone into this since November. At the English Wikipedia a group of editors have discovered a very good use for KML files in representing linear features on google/bing maps. Currently the text of the KML is posted to a talk page and run from there.

Can the software not be set to treat kml as a raw text file, rather than attempting to parse it as html? We don't need to run the file, we just need a more convenient way of adding them than copying and pasting the contents into a subpage.

bzimport added a comment. Via Conduit, Feb 24 2012, 12:03 PM

Bryan.TongMinh wrote:

(In reply to comment #15)

I'm guessing no additional attempts have gone into this since November. At the
English Wikipedia a group of editors have discovered a very good use for KML
files in representing linear features on google/bing maps. Currently the text
of the KML is posted to a talk page and run from there.

Can the software not be set to treat kml as a raw text file, rather than
attempting to parse it as html? We don't need to run the file, we just need a
more convenient way of adding them than copying and pasting the contents into a
subpage.

The problem is not whether MediaWiki interprets the KML file as HTML or not, but the fact that certain broken browsers will treat the KML files as HTML, opening a whole lot of security vulnerabilities.

As DJ said, KMZ could be acceptable though.

bzimport added a comment. Via Conduit, Feb 24 2012, 3:01 PM

xen.project wrote:

Thank you Microsoft for continuing to poison the internet. There's no way to get the browser to treat it as a comment or a piece of raw text instead of it trying to parse it as html? Will this change as IE6 moves towards 1% of the browser market and websites move away from compatibility with it?

I'm just worried that the kmz thing adds another step to what some are calling complicated as is, and takes away a few of the great aspects of kml (being able to extract the coords, manipulating the precision with a bot, etc.)... But if it's the only solution we can pull off then at least it's a start.

bzimport added a comment. Via Conduit, Feb 24 2012, 11:28 PM

daniel wrote:

Pardon my ignorance, but couldn't the IE6 filter just let '<heading>' pass and only block all other variants of '<head*' ?!
Side note: some sites already see IE6 below one percent. When can we stop letting IE6 tie down progress?

TheDJ added a comment. Via Conduit, Feb 25 2012, 8:04 PM

(In reply to comment #18)

Pardon my ignorance, but couldn't the IE6 filter just let '<heading>' pass and
only block all other variants of '<head*' ?!

I'm quite sure Tim or I looked at that, but IE6 itself seems to look specifically for <head*, and that is the behavior that the filter needs to (and does) match in order to protect IE6 users.

bzimport added a comment. Via Conduit, Feb 25 2012, 8:29 PM

xen.project wrote:

Just also a note, that Google Earth produced KMLs do not contain the <heading> element... I believe it is completely optional. Unfortunately I don't know how these filters work or what is triggering what precisely... Is it that IE6 users wouldn't be able to upload it, or that they wouldn't be able to view the File: page of a KML, or some other thing?

http://www.ie6countdown.com/ has a good statistical overview of IE6 usage worldwide.

TheDJ added a comment. Via Conduit, Feb 25 2012, 8:37 PM

The problem is that IE6 will treat everything that has <head in the first several bytes as HTML. That means that if someone uploads a specially crafted KML file and an unsuspecting IE6 user downloads it, the machine of that IE6 user can be compromised. Our IE6 content filter blocks the upload of any content containing any of the 15 or so strings that will convince IE6 that something is HTML, even though it isn't. So uploading a JPEG with <head in the EXIF at the start of the file would also trigger the filter, and you would not be allowed to upload that file. Unfortunately, KML will always trigger the filter.
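
The kind of check described here can be sketched in a few lines of Python. This is only an illustration, not MediaWiki's detectScript(): the function name, the 1024-byte window, and the marker list (a short subset, not the full set of strings) are all assumptions.

```python
# Assumed size of the leading chunk that the sniffer inspects.
SNIFF_WINDOW = 1024

# Illustrative subset of markers that make a sniffing browser guess "HTML".
HTML_MARKERS = (
    b"<html", b"<head", b"<body", b"<script", b"<title",
    b"<a href", b"<img", b"<pre", b"<table",
)


def looks_like_html(data):
    """Return True if the start of the file contains any HTML-looking marker."""
    chunk = data[:SNIFF_WINDOW].lower()  # case-insensitive substring match
    return any(marker in chunk for marker in HTML_MARKERS)


kml = (b'<?xml version="1.0"?><kml xmlns="http://www.opengis.net/kml/2.2">'
       b"<Camera><heading>90</heading></Camera></kml>")
jpeg = b"\xff\xd8\xff\xe1Exif\x00\x00<head>" + b"\x00" * 16

print(looks_like_html(kml))   # "<heading" contains "<head"
print(looks_like_html(jpeg))  # binary file, but the marker is in the first bytes
```

Because the match is a plain substring test, "<heading" is caught by the "<head" marker, which is exactly the collision discussed in this thread.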

bzimport added a comment. Via Conduit, Feb 25 2012, 9:52 PM

richardg_uk wrote:

But what if there were no <heading> tags, or none in the first 1KB?

To add to my confusion, there's already an optional exemption in UploadBase.php to allow SVG files containing "<title" (which is an IE sniff tag).

On the other hand, UploadBase.php does not test for other sniff tags (such as "<plaintext" for IE7, which presumably also applies to IE6).

Does this mean that the Berkeley list is too broad, or that there is a potential vulnerability in MediaWiki?

bzimport added a comment. Via Conduit, Apr 28 2012, 5:45 PM

xen.project wrote:

According to w3s, Internet Explorer 6 is now below 1.0% of internet usage as of March 2012. Can this filter be removed, or reversed so that IE6 users are blocked from viewing certain file types? It's ridiculous to inconvenience ourselves and to halt progress for 0.9% of the internet, 25% of which comes from China and can't even view most of Wikipedia.

http://www.w3schools.com/browsers/browsers_explorer.asp

bzimport added a comment. Via Conduit, Aug 23 2012, 3:25 PM

rd232 wrote:

Is there no way to simply block IE6 users from viewing or downloading KML files, so security for them is not an issue? If we can't serve that 1% of users (or it's too tricky to do so safely), fine, what about the rest of us?

bzimport added a comment. Via Conduit, Oct 2 2012, 8:00 PM

xen.project wrote:

If a reply isn't posted from a developer within a week to move forward with this, I'm opening a new bug report until action is taken. This delay is retarded. It is absolutely backwards development to restrict the advancement of 99% of the internet for the 1% of laggers who would still use a text based browser if that was what came with their default Windows installation. Just block IE 6 and force people to upgrade; problem solved, KML can be enabled, and we can move forwards with this capability.

If not, then who is the head honcho deciding that that group of thumb twiddlers deserves to be catered to?

Jalexander added a comment. Via Conduit, Oct 7 2012, 9:37 PM

Opening a new ticket until action is taken isn't going to help anything move faster (it may actually make it move slower). Someone was asking on IRC about this and so I'm adding Chris Steipp as well since it's a security issue and he might have an idea on the risks involved and what we can do. I do not know of any place where we've blocked a specific browser from using a specific type of file but that doesn't mean it's out of the bounds of possibility. I certainly think this is a file that we want to make possible if we can.

That said, your comments here Jess are unacceptable. This is not a place to throw around insults.

Yes, we try to support anything that has over 1% of our page hits. IE6 has just over 2% for actual HTML page requests right now. You're right, a large portion of IE6 users are still in China, but you're wrong that most of them can't view Wikipedia. Yes, we're blocked occasionally, but usually they can get to most of the pages. Our mission is to spread this free knowledge to as many people as possible; it would be totally unacceptable to just say "you're not welcome here because of your browser", especially at levels that high. 2% of requests is still millions of page requests. That is especially true when most people who are using IE6 do not get to make that choice themselves or are in areas of the world where you especially want to get information.

Pushing to try and get a bug resolved is completely ok, insulting specific groups of users or trying to flood the bug channels is not. Please tone your rhetoric down.

(Said as a personal community member/admin and not as a staff member)

csteipp added a comment. Via Conduit, Oct 9 2012, 5:18 PM

Thanks for adding me, James. This is the first I had heard of it.

Unfortunately, KML is a fairly complex and feature-rich format, so it would really need its own special parsing, similar to the SVG format. Beyond just IE6 sniffing and running JavaScript in the KML, there is also the issue that KMLs can embed JavaScript that the plugin will execute, and link to external resources that could be used to track our users. So at minimum, a solution would need to do pretty extensive script/CSS filtering, and remove anything that looked like a link to an external resource.
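
To make the idea of dedicated parsing concrete, here is a minimal Python sketch of whitelist-based validation. It is an illustration only, not an existing MediaWiki component: the allowed-tag set, the function names, and the "://" test for external references are all assumptions, and a production filter would also need a parser hardened against hostile XML.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

# Hypothetical whitelist: just enough to draw named points and lines.
ALLOWED_TAGS = {"kml", "Document", "Folder", "Placemark", "name",
                "LineString", "Point", "coordinates"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def validate_kml(data):
    """Return a list of problems found; an empty list means the file passed."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]

    problems = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if not elem.tag.startswith("{" + KML_NS + "}"):
            problems.append("element outside KML namespace: " + name)
            continue
        if name not in ALLOWED_TAGS:
            problems.append("element not allowed: " + name)
        if any(key != "id" for key in elem.attrib):
            problems.append("attribute not allowed on: " + name)
        if "://" in (elem.text or ""):
            problems.append("external reference in: " + name)
    return problems


sample = (b'<kml xmlns="http://www.opengis.net/kml/2.2"><NetworkLink><Link>'
          b"<href>http://example.org/track.kml</href></Link></NetworkLink></kml>")
for problem in validate_kml(sample):
    print(problem)
```

Rejecting everything that is not explicitly allowed keeps script-bearing elements such as description and link-bearing elements such as NetworkLink out without having to enumerate them.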

Bawolff added a comment. Via Conduit, Nov 5 2012, 9:26 PM

Can I get a hip hip hurrah for google not checking mime types... </sarcasm>.


By a brief look at template:Attached_KML, it seems that the templates only use a small portion of the KML standard. It may perhaps be less work to do a custom tag (easytimeline style) where we generate a safe kml file from a simpler language for specifying coordinates to highlight on the map.

The downside is obviously that in the future people might want more features from their kml.


But what if there were no <heading> tags, or none in the first 1KB?

That would take care of the IE6 issue, but as Chris mentions there are other concerns, in particular allowing third parties to track the ip's of our users.

Bawolff added a comment. Via Conduit, Nov 5 2012, 9:42 PM

(In reply to comment #28)

Can I get a hip hip hurrah for google not checking mime types... </sarcasm>.

As an aside, according to google maps docs, "HTML content is allowed but is sanitized to protect from cross-browser attacks", which makes google maps not checking mime types (and hence the Wikipedian's inline kml file hack) much less scary... :D

bzimport added a comment. Via Conduit, Nov 5 2012, 11:55 PM

richardg_uk wrote:

Isn't the new ContentHandler designed to handle non-wikitext article "paradata" such as KML/KMZ?

Though discussion here had been dormant for a while, I had assumed that it was exactly the kind of case which the new handler would enable.

(In reply to comment #28)

By a brief look at template:Attached_KML, it seems that the templates only use
a small portion of the KML standard. It may perhaps be less work to do a custom
tag (easytimeline style) where we generate a safe kml file from a simpler
language for specifying coordinates to highlight on the map.

The downside is obviously that in the future people might want more features
from their kml.

A sanitised subset is exactly what is sought and required. Wikitext and SVG are already subject to tag whitelisting, which is what KML needs.

>But what if there were no <heading> tags, or none in the first 1KB?
That would take care of the IE6 issue, but as Chris mentions there are other
concerns, in particular allowing third parties to track the ip's of our users.

Once external resource requests are filtered (as with SVG files), there is no more privacy leakage than there would be with a plain external URL in an article's wikitext. Google Maps just downloads the raw content of the specified subpage if a reader clicks the Attached KML link.

Qgil added a comment. Via Conduit, Mar 25 2013, 4:27 AM

(As suggested by Bawolff at http://www.mediawiki.org/wiki/Talk:Mentorship_programs/Possible_projects#GSOC_2013_candidates_missing_one_thing_or_two_25493 )

Do you think the development of this feature is suitable for a Google Summer of Code project? If you think this makes sense, then we would need a short description of the project published at http://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects and at least one mentor.

From there we would publish it at https://www.mediawiki.org/wiki/Summer_of_Code_2013#Project_ideas

Fabrice_Florin added a comment. Via Conduit, Dec 18 2013, 7:33 PM

Moved to Normal, because we do not view this as high priority at this time.

Rschen7754 added a comment. Via Conduit, Aug 9 2014, 10:41 PM

Well, Erik Moller has just sent out an email saying that JS on IE6 will be disabled completely with 1.24wmf17.

Does this mean that something could be done with this bug? Or would it be better to shift efforts to Wikidata? (or both, and just have Wikidata link to files on Commons?)

Bawolff added a comment. Via Conduit, Aug 10 2014, 7:57 AM

Well, something could have always been done; just no one has been willing to spend the time to do it.

The JS announcement doesn't affect the security issues mentioned above.

JeroenDeDauw added a comment. Via Conduit, Aug 10 2014, 2:45 PM

The use cases for which I opened the bug are not helped in any way by Wikidata. They also have nothing to do with the WMF. So having Wikidata does not help.

Rschen7754 added a comment. Via Conduit, Aug 10 2014, 4:24 PM

Jeroen De Dauw: see bug 55549, which would solve the big-picture problem with Wikidata.

Gilles added a project: Multimedia. Via Web, Nov 24 2014, 3:41 PM
El_Grafo added a subscriber: El_Grafo. Via Web, Jan 16 2015, 9:37 AM
Qgil added a comment. Via Web, Feb 11 2015, 1:44 PM

Wikimedia will apply to Google Summer of Code and Outreachy on Tuesday, February 17. If you want this task to become a featured project idea, please follow these instructions.

Qgil added a comment. Via Web, Feb 16 2015, 11:52 PM

If there is no interest / critical mentoring mass to promote this project for GSoC / Outreachy, then maybe the current "Normal" priority should be lower?

Rschen7754 added a comment. Via Web, Feb 17 2015, 1:43 AM

It is hard to say, since this is (according to my understanding) a security issue, and because it is not clear how Wikidata will affect this.

TheDJ added a comment. Via Web, Feb 17 2015, 7:43 AM

@Qgil, I note this is not an easy task, since there are some security aspects to it... That's the reason why it stalled. The task itself is easy, but the background is complex and a solution will require cooperation with and validation by @csteipp

Qgil added a comment. Via Web, Feb 17 2015, 8:54 AM

Thank you for the quick feedback. Ok, I will remove the Possible-Tech-Projects tag. Feel free to bring it back if/when this task is a good candidate for GSoC/Outreachy.

Qgil removed a project: Possible-Tech-Projects. Via Web, Feb 17 2015, 8:54 AM
Qgil set Security to None.
Bawolff added a comment. Via Web, Feb 17 2015, 11:54 PM

The security issues don't sound all that complicated. They basically involve building a parser and checking that the file is valid according to some subset of the spec (for our use case, it might not even have to be a very big subset).

Possibly also adding 64 KB of whitespace to the beginning of the file to trick the IE parser.

Not a walk in the park, but possibly within the realm of what a GSoC student could do. Of course it would still require a mentor (I am not volunteering).

Khushbuparakh claimed this task. Via Web, Feb 26 2015, 12:41 PM
Khushbuparakh placed this task up for grabs.
Khushbuparakh added a subscriber: Khushbuparakh.
