
Improve user feedback in html detected upload error
Open, Low, Public

Description

Currently, when you upload a JPEG that has HTML in an Exif tag, you get a message along the lines of: "This file contains HTML or script code which could be executed by a browser".

This is not very helpful to the "normal" user. It doesn't say what the user has to do to fix the problem if they still want to upload the file. Perhaps this is a place where we could add a link to a MediaWiki help page. That page could then detail how to remove HTML code from an Exif tag or from other file formats, and give further hints on how to deal with the problem.


Version: 1.17.x
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=25707

Details

Reference
bz25163

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 11:18 PM
bzimport set Reference to bz25163.
bzimport added a subscriber: Unknown Object (MLST).

sumanah wrote:

This is reasonably easy to do on a technical level.

a.koppad wrote:

I want to work on this simple bug. I am slightly slow because I am working on this bug. Please bear with me.

Does some page exist that explains how to remove HTML code from an exif tag? If not, creating one is probably the first step, followed by finding the exact location of the message string by grepping the codebase.

(In reply to comment #3)

Does some page exist that explains how to remove HTML code from an exif tag?

No. At best, [[commons:Commons:Exif]] details how to edit the Exif fields. Someone would have to make such a page on mediawiki.org.


Note: HTML in Exif tags is just one cause of this issue. In theory, the HTML could be anywhere in the file. The file could also be a non-JPEG (for example, people sometimes get this error if they enable uploading of certain XML-based formats), etc.
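
To illustrate the point, here is a minimal Python sketch of the kind of check that would produce this error. It is not MediaWiki's actual detection code: the function name `looks_like_html`, the tag list, and the 1024-byte window are all assumptions chosen for the example. It shows why both an Exif comment containing a link and an ordinary XML file can trip the same check.

```python
# Hypothetical sketch: flag a file whose leading bytes contain HTML-like markup.
# The tag list and the 1024-byte window are illustrative assumptions,
# not values taken from MediaWiki.
SUSPICIOUS_TAGS = (
    b"<a href", b"<body", b"<head", b"<html", b"<img", b"<script", b"<title",
)


def looks_like_html(data: bytes, window: int = 1024) -> bool:
    """Return True if any suspicious tag appears in the first `window` bytes."""
    chunk = data[:window].lower()
    return any(tag in chunk for tag in SUSPICIOUS_TAGS)


# A JPEG-like byte string whose metadata carries a link, as in an Exif comment.
jpeg_with_link = b"\xff\xd8\xff\xe1\x00\x20Exif\x00\x00<a href='http://example.org'>me</a>"
# A JPEG-like byte string with no markup in it.
plain_jpeg = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00" + b"\x00" * 32
# An XML-based upload that happens to use an HTML-looking element name.
xml_upload = b"<?xml version='1.0'?><doc><title>Notes</title></doc>"
```

With these sample inputs, `jpeg_with_link` and `xml_upload` are flagged while `plain_jpeg` is not, which matches the two causes mentioned above.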

(In reply to comment #2)

I want to work on this simple bug. I am slightly slow because I am working on
this bug. Please bear with me.

If you run into any trouble or need any advice, don't hesitate to ask.

sumanah wrote:

Anu, how is it going? Are you having any trouble?

a.koppad wrote:

(In reply to comment #5)

Anu, how is it going? Are you having any trouble?

@Sumana, thanks for asking. I have a problem with the settings on my computer that is not directly related to the bug. Once I set that right, I can start working on this bug.

sumanah wrote:

Anu, are you still having trouble with your computer settings?

ritusparks wrote:

Hey! Could I work on this bug?? I am a newbie! Could somebody assign me the bug and tell me how to get started?

RituS: See comment 3 & comment 4 for specific things to consider for this issue; for general info/help see https://www.mediawiki.org/wiki/Developer_access . If something with MediaWiki development is unclear *in general*, please check out https://www.mediawiki.org/wiki/MediaWiki_on_IRC or https://lists.wikimedia.org/mailman/listinfo .

A few questions: Is IE's autodetection really so bad that we need to reject valid binary images because they have something that looks like HTML in them? Also, how did HTML (or something that looks sort of like HTML) end up in an EXIF tag in the first place? If there's no non-contrived way, then is this worth worrying about? Finally, is the right way to "fix" this really to link users to a page telling them how to remove EXIF data?

(In reply to Jackmcbarn from comment #10)

A few questions: Is IE's autodetection really so bad that we need to reject
valid binary images because they have something that looks like HTML in
them? Also, how did HTML (or something that looks sort of like HTML) end up
in an EXIF tag in the first place? If there's no non-contrived way, then is
this worth worrying about? Finally, is the right way to "fix" this really to
link users to a page telling them how to remove EXIF data?

IE6 has shockingly bad content detection.

HTML ends up in Exif tags mostly from people adding things like <a href=... to them.

One possible alternative fix (its safety would need review), I think, would be to have MediaWiki modify the file to add 255 bytes of padding. JPEG allows padding markers in the file immediately after the first marker. The other case where this issue happens is certain XML formats, which allow whitespace padding.
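
A rough Python sketch of the JPEG half of that idea follows. It assumes "padding" means the 0xFF fill bytes that, as I understand the JPEG format, may precede any marker; the function name `pad_after_soi` is hypothetical. Whether such padding actually stops a browser from sniffing the file as HTML is exactly the part that would need review, so treat this as an illustration of the file manipulation only.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker (the "first marker")


def pad_after_soi(data: bytes, padding: int = 255) -> bytes:
    """Insert `padding` 0xFF fill bytes between SOI and the next marker.

    Assumption: fill bytes before a marker are tolerated by decoders.
    This does not verify that the result defeats content sniffing.
    """
    if not data.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    if len(data) < 4 or data[2] != 0xFF:
        raise ValueError("expected a marker right after SOI")
    return SOI + b"\xff" * padding + data[2:]


# Minimal JPEG-like header used only to exercise the function.
sample = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
padded = pad_after_soi(sample)
```

Here `padded` is 255 bytes longer than `sample`, and everything after the fill bytes is the original data from the second marker onward.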

I started working on this bug.

My current fix adds a link to the error message received when uploading a file. The link redirects the user to a page with information on how to delete the 'bad' tag. I am still working on the page; I somehow added bad advice and inaccuracies.

I was wondering if I should continue like this or change the fix with the padding solution (even though the user can also upload non-jpeg files).

I was wondering if I should continue like this or change the fix with the
padding solution (even though the user can also upload non-jpeg files).

The padding approach is much, much more complicated (bug 25707), and it also requires a review of its soundness by somebody who knows the ins and outs of IE content detection. I'd recommend just working on changing the message, at least for now. (If you're interested, you can of course work on the padding approach, but it's a much bigger job than what one would normally want to take on for a first bug.)

Change 164726 had a related patch set uploaded by Tuxilina:
Improved user feedback in html detected upload error

https://gerrit.wikimedia.org/r/164726

I changed the message and added a link to https://www.mediawiki.org/wiki/Remove_Exif_tags . Now I just need some help making this page accurate rather than full of bad advice. Should I leave this help page blank?