
SF needs to check for file existence to prevent replacing existing file using upload function
Open, Needs TriagePublic


For context

This is also a shortcoming of the standard MediaWiki upload system, but SF seems to be better designed and more capable of resolving the problem in its own way. This enhancement/bug may be related to this one:

T32537: Ability to check for existence and uniqueness of a property

Uploading a file with the same name as an existing file not only replaces the existing file; any form data submitted also replaces the existing semantic data. SF could benefit from the ability to check whether a file exists before overwriting existing files and their data. It would also be helpful if SF could automatically rename the file when creating a new file page.
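As a rough illustration of the kind of existence check being asked for, here is a minimal Python sketch that works from outside the wiki, through the standard MediaWiki web API (`action=query` with a `titles` parameter). It is not how SF does or would do it. The function names and the example.org URL are hypothetical, and the parser assumes the default JSON layout, where `pages` is an object keyed by page ID and a nonexistent page carries a `missing` key.

```python
from urllib.parse import urlencode


def build_exists_query(api_url, file_name):
    """Return the URL of a MediaWiki API query about one file page.

    api_url is the wiki's api.php endpoint (hypothetical in this sketch).
    """
    params = {
        "action": "query",
        "titles": "File:" + file_name,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def file_page_exists(api_response):
    """Return True if the decoded JSON response describes an existing page.

    Assumes the default response layout: "pages" is a dict keyed by page ID,
    and nonexistent or invalid titles are flagged with "missing"/"invalid".
    """
    pages = api_response.get("query", {}).get("pages", {})
    return any(
        "missing" not in page and "invalid" not in page
        for page in pages.values()
    )


if __name__ == "__main__":
    # Sample response for a title that does not exist yet (no network needed).
    sample = {"query": {"pages": {"-1": {"ns": 6, "title": "File:Photo.jpg", "missing": ""}}}}
    print(build_exists_query("https://example.org/w/api.php", "Photo.jpg"))
    print(file_page_exists(sample))
```

A caller would fetch the URL, decode the JSON, and refuse or rename the upload when `file_page_exists` returns True. This only covers the file page itself; it says nothing about whether the page already holds semantic data.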

For right now, the best workaround I could come up with is to use the default file name parameter, like this:

default filename=image {{#time: U | now}}.jpg

Obviously, that only works for JPEGs, and it creates a new problem for users who are uploading PDFs or other file types. Still, automatically renaming the file with a timestamp makes a name collision unlikely, although it doesn't guarantee uniqueness.
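For comparison, the renaming behavior that the workaround cannot express (a timestamped name that keeps the uploaded file's own extension) is easy to state in ordinary code. The Python sketch below is only an illustration of that logic, not anything SF provides. The function name is hypothetical, and the random suffix is an added assumption to make collisions within the same second less likely.

```python
import os
import random
import time


def unique_upload_name(original_name, now=None, rng=None):
    """Build a name of the form "<unix time>-<random 0..9999><extension>".

    The extension is copied (lowercased) from the user's original file name,
    so a PDF stays a PDF. Uniqueness is likely but not guaranteed.
    """
    now = time.time() if now is None else now
    rng = rng or random  # allow a seeded random.Random for repeatable runs
    _, extension = os.path.splitext(original_name)
    return "{}-{}{}".format(int(now), rng.randint(0, 9999), extension.lower())


if __name__ == "__main__":
    print(unique_upload_name("Holiday photo.JPG"))
    print(unique_upload_name("report.pdf"))
```

Combining a name like this with an explicit existence check would close the remaining gap, since a timestamp alone only makes a clash improbable.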

Comment from (updated) form text:

<!-- Files and semantic data will be overwritten if file name is not unique. Must use the default filename parameter with a timestamp to ensure (but not guarantee) the file name is unique. No way to automatically detect and append file extension, so creates new problem for users not uploading a JPEG. They will get a warning about mime type not matching file extension, and they will have to figure out the correct extension for themselves. --><nowiki />

If I remember correctly, the file is not actually saved yet when the warning is produced, so SF doesn't fail on the warnings like it does in this bug report:

T34425: Reuploading deleted file causes SF to fail

Version: unspecified
Severity: normal




Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 12:05 AM
bzimport set Reference to bz32426.
Badon created this task.Nov 15 2011, 2:20 AM

The upload window already displays a warning if the specified filename exists in the wiki, just like Special:Upload does.

Badon added a comment.Nov 15 2011, 5:26 AM

Right, but it doesn't check for existing semantic data. If the user proceeds with the upload despite the warnings, it will result in changing semantic data. The user gets nothing to indicate that's what will happen. If they leave the form blank, for example, it blanks the file's semantic data. Even with an adequate warning, it's unlikely that the user would both understand what it means, and decide not to do it. It's possible they're just playing with the form and want to see what it does (which is how I discovered the problem).

Ideally it should be possible for me to configure things to warn the user, and maybe prevent an edit from being made from an upload. At this point, I'm not sure many people are aware of the distinction between a Semantic Form being used for upload and the same Semantic Form being used to edit a previous upload. It's not documented yet, as far as I know.

I've had to get a bit creative in setting the form up to detect whether it's being used to upload or edit. I suppose that could be improved in future feature enhancements, but until then, I'm thinking about this from the perspective of me coding it up to handle the distinction manually.

Maybe that's the wrong approach though. Should Semantic Forms be enhanced to better handle the distinction between uploading and editing? I'll have to think about that some. I'm not sure exactly what it should do in each case, since it might depend on what the form is being used to do.

In my use case I hide the upload form when editing and replace it with a thumbnail of the image file being edited. Of course, that only works for images. I'm not sure the potential use cases are constrained enough to do something by default, depending on whether uploading or editing, other than hiding the upload field when editing.

I didn't understand this at all. What does semantic data have to do with it? It's strictly based on the underlying text in a page, which is what forms modify.

I'm setting this to "invalid"; feel free to re-open, if you still think there's an issue there.

Badon added a comment.Dec 9 2011, 12:35 AM

See also:

Take this for example:

{{{info | create title=Upload image | edit title=Edit image | page name=File:<Image[File]>}}}

with a corresponding file upload field like this:

{{{field | File | uploadable}}}

As you can see from the info tag, the page to be edited is taken from the name of the file that's being uploaded. If the user ignores warnings that their file name is the same as an existing file, the file will be overwritten when they complete the upload. Hopefully the warnings make that obvious.

However, what isn't obvious is that if the user also submits the form, the form will overwrite whatever semantic data was on the file page before, without any warning, and perhaps without the user meaning to do that. So far, it appears this happens when

  • users are knowingly trying to upload a new version of a file, but do not know the semantic data will be overwritten too if they submit the form.
  • users are trying to upload a completely new file with the same name as an existing file, and they ignore the upload warnings because they don't understand them, or they simply don't read them. The upload replaces the original file. When they submit the form, new semantic data replaces the original semantic data, without warning.

I don't know of any easy way around that. The best I've come up with is assigning random names to all uploaded files, so if the user ignores the warnings, nothing will be overwritten:

{{{field | File | uploadable | default filename={{#time: U | now}}-{{#rand:0|9999}}.jpg}}}

SF needs a way for form code to detect whether it is uploading or editing, and make decisions based on that. So far, checking for an output from {{PAGENAME}} is the only way I've been able to put that information into my code so I can display the form differently, perhaps with warnings that existing data will be overwritten if the form is submitted.

Hopefully that is clearer.

Badon added a comment.Dec 29 2011, 8:23 PM

Reopening. I was forced to upgrade to 2.3.2 when 2.2.1 was having problems with MediaWiki 1.18, so I can't ignore this issue anymore. In 2.2.1 I could work around the problem by hiding the upload field to discourage unintended overwriting of semantic data, but magic words don't work very well in 2.3.2:

Both of those bugs appear to be unrelated to each other, but they are both probably caused by a similar approach to handling magic words. Of course, the magic words are only useful as part of a programmatic workaround for this bug (32426). Semantic Forms should handle this automatically.

Here's an example (login Demo/test):

If you click Edit with form, you will see an upload field that used to be hidden with my form code, using magic words:

If the user is confused about whether they're editing an image or adding a new one, and uploads a new one while also changing the semantic data, the semantic data will be changed for the EDITED image and the CREATED image will have no semantic data applied.

Essentially, Semantic Forms does not function in this case, so in addition to reopening the bug, I have increased the severity to "Major". I did not choose "Critical" because the problematic consequences of the bug only occur when the user is confused about whether they're editing or creating a file. That can easily happen if the user thinks they can upload more than one file using the same image upload field. Aspects of that are addressed in at least two separate bugs:

SF allows multiple file uploads with a single form field

Support multiple page creation for batch file uploads. (Would be helpful in many other cases too)

Setting the priority to "normal" - if I understand this correctly, it only affects the editing of pages in the "File" namespace, which is something that SF is only rarely used for.

Badon added a comment.Jan 4 2012, 10:37 PM

Hmm, maybe it could affect other namespaces. I'd have to think about it. Either way, I'm sure SF wouldn't be rarely used for files if it were designed for it. I have to use workarounds and more advanced wiki coding to get the expected results, and some of those don't work anymore for the current Semantic Forms 2.3.2 and MediaWiki 1.18.

There are many applications where using Semantic Forms to organize images would be very helpful. In fact, as far as I know, it's THE BEST way to handle and organize images. Images and other visual media are notoriously hard to understand by machines. Using Semantic Forms allows a user to apply machine-readable semantic data to each image they upload. Instead of being a non-starter, it is now possible to easily make visual media searchable, and automatically organizable and browseable, thanks to Semantic MediaWiki and Semantic Forms.

Since everything else is searchable with common machine readable text-based search systems, I think visual media is the primary indispensable forte of Semantic MediaWiki (and Semantic Forms, for supplying the necessary additional data). Underestimating visual data as a rarely-used unusual application for SMW + SF would be a huge mistake.

Lupo's ImageAnnotator takes this one step further, by allowing human annotation of images with plain text that can be understood by simple software. If I can persuade Lupo to turn it into an extension, the next step would be to incorporate Semantic Forms into his popup annotation form, so the annotated data can be linked into SMW.

Here's an example (login with Demo/test):

Right now, to add properties to an annotation, you have to do it manually, without the ease of Semantic Forms. Having LIA, SMW, and SF together would be visual media library nirvana.

Aklapper removed Yaron_Koren as the assignee of this task.Sep 2 2015, 3:54 PM
Restricted Application added a subscriber: Aklapper.Sep 2 2015, 3:54 PM