
File on remote repository causes same duplicate warning as file on local repository
Closed, Resolved (Public)

Description

When you try to upload, say, https://commons.wikimedia.org/wiki/File:Foobar.jpg to your local wiki (with InstantCommons enabled) as a test of UploadWizard, it will give you a message something like "There is another file on this site with the same content." Now, you can go to the file page and discover that the file is actually transcluded from a remote repository (Wikimedia Commons), but UploadWizard won't tell you that, because it has no way of knowing without performing *yet another* API request to fetch the image info.

Now, I see two options:

  • Include information about which repository has the image in the duplicate warnings array. Right now it's just an array of file names that are duplicates. If it were an array of stripped-down imageinfo objects, we could have the title, the repository name, and some extra stuff that might be useful.
  • Return a different warning for duplicates on different repositories. This would be potentially more helpful, since the difference between one remote repository and another is roughly nil - however, in case you care, the new warnings array could use the above suggestion and also tell you which repository is the problem.

If there are others, I don't know, but one of those needs to happen.
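
As a rough illustration of the first option, the sketch below contrasts the current shape of the duplicate warning (a plain array of file names) with an array of stripped-down objects. The `DuplicateInfo` type, its field names, and `normalizeDuplicates` are hypothetical and exist only for this example; they are not part of the actual API.

```typescript
// Hypothetical shape for a "stripped-down imageinfo object".
// Field names are assumptions made for illustration only.
interface DuplicateInfo {
  title: string;      // file name, as in today's warning
  repository: string; // e.g. "local" or the name of a foreign repository
}

// Accepts either the current format (file names only) or the proposed
// object format, and returns the object format in both cases.
function normalizeDuplicates(
  warning: Array<string | DuplicateInfo>
): DuplicateInfo[] {
  return warning.map((entry) =>
    typeof entry === "string"
      ? { title: entry, repository: "unknown" } // old format carries no repo
      : entry
  );
}
```

With the current format a client can only fill in "unknown" for the repository, which is the gap this task describes.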

Event Timeline

MarkTraceur raised the priority of this task from to Needs Triage.
MarkTraceur updated the task description.
MarkTraceur added a subscriber: MarkTraceur.
Restricted Application added a project: Multimedia. Dec 3 2015, 5:29 PM
Restricted Application added subscribers: StudiesWorld, Steinsplitter, Aklapper.
Anomie added a subscriber: Anomie. Dec 3 2015, 5:40 PM
  • Include information about which repository has the image in the duplicate warnings array. Right now it's just an array of file names that are duplicates. If it were an array of stripped-down imageinfo objects, we could have the title, the repository name, and some extra stuff that might be useful.

I note that would be a breaking change in the API unless we come up with some parameter for requesting it, and I'm wary of "stripped-down imageinfo objects" considering T89971. But it's possible, since the Upload code gives us the File objects for the duplicates.

  • Return a different warning for duplicates on different repositories. This would be potentially more helpful, since the difference between one remote repository and another is roughly nil - however, in case you care, the new warnings array could use the above suggestion and also tell you which repository is the problem.

This would be up to the upload code to do, not the API, although ApiUpload would likely need adjusting to treat the new warning in the same way it does 'duplicate'.

If there are others, I don't know, but one of those needs to happen.

Another solution to T58291 could be for UW to allow ignoring the duplicate-file warning.

Restricted Application added a subscriber: Matanya. Dec 3 2015, 5:40 PM
MarkTraceur triaged this task as Normal priority. Dec 21 2015, 5:19 PM
MarkTraceur moved this task from Untriaged to Triaged on the Multimedia board. Dec 5 2016, 9:45 PM
MarkTraceur moved this task from Triaged to Next up on the Multimedia board. Jun 5 2017, 3:32 PM

Change 360828 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/UploadWizard@master] Allow upload of files that are duplicate on a foreign repo.

https://gerrit.wikimedia.org/r/360828

I have a patch up that kind of addresses all of what was discussed here.

Instead of changing ApiUpload (we don't want to break backwards compatibility, and I don't want to add yet another param), I perform an API call to ApiQueryImageInfo right after discovering there are duplicates; that call also reports the source repository (imagerepository).
It is another API call, but it's a simple call & only happens for duplicates, so it shouldn't cause much traffic.
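
A minimal sketch of that follow-up step, not the code from the patch: it builds the parameters for a `prop=imageinfo` query and splits the returned pages by their `imagerepository` value. The helper names and interfaces are hypothetical, the response shape assumes `formatversion=2`, and treating every non-empty value other than "local" as foreign is an assumption.

```typescript
// Only the fields this sketch needs; real responses contain more.
interface ImageInfoPage {
  title: string;
  imagerepository?: string; // "local", a foreign repo name, or "" if no file
}

interface ImageInfoResponse {
  query?: { pages?: ImageInfoPage[] };
}

// Parameters for a single imageinfo query covering all duplicate titles.
function buildImageInfoParams(titles: string[]): Record<string, string> {
  return {
    action: "query",
    format: "json",
    formatversion: "2",
    prop: "imageinfo",
    titles: titles.join("|"),
  };
}

// Split duplicate titles into local and foreign, based on imagerepository.
function splitByRepository(
  response: ImageInfoResponse
): { local: string[]; foreign: string[] } {
  const result = { local: [] as string[], foreign: [] as string[] };
  for (const page of response.query?.pages ?? []) {
    if (page.imagerepository === "local") {
      result.local.push(page.title);
    } else if (page.imagerepository) {
      // Assumption: any other non-empty value names a foreign repository.
      result.foreign.push(page.title);
    }
  }
  return result;
}
```

Because all duplicate titles go into one `titles` value, the extra cost stays at one request per upload that hits a duplicate warning.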

I haven't really differentiated the error messages: it'll still just say "This file is a duplicate of the following files:" and then list both local & foreign duplicates.
Foreign duplicates will no longer link to the local (transcluded) copy, but directly to the foreign source, so now you can kind of distinguish them without having to click through.
I also made it so you can choose to proceed with the upload anyway if the duplication is only on a foreign repo. Local duplicate uploads are still blocked.
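
The behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the UploadWizard implementation: the `Duplicate` type, both function names, and the `/wiki/` path are hypothetical, and `descriptionUrl` stands in for the foreign file page URL (imageinfo can report one as `descriptionurl` when `iiprop=url` is requested).

```typescript
// Hypothetical record for one duplicate, after the repository lookup.
interface Duplicate {
  title: string;           // e.g. "File:Foobar.jpg"
  repository: string;      // "local" or a foreign repository name
  descriptionUrl?: string; // file page on the foreign repo, if known
}

// The upload stays blocked as soon as any duplicate is local.
function hasLocalDuplicate(duplicates: Duplicate[]): boolean {
  return duplicates.some((d) => d.repository === "local");
}

// Foreign duplicates link to the foreign source; local ones (or foreign
// ones without a known URL) link to the local file page.
// Assumption: local pages live under "/wiki/".
function linkFor(duplicate: Duplicate): string {
  const localUrl = "/wiki/" + duplicate.title.replace(/ /g, "_");
  if (duplicate.repository !== "local" && duplicate.descriptionUrl) {
    return duplicate.descriptionUrl;
  }
  return localUrl;
}
```

A caller would offer the "proceed anyway" choice only when `hasLocalDuplicate` returns false for the list of duplicates.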

Does that make sense, or do we want something else?

matthiasmullie moved this task from Next up to Needs code review on the Multimedia board.

Change 360828 merged by jenkins-bot:
[mediawiki/extensions/UploadWizard@master] Allow upload of files that are duplicate on a foreign repo.

https://gerrit.wikimedia.org/r/360828

matthiasmullie closed this task as Resolved. Sep 27 2017, 8:02 AM
matthiasmullie reopened this task as Open.

Whoops - shouldn't have closed; hasn't been QA'd yet!

Ramsey-WMF added subscribers: Cparle, Ramsey-WMF.

Assigning to @Cparle for QA/review

Cparle closed this task as Resolved. Nov 16 2017, 11:17 AM

@matthiasmullie looks good to me, closing

Cparle reopened this task as Open. Nov 16 2017, 11:31 AM

Oops, closed the wrong ticket

Cparle closed this task as Resolved. Nov 16 2017, 2:10 PM

There's some kind of issue with uploading files to testwiki with my account, but when I try to upload a file that already exists on Commons, I get the warning on the first page of UploadWizard and can ignore it and proceed, which AFAICS means this ticket is resolved. Closing.