
Same content + same filename uploads should be considered duplicated when from foreign repo
Closed, Resolved · Public

Description

UploadBase::checkAgainstExistingDupes finds duplicate files with the same content (by hash), but omits files that have the same name as the one being uploaded.
This may make sense for local files: a file with the same name will already be reported with an 'exists' warning, and with 'nochange' if the content is the same.
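The filtering described above can be modelled roughly as follows. This is a minimal Python sketch, not the MediaWiki (PHP) implementation: `StoredFile`, `content_hash` and `check_against_existing_dupes` are hypothetical names, and a SHA-1 hex digest stands in for whatever hash format the real code compares.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical stand-in for a file already stored in some repository."""
    name: str
    sha1: str


def content_hash(data: bytes) -> str:
    # Assumption: a SHA-1 hex digest is used as the content hash.
    return hashlib.sha1(data).hexdigest()


def check_against_existing_dupes(upload_name: str, data: bytes,
                                 stored: list) -> list:
    """Return names of stored files with identical content, skipping any
    file whose name equals the name of the upload (the behaviour above)."""
    digest = content_hash(data)
    same_content = [f for f in stored if f.sha1 == digest]
    # Same-name matches are dropped regardless of which repo they live in.
    return [f.name for f in same_content if f.name != upload_name]
```

Uploading identical bytes under a new name reports the existing file as a duplicate, while uploading them under the existing name reports nothing from this check.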

'exists' & 'nochange' are only checked on local repositories, though.
For matches in foreign repositories, you won't get these warnings. Duplicates are still treated the same, though: if they have the same filename, they'll be filtered out.
As a result, uploading a file with the same name and the same content as a file in a foreign repo will not generate any warnings.

I suggest we don't filter out same-name duplicates in UploadBase::checkAgainstExistingDupes when it's on a foreign repo.
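A minimal sketch of how the suggested change would affect the warnings, again in Python rather than MediaWiki's PHP. `StoredFile`, `upload_warnings` and its `proposed` flag are hypothetical; the sketch assumes, as described above, that 'exists' and 'nochange' are only checked against local files, and it ignores any other upload checks.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    name: str
    sha1: str
    is_local: bool  # False for a file that lives in a foreign repo


def upload_warnings(upload_name: str, data: bytes, stored: list,
                    proposed: bool = False) -> list:
    digest = hashlib.sha1(data).hexdigest()  # assumed content hash
    warnings = []
    # 'exists' / 'nochange' are only checked against local files.
    for f in stored:
        if f.is_local and f.name == upload_name:
            warnings.append("exists")
            if f.sha1 == digest:
                warnings.append("nochange")
    # Duplicate check: same content, possibly skipping same-name files.
    dupes = []
    for f in stored:
        if f.sha1 != digest:
            continue
        # Current behaviour: skip same-name files in every repo.
        # Proposed behaviour: skip them only for local files.
        skip_same_name = f.is_local or not proposed
        if f.name == upload_name and skip_same_name:
            continue
        dupes.append(f.name)
    if dupes:
        warnings.append("duplicate: " + ", ".join(dupes))
    return warnings
```

With `proposed=False`, a same-name, same-content file that exists only in a foreign repo produces no warnings at all; with `proposed=True`, it is reported as a duplicate, while local same-name files are still covered by 'exists' and 'nochange'.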

Event Timeline

Restricted Application removed a project: Patch-For-Review. · Nov 16 2017, 1:44 PM
matthiasmullie removed Cparle as the assignee of this task. · Nov 16 2017, 3:32 PM
Anomie removed a subscriber: Anomie.
Ramsey-WMF moved this task from Untriaged to Triaged on the Multimedia board. · Nov 16 2017, 5:40 PM

@matthiasmullie I seem to remember that the reason you wanted to fix this was that it caused some kind of bug: it looked like you'd be able to upload a file that has the same name as a file on a remote repo, but at the last minute you'd get blocked... something like that, anyway. I can't remember for sure; can you remind me so I can take a look at it?

The problem is that it will *not* error or warn.

Reproduce:
1/ Download a random file from commonswiki (e.g. https://commons.wikimedia.org/wiki/File:HamptonCourtHerefordshire.jpg)
2/ Make a copy of that file & rename it (so that you have 1 copy with the original name, and one with a name that does not already exist)
3/ Attempt to upload both files to testwiki: https://test.wikipedia.org/wiki/Special:UploadWizard or https://test.wikipedia.org/wiki/Special:Upload
4/ Notice how the renamed copy will show a warning ("This file is a duplicate of the following files: File:HamptonCourtHerefordshire.jpg") and will allow you to "Upload anyway", but the one with the original name will not show any warnings.

The renamed version is ok: you're informed that the file already exists in a remote repository, but it'll still let you upload it if you so desire.

The original name will not give such a warning.
When you continue uploading, you'll eventually be warned about using the same filename as an existing file ("A file with this name exists already. If you want to replace it, go to the page for File:HamptonCourtHerefordshire.jpg and replace it there."), but you can just rename it and keep uploading, without having been informed that the *content* of the file also exists already.

Change 394351 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/core@master] Warn for uploads with new name but same content as local file

https://gerrit.wikimedia.org/r/394351

Cparle claimed this task. · Nov 30 2017, 6:07 PM
Cparle moved this task from To Do to Doing on the Multimedia-Team-Working-Board board.
Cparle moved this task from Doing to Code Review on the Multimedia-Team-Working-Board board.

Change 394351 merged by jenkins-bot:
[mediawiki/core@master] Warn for uploads with new name but same content as local file

https://gerrit.wikimedia.org/r/394351

matthiasmullie closed this task as Resolved. · Dec 12 2017, 2:36 PM