Handle duplicate files gracefully
Closed, Resolved (Public)

Description

On loading the SpecialPage, check whether this file has already been uploaded to Commons. If it hasn't, proceed to the page. If it has, return the following to the user instead of the normal page:

The file you are currently trying to upload already exists on Commons.
You can find it under this link: <link>

<Image as it has been found on Commons>
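The flow described above could be sketched roughly as follows. This is only a minimal Python illustration, not the FileImporter code (the extension itself is a MediaWiki extension); `render_import_page`, `find_on_commons`, and the shape of the returned dict are hypothetical names chosen for this example.

```python
from typing import Callable, Optional

DUPLICATE_MESSAGE = (
    "The file you are currently trying to upload already exists on Commons."
)


def render_import_page(
    source_file: bytes,
    find_on_commons: Callable[[bytes], Optional[str]],
) -> dict:
    """Return the duplicate notice if the file is on Commons, else the normal page.

    find_on_commons is a hypothetical lookup: it returns the link to the
    matching Commons file, or None when there is no match.
    """
    existing_link = find_on_commons(source_file)
    if existing_link is None:
        # Not on Commons yet: proceed to the normal page.
        return {"view": "import"}
    # Already on Commons: show the notice and the link instead of the normal page.
    return {
        "view": "duplicate",
        "message": DUPLICATE_MESSAGE,
        "link": existing_link,
    }
```

How `find_on_commons` decides that two files match is deliberately left open here, since that is exactly what the discussion below settles.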

Event Timeline

Change 342845 had a related patch set uploaded (by Addshore):
[mediawiki/extensions/FileImporter] WIP Handle duplicate files gracefully

https://gerrit.wikimedia.org/r/342845

Addshore moved this task from Todo to Sprint ready on the WMDE-TechWish board.

@Lea_WMDE I have a few questions here that need clarifying.

  1. Do we want to look at all file revisions or just current file revisions?
  2. Do we want to look at deleted file revisions or just public file revisions?

The current file is by far the most important one, so I would only notify the user if that exact file already exists on Commons. In all other cases I would take no action: a predecessor of the current file is outdated for the one use case the moved file serves, but on Commons that same version might still be the current one in other situations. I would therefore be OK with having it twice.

I guess really a decision needs to be made from a matrix roughly as below:

| | Look at the most recent revision being imported ONLY | Look at all revisions being imported |
| --- | --- | --- |
| Look at the most recent revision on commons ONLY | | |
| Look at all revisions on commons (not including deleted files) | | |
| Look at all revisions on commons (including deleted files) | | |

We need a tick in 1 box for the intended behaviour.

I would suggest this.

| | Look at the most recent revision being imported ONLY | Look at all revisions being imported |
| --- | --- | --- |
| Look at the most recent revision on commons ONLY | x | |
| Look at all revisions on commons (not including deleted files) | | |
| Look at all revisions on commons (including deleted files) | | |

Any thoughts as to why this would be a bad/good idea?
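To make the options concrete, each box in the matrix can be read as one combination of two independent settings. The sketch below is a hypothetical Python model of that idea, not the patch itself; `Revision`, `ImportScope`, `CommonsScope`, and `has_duplicate` are names invented for illustration, and revisions are assumed to be comparable by some content digest.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ImportScope(Enum):
    LATEST_ONLY = "most recent revision being imported only"
    ALL_REVISIONS = "all revisions being imported"


class CommonsScope(Enum):
    LATEST_ONLY = "most recent revision on commons only"
    ALL_PUBLIC = "all revisions, not including deleted files"
    ALL_INCLUDING_DELETED = "all revisions, including deleted files"


@dataclass
class Revision:
    digest: str               # content digest of this file revision (assumed)
    is_latest: bool = False
    is_deleted: bool = False


def has_duplicate(
    imported: List[Revision],
    on_commons: List[Revision],
    import_scope: ImportScope,
    commons_scope: CommonsScope,
) -> bool:
    """Return True if any in-scope imported revision matches an in-scope Commons revision."""
    wanted = {
        r.digest
        for r in imported
        if import_scope is ImportScope.ALL_REVISIONS or r.is_latest
    }

    def in_scope(r: Revision) -> bool:
        if commons_scope is CommonsScope.LATEST_ONLY:
            return r.is_latest and not r.is_deleted
        if commons_scope is CommonsScope.ALL_PUBLIC:
            return not r.is_deleted
        return True  # ALL_INCLUDING_DELETED

    existing = {r.digest for r in on_commons if in_scope(r)}
    return bool(wanted & existing)
```

Ticking one box then corresponds to fixing one `(ImportScope, CommonsScope)` pair for every call.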

Just to nitpick: what does "this file has already been uploaded to commons" actually mean? An identical file, a broadly similar file, or perhaps a transformed or cropped file?

@jeblad: I strongly assume an identical file. Everything else would most likely be a) hard to check and/or b) give too many false positives/negatives.

Yup, with the way this is currently implemented only EXACT duplicate files will be stopped at this point.
And indeed, doing anything else would probably be too hard / not worth it.
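As a small illustration of why only exact duplicates are caught: if files are compared by a content hash, any change to the bytes (a crop, a re-encode, even a metadata edit) yields a different digest and so no match. The use of SHA-1 below is an assumption for the sake of the example; the thread does not say which mechanism the patch uses, and `content_digest` and `is_exact_duplicate` are hypothetical helper names.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Hex digest of the raw file bytes (SHA-1 is assumed here for illustration)."""
    return hashlib.sha1(data).hexdigest()


def is_exact_duplicate(candidate: bytes, existing: bytes) -> bool:
    """True only when both files are byte-for-byte identical (hash collisions aside)."""
    return content_digest(candidate) == content_digest(existing)
```

Detecting similar, transformed, or cropped images would need something like perceptual hashing, which is the harder and more error-prone route mentioned above.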

Change 342845 merged by jenkins-bot:
[mediawiki/extensions/FileImporter] Handle duplicate files gracefully

https://gerrit.wikimedia.org/r/342845

Addshore moved this task from Currently in sprint to Done on the WMDE-TechWish board.
Addshore moved this task from Active 🚁 to Closing ✔️ on the User-Addshore board.

@Lea_WMDE @Jan_Dittrich for reference I have attached a screenshot of how this currently looks below.

pasted_file (485×847 px, 58 KB)