GWT duplicates
Closed, Resolved
Public

Description

GWT should prevent the upload of duplicates.


Version: unspecified
Severity: major

Details

Reference
bz64831

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:13 AM
bzimport set Reference to bz64831.
Fae added a comment. May 4 2014, 4:34 PM

What would be *great* is if the GWT skipped duplicates but still completed the requested run, then reported back on SHA-1 duplicates, possibly supplying an XML exceptions file (of <records>) containing only the duplicates, preferably with the filename of the duplicated file(s) found in an extra field.

If the user could then set a flag to force the creation of duplicates at that point, using the XML exceptions file, they would at least be wholly responsible for their actions. They could add a "(duplicate check needed)" backlog category as appropriate, and should expect to deal with the duplicates themselves rather than leaving this to other random volunteers.
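
A rough sketch of the skip-and-report behaviour suggested above, in Python for illustration only (GWToolset itself is a MediaWiki extension and this is not its code). The function names, the tuple layout, and the `title`, `sha1` and `duplicateOf` element names are all hypothetical; only the `<records>` wrapper comes from the comment.

```python
import hashlib
import xml.etree.ElementTree as ET


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of a media file's bytes."""
    return hashlib.sha1(data).hexdigest()


def split_batch(records, existing):
    """Separate a batch into uploads and skipped duplicates.

    records:  list of (title, media_bytes) pairs in batch order (hypothetical layout)
    existing: dict mapping the SHA-1 of already uploaded files to their filename
    """
    seen = dict(existing)  # copy so the caller's index is not modified
    to_upload, duplicates = [], []
    for title, media in records:
        digest = sha1_hex(media)
        if digest in seen:
            # Skip the record, but remember which file it duplicates.
            duplicates.append((title, digest, seen[digest]))
        else:
            seen[digest] = title  # also catches duplicates within the batch
            to_upload.append(title)
    return to_upload, duplicates


def exceptions_xml(duplicates) -> str:
    """Build an exceptions file holding only the skipped duplicates."""
    root = ET.Element("records")
    for title, digest, existing_name in duplicates:
        rec = ET.SubElement(root, "record")
        ET.SubElement(rec, "title").text = title
        ET.SubElement(rec, "sha1").text = digest
        # Extra field naming the file that already has this content.
        ET.SubElement(rec, "duplicateOf").text = existing_name
    return ET.tostring(root, encoding="unicode")
```

The run completes for every non-duplicate record, and the exceptions file could later be fed back in with a force flag if the uploader really does want the duplicates.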

Change 132751 had a related patch set uploaded by Siebrand:
Don’t allow upload of duplicate mediafiles

https://gerrit.wikimedia.org/r/132751

Created attachment 15350
test metadataset

steps to reproduce

notice current item

  1. notice how many mediafiles are present for this item and whether or not they are identical: http://commons.wikimedia.beta.wmflabs.org/wiki/File:Een_vrouw_brengt_een_offer_aan_Priapus_()-Sc%C3%A8nes_uit_Vergilius_dichtbundel_Bucolica_(serietitel)-RP-P-1992-80-RM0001.COLLECT.70.jpeg

login

  1. http://commons.wikimedia.beta.wmflabs.org/wiki/Special:GWToolset
  2. once logged in, you should be at Step 1: Metadata detection

step 1

  1. nothing to add
  2. select Artwork
  3. GWToolset:Metadata Mappings/Dan-nl/Rijksmuseum.json
  4. nothing to add
  5. choose the attached “test metadataset”
  6. click Submit

step 2

  1. check “Re-upload media from URL”
  2. click the "Preview batch" button

step 3

click the “Process batch” button

note the item change

  1. there should be yet another copy of the same mediafile: http://commons.wikimedia.beta.wmflabs.org/wiki/File:Een_vrouw_brengt_een_offer_aan_Priapus_()-Sc%C3%A8nes_uit_Vergilius_dichtbundel_Bucolica_(serietitel)-RP-P-1992-80-RM0001.COLLECT.70.jpeg
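
The before/after comparison in these steps could also be done by script rather than by eye. A minimal sketch, assuming the wiki exposes the standard MediaWiki `api.php` endpoint and that the response uses the default JSON layout (`query.pages` keyed by page id); the helper names are made up for this example.

```python
from collections import Counter
from urllib.parse import urlencode


def imageinfo_url(api_base: str, title: str, limit: int = 50) -> str:
    """URL asking the MediaWiki API for the SHA-1 of each file revision."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "titles": title,
        "iiprop": "sha1|timestamp",
        "iilimit": limit,  # include older revisions, not just the current one
    }
    return api_base + "?" + urlencode(params)


def duplicate_revisions(response: dict) -> dict:
    """Map each SHA-1 that occurs in more than one revision to its count."""
    counts = Counter()
    for page in response.get("query", {}).get("pages", {}).values():
        for rev in page.get("imageinfo", []):
            counts[rev["sha1"]] += 1
    return {sha1: n for sha1, n in counts.items() if n > 1}
```

Fetching the URL before and after the "Process batch" step and passing each decoded JSON response to `duplicate_revisions` would show whether the count for the same SHA-1 went up.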

steinsplitter, this has been deployed to production. are you okay with marking it as resolved fixed?

steinsplitter, a patch has been deployed to production that addresses this issue. are you okay with closing this bug now?

Gilles raised the priority of this task from High to Unbreak Now!. Dec 4 2014, 10:11 AM
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to High. Dec 4 2014, 11:22 AM