JPEGs on Commons: Several versions uploaded but only one record in history
Closed, ResolvedPublic

Description

https://commons.wikimedia.org/wiki/File:Парк_платанов_в_градостроительном_ансамбле_%22Крутогорный%22.JPG - there are three versions uploaded but in the history I see only one record.

We have had too many such strange "bugs" on Commons (see other media storage bugs) for months.

It is time that someone from the WMF tech staff looks into this.


Version: master
Severity: critical
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=65338
https://bugzilla.wikimedia.org/show_bug.cgi?id=65339

Details

Reference
bz64883

Event Timeline

bzimport raised the priority of this task to Unbreak Now!. Nov 22 2014, 3:16 AM
bzimport added a project: UploadWizard.
bzimport set Reference to bz64883.

History tab:
https://commons.wikimedia.org/w/index.php?title=File:%D0%9F%D0%B0%D1%80%D0%BA_%D0%BF%D0%BB%D0%B0%D1%82%D0%B0%D0%BD%D0%BE%D0%B2_%D0%B2_%D0%B3%D1%80%D0%B0%D0%B4%D0%BE%D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D0%BE%D0%BC_%D0%B0%D0%BD%D1%81%D0%B0%D0%BC%D0%B1%D0%BB%D0%B5_%22%D0%9A%D1%80%D1%83%D1%82%D0%BE%D0%B3%D0%BE%D1%80%D0%BD%D1%8B%D0%B9%22.JPG&action=history

one entry

File history:
https://commons.wikimedia.org/wiki/File:%D0%9F%D0%B0%D1%80%D0%BA_%D0%BF%D0%BB%D0%B0%D1%82%D0%B0%D0%BD%D0%BE%D0%B2_%D0%B2_%D0%B3%D1%80%D0%B0%D0%B4%D0%BE%D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D0%BE%D0%BC_%D0%B0%D0%BD%D1%81%D0%B0%D0%BC%D0%B1%D0%BB%D0%B5_%22%D0%9A%D1%80%D1%83%D1%82%D0%BE%D0%B3%D0%BE%D1%80%D0%BD%D1%8B%D0%B9%22.JPG#filehistory

three entries

Log:
https://commons.wikimedia.org/w/index.php?page=File%3A%D0%9F%D0%B0%D1%80%D0%BA+%D0%BF%D0%BB%D0%B0%D1%82%D0%B0%D0%BD%D0%BE%D0%B2+%D0%B2+%D0%B3%D1%80%D0%B0%D0%B4%D0%BE%D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D0%BE%D0%BC+%D0%B0%D0%BD%D1%81%D0%B0%D0%BC%D0%B1%D0%BB%D0%B5+%22%D0%9A%D1%80%D1%83%D1%82%D0%BE%D0%B3%D0%BE%D1%80%D0%BD%D1%8B%D0%B9%22.JPG&title=Special%3ALog

three entries

API:
http://commons.wikimedia.org/w/api.php?action=query&titles=File:%D0%9F%D0%B0%D1%80%D0%BA%20%D0%BF%D0%BB%D0%B0%D1%82%D0%B0%D0%BD%D0%BE%D0%B2%20%D0%B2%20%D0%B3%D1%80%D0%B0%D0%B4%D0%BE%D1%81%D1%82%D1%80%D0%BE%D0%B8%D1%82%D0%B5%D0%BB%D1%8C%D0%BD%D0%BE%D0%BC%20%D0%B0%D0%BD%D1%81%D0%B0%D0%BC%D0%B1%D0%BB%D0%B5%20%22%D0%9A%D1%80%D1%83%D1%82%D0%BE%D0%B3%D0%BE%D1%80%D0%BD%D1%8B%D0%B9%22.JPG&prop=imageinfo&iilimit=10&format=jsonfm

Three entries for imageinfo. Dates:
2014-05-05T11:49:48Z -> 2014-05-05T11:49:50Z -> 2014-05-05T11:49:54Z
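To make the check above repeatable, here is a minimal Python sketch that builds the same `action=query&prop=imageinfo` request and measures the gaps between upload timestamps. It rests on assumptions: `imageinfo_url` and `upload_timestamps` are hypothetical helper names, the sample response is hand-written in the shape of a `format=json` reply (the page id is a placeholder), and only the three timestamps come from this report.

```python
from datetime import datetime
from urllib.parse import urlencode

API_URL = "https://commons.wikimedia.org/w/api.php"


def imageinfo_url(title: str, limit: int = 10) -> str:
    """Build the imageinfo query URL for one file title."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iilimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def upload_timestamps(response: dict) -> list:
    """Collect all imageinfo timestamps from a response, oldest first."""
    stamps = []
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            stamps.append(
                datetime.strptime(info["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
            )
    return sorted(stamps)


# Hand-written sample; "12345" is a placeholder page id, not the real one.
sample = {
    "query": {
        "pages": {
            "12345": {
                "imageinfo": [
                    {"timestamp": "2014-05-05T11:49:54Z"},
                    {"timestamp": "2014-05-05T11:49:50Z"},
                    {"timestamp": "2014-05-05T11:49:48Z"},
                ]
            }
        }
    }
}

stamps = upload_timestamps(sample)
# Seconds between consecutive uploads.
gaps = [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]
```

Gaps of only a few seconds between versions are consistent with a single batch upload rather than deliberate re-uploads.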

I assume the overwrite occurred during a single batch upload using UploadWizard.

Thanks for reporting! Are there also cases with non-Cyrillic filenames that you're aware of?

Created attachment 15297
FF RC; Screenshot illustrating the issue

Reproduced at https://commons.wikimedia.org/wiki/File:Treppe_2222_test_upload.jpg

Attached:

upwiz_bug.png (1×949 px, 31 KB)

So here is how I was able to reproduce it:

  • Uploaded three files and provided the following names:
    • Treppe_2222 test upload.jpg
    • Treppe 2222 test_upload.jpg
    • Treppe 2222 test_upload

Note the underscores and the missing file extension. It appears that UpWiz detects the underscore issue (that's why file 2 in the screenshot was not published) but does not check again after appending the file extension.

Should sysops split the images manually, or can developers do it themselves?
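The collision in the reproduction steps can be illustrated with a short Python sketch. This is not UploadWizard code: `normalize_title` and `find_collisions` are hypothetical helpers, and the normalization is reduced to the two steps relevant here, under the assumption that the missing extension is appended first and that underscores are then treated as spaces.

```python
from collections import defaultdict


def normalize_title(name: str, extension: str = "jpg") -> str:
    """Simplified, hypothetical title normalization.

    Assumption: a missing file extension is appended, and underscores
    are equivalent to spaces in the resulting title.
    """
    name = name.strip()
    if not name.lower().endswith("." + extension.lower()):
        name = f"{name}.{extension}"
    return name.replace("_", " ")


def find_collisions(names):
    """Group user-supplied names by the title they normalize to."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_title(name)].append(name)
    # Keep only titles that more than one supplied name maps to.
    return {title: srcs for title, srcs in groups.items() if len(srcs) > 1}


batch = [
    "Treppe_2222 test upload.jpg",
    "Treppe 2222 test_upload.jpg",
    "Treppe 2222 test_upload",
]
collisions = find_collisions(batch)
```

All three names end up under the same title, so a duplicate check has to run on the final title, after the extension has been appended, not on the raw input.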

Well, let me try to find a fix.

How can an organization with a budget of US$55 million not fix such a HIGH PRIO bug?

Can you clarify what this bug is about? The first comment talks about uploading multiple versions of the same file but some of them not appearing in the history; Rillke is talking about UploadWizard uploading different files with similar names as versions of the same file. Many images in https://commons.wikimedia.org/wiki/User:RLuts/UpWizBug seem to display some third kind of bug where the old versions of files are missing.

Also, can you give more information on what the impact is? https://commons.wikimedia.org/wiki/User:RLuts/UpWizBug lists 17000 images - how was that list compiled? Is this a timing issue or does it always happen with certain kinds of file names?

The example file (in the first comment) has already been split and now has one uploaded version. But the file in comment 4 still has the bug. Another example of this bug is here: https://commons.wikimedia.org/wiki/File:Lillepeenrad_Oru_pargis.jpg.

The list in comment 9 contains files which were reuploaded by UploadWizard (UploadWizard should not reupload files with similar names, and if the upload log contains "User created page with UploadWizard" then it is an UploadWizard bug).
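The heuristic described above could be expressed roughly as follows. This is only a sketch and not how the list in comment 9 was actually compiled: `suspicious_titles` is a hypothetical helper, the input is assumed to be (title, upload comment) pairs taken from the upload log, and the sample titles are made up.

```python
from collections import Counter

# Upload comment quoted in this report.
UPWIZ_MARKER = "User created page with UploadWizard"


def suspicious_titles(log_entries):
    """Return titles with more than one upload carrying the UploadWizard marker."""
    counts = Counter(
        title for title, comment in log_entries if UPWIZ_MARKER in comment
    )
    return sorted(title for title, n in counts.items() if n > 1)


# Made-up sample entries for illustration only.
entries = [
    ("File:A.jpg", "User created page with UploadWizard"),
    ("File:A.jpg", "User created page with UploadWizard"),
    ("File:B.jpg", "User created page with UploadWizard"),
    ("File:C.jpg", "new version, cropped"),
    ("File:C.jpg", "new version, cropped"),
]
flagged = suspicious_titles(entries)
```

A title uploaded once through UploadWizard, or re-uploaded by other means, is not flagged; only repeated UploadWizard "page creation" uploads under one title are.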

... and comment 5 lists how to reproduce the bug.

Gergő, "1 record in history" refers to the "history tab", I think, where only one entry is recorded. See my comment 1.

So if I understand correctly, we are talking about three related bugs:

e.g. https://commons.wikimedia.org/wiki/File:0054%D7%A8%D7%97%D7%95%D7%91_%D7%91%D7%A0%D7%97%D7%9C%D7%90%D7%95%D7%AA-%D7%99%D7%A8%D7%95%D7%A9%D7%9C%D7%99%D7%9D.jpg

  • seems like this happens when UploadWizard uploads the exact same image twice?

3,008 × 2,000 (3.*6*6 MB)
3,008 × 2,000 (3.*0*6 MB)

Change 133434 had a related patch set uploaded by Rillke:
UploadWizard: Check for duplicate titles

https://gerrit.wikimedia.org/r/133434

Follow-up:
Bug 65338 - API: action=upload overwrites files despite ignorewarnings is not set
Bug 65339 - MediaStorage: Several versions uploaded but only one record in history

(In reply to Tisza Gergő from comment #14)

  • sometimes file history entries are broken, e.g.

https://commons.wikimedia.org/wiki/File:0054%D7%A8%D7%97%D7%95%D7%91_%D7%91%D7%A0%D7%97%D7%9C%D7%90%D7%95%D7%AA-%D7%99%D7%A8%D7%95%D7%A9%D7%9C%D7%99%D7%9D.jpg

Seems similar to bug 53770.

(In reply to Rainer Rillke @commons.wikimedia from comment #15)

e.g. https://commons.wikimedia.org/wiki/File:0054%D7%A8%D7%97%D7%95%D7%91_%D7%91%D7%A0%D7%97%D7%9C%D7%90%D7%95%D7%AA-%D7%99%D7%A8%D7%95%D7%A9%D7%9C%D7%99%D7%9D.jpg

  • seems like this happens when UploadWizard uploads the exact same image twice?

3,008 × 2,000 (3.*6*6 MB)
3,008 × 2,000 (3.*0*6 MB)

bug 54750 ?

*** This bug has been marked as a duplicate of bug 54750 ***

Change 133434 merged by jenkins-bot:
UploadWizard: Check for duplicate titles

https://gerrit.wikimedia.org/r/133434