Recently more broken files (premature end of file) that were cross-wiki uploaded to Commons by several users
Closed, ResolvedPublic

Description

There are many broken files uploaded today, by different users. I don't think that is a coincidence.
e.g. https://commons.wikimedia.org/wiki/File:Valentin_Guichaux_posant_%C3%A0_cot%C3%A9_de_son_Mao.jpg
I already deleted more than a dozen like this one.

Event Timeline

Yann created this task.Mar 29 2018, 4:56 AM
Restricted Application added a subscriber: Aklapper.Mar 29 2018, 4:56 AM
Aklapper changed the task status from Open to Stalled.Mar 29 2018, 10:16 AM
Aklapper triaged this task as High priority.
Aklapper added a project: media-storage.

@Yann: Please point to an existing file and/or explain what "broken" means exactly. Thanks!

greg added a subscriber: greg.Mar 29 2018, 5:37 PM

@Yann Also, can you identify when the problem started? Commons is still on the version of MediaWiki from last week (except for a brief period yesterday when we tried to roll forward but had to roll back, see: T183966)

Yann added a comment.EditedMar 29 2018, 6:18 PM

OK, I restored the broken file. @greg I have seen hundreds of broken files in the last few weeks, whereas I used to see about one a week (compared to last year).
I can't say when it started, but there is a difference of several orders of magnitude. These files are usually speedy deleted.

Adding the Multimedia team for their diagnosis of this issue (as they own many of the parts here).

For https://upload.wikimedia.org/wikipedia/commons/7/74/Valentin_Guichaux_posant_%C3%A0_cot%C3%A9_de_son_Mao.jpg :

$:acko\> rpm -q jpeginfo
jpeginfo-1.6.1-8.fc27.x86_64
$:acko\> jpeginfo -c Valentin_Guichaux_posant_à_coté_de_son_Mao.jpg 
Valentin_Guichaux_posant_à_coté_de_son_Mao.jpg 6000 x 4000 24bit Exif  N 5242880  Premature end of JPEG file  [WARNING]
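
A similar check can be approximated without jpeginfo. The following is a minimal sketch, assuming that a complete JPEG ends with the end-of-image marker FF D9 and that an affected file is exactly 5242880 bytes, the size jpeginfo reports above. The function names are hypothetical, and the check is only a heuristic: a valid file with trailing bytes after the marker would be flagged too.

```python
FIVE_MIB = 5 * 1024 * 1024  # 5242880 bytes, the size reported by jpeginfo above
JPEG_SOI = b"\xff\xd8"      # start-of-image marker
JPEG_EOI = b"\xff\xd9"      # end-of-image marker


def looks_truncated(data: bytes) -> bool:
    """Heuristic: the data starts like a JPEG but does not end with EOI."""
    if not data.startswith(JPEG_SOI):
        raise ValueError("not a JPEG (missing start-of-image marker)")
    return not data.endswith(JPEG_EOI)


def matches_5mib_pattern(data: bytes) -> bool:
    """True for a JPEG that is exactly 5 MiB long and lacks the EOI marker."""
    return len(data) == FIVE_MIB and looks_truncated(data)
```

To check a downloaded file, read it in binary mode and pass the bytes to `matches_5mib_pattern`.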
Aklapper renamed this task from Many broken files to Recently more broken files uploaded to Commons (premature end of JPEG file) by different users.Mar 29 2018, 6:50 PM
Aklapper changed the task status from Stalled to Open.
greg added a subscriber: Cparle.Mar 29 2018, 8:40 PM
19:38:40 +marktraceur | cormac_parle: If you have a minute tomorrow, could you take a look at it?

I have also noticed the broken files. I tagged the first one as a broken file on the 26th Feb - https://commons.wikimedia.org/wiki/File:Hamptons_in_the_Summer.jpg

I'm looking at the code that does this work, and I don't see any way the uploads could be failing mid-process and still getting published to the stash with missing chunks. I think this must have something to do with either connection issues, a new browser version doing slice() differently on file objects, or something I can't anticipate.
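
To make the slice() hypothesis concrete: a chunked upload cuts the file into fixed-size byte ranges and sends them one at a time. The following is a minimal sketch, assuming a chunk size of 5 MiB (an assumption, chosen only because the broken files are exactly 5242880 bytes). `chunk_ranges` is a hypothetical helper, not the uploader's actual code.

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # assumed chunk size, not taken from the real uploader


def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte offsets, end exclusive, as slice(start, end) would use."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < file_size:
        end = min(start + chunk_size, file_size)
        yield start, end
        start = end
```

Under these assumptions, a 12 MiB file yields three ranges, and the first one ends at byte 5242880. If only that first range reached the stash and the upload was still published, the result would be a file of exactly the size seen in the broken uploads.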

brion added a subscriber: brion.Apr 2 2018, 7:51 PM

In case it is helpful for tracking down the problem: AFAICS all truncated files posted so far are cross-wiki uploads. Are non-cross-wiki uploads unaffected?

Yann added a comment.Apr 11 2018, 5:35 PM

Two more, now deleted:
Folklorni ansambl "Rožaje" .jpg
Foto Fachada Faculdade Arquidiocesana de Curvelo.jpg

Aklapper renamed this task from Recently more broken files uploaded to Commons (premature end of JPEG file) by different users to Recently more broken files (premature end of JPEG file) that were cross-wiki uploaded to Commons by several users.May 10 2018, 6:57 PM
SJu added a comment.Nov 17 2018, 2:52 PM

The problem is still continuing... I propose to switch cross-wiki uploads off and forbid it until the problem is solved. Another possibility is to set a 5MB limit to cross-wiki uploads, to prevent faulty uploads.

Related comment and discussion:

I see you made Category:Incomplete JPG files (5 MB interruption) and commented at phab:T190988. Using mw:Help:CirrusSearch#filesize, filesize:5120,5120 (exactly 5 MB) currently gives 6008 hits of all file types. Most have the bug. I found Jotzet has reuploaded cropped versions of many affected images. This sometimes seems problematic. For example, File:Barbie Store.jpg does not appear useful and does not reflect the image the author tried to upload. PrimeHunter (discussion) 10:55, 18 November 2018 (UTC)
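
The arithmetic behind the search term above can be sketched as follows, assuming the filesize keyword counts in units of 1024 bytes (which is what makes 5120 correspond to exactly 5 MB here). The helper name `filesize_query` is hypothetical and only builds the query string; it does not call any search API.

```python
BYTES_PER_KIB = 1024  # assumed unit of the filesize search keyword


def filesize_query(size_in_bytes: int) -> str:
    """Build a 'filesize:min,max' term matching one exact size in KiB."""
    if size_in_bytes % BYTES_PER_KIB != 0:
        raise ValueError("size is not a whole number of KiB")
    kib = size_in_bytes // BYTES_PER_KIB
    return f"filesize:{kib},{kib}"


# 5 MiB is 5 * 1024 * 1024 bytes, which is 5120 KiB.
query = filesize_query(5 * 1024 * 1024)
```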

@PrimeHunter: This bug required urgent action: systems of blocks and warnings to prevent faulty uploads, and an immediate appeal to the uploaders to reupload the files, with advice on how to avoid the failure. Regrettably, most of the uploads are from one-time and occasional users who can never correct their uploads. Regrettably, many files were speedily deleted without any attempt to request a reupload. Regrettably, the bug has continued for many months without any appropriate measures. ŠJů 11:06, 18 November 2018 (UTC)

greg removed a subscriber: greg.Nov 20 2018, 6:07 AM

The problem is still continuing... I propose to switch cross-wiki uploads off

That's certainly the most sensible thing to do. We don't need this unmaintained piece of software to cause havoc on our projects.

The problem is still continuing... I propose to switch cross-wiki uploads off

That's certainly the most sensible thing to do. We don't need this unmaintained piece of software to cause havoc on our projects.

I agree. We get lots of complaints traced to it at Commons talk:Abuse filter.

SJu added a comment.Mar 19 2019, 10:10 PM

There seems to be no progress toward solving the problem. Please help to correct, improve and internationalize a message template intended to inform uploaders of affected files.

LX renamed this task from Recently more broken files (premature end of JPEG file) that were cross-wiki uploaded to Commons by several users to Recently more broken files (premature end of file) that were cross-wiki uploaded to Commons by several users.May 3 2019, 8:59 PM
LX added a subscriber: LX.May 3 2019, 9:04 PM

This affects all types of files, not just JPEG files. There are now 5824 files in the subcategories of https://commons.wikimedia.org/wiki/Category:Incomplete_files_(5_MB_interruption).

You are probably not going to find the answer by looking at the code. More likely, it's a server configuration issue. 5 MB seems to be a popular limit for PHP configurations. With the cross-wiki uploads, there may be internal calls that get cut off at this limit. Please check your server logs.

Krinkle raised the priority of this task from High to Unbreak Now!.May 14 2019, 1:01 PM
Krinkle added a subscriber: Krinkle.

Over 6,000 files have been permanently damaged, lost without recovery, and without the user's being informed about this or aware during the upload.

Restricted Application added a subscriber: Liuxinyu970226.May 14 2019, 1:01 PM
Steinsplitter added a comment.EditedMay 14 2019, 1:51 PM

Over 6,000 files have been permanently damaged, lost without recovery, and without the user's being informed about this or aware during the upload.

A hotfix seems reasonable. Once fixed we can contact the users, asking them to re-upload the file(s) if possible.

Change 510428 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/core@master] Don't allow completing a partial stash upload

https://gerrit.wikimedia.org/r/510428

Change 510428 merged by jenkins-bot:
[mediawiki/core@master] Don't allow completing a partial stash upload

https://gerrit.wikimedia.org/r/510428
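
The idea named in the patch title can be illustrated as follows. This is a sketch of the general technique only, not the MediaWiki change itself: the class `StashedUpload`, its fields, and the exception name are all hypothetical. It refuses to finalize an upload when the number of bytes received does not match the size the client declared.

```python
from dataclasses import dataclass


class IncompleteUploadError(Exception):
    """Raised when completion is requested before all bytes have arrived."""


@dataclass
class StashedUpload:
    declared_size: int  # total size announced by the client (hypothetical field)
    received: int = 0   # bytes appended so far

    def append_chunk(self, offset: int, chunk: bytes) -> None:
        # Chunks must arrive in order and must not exceed the declared size.
        if offset != self.received:
            raise ValueError(f"expected offset {self.received}, got {offset}")
        if self.received + len(chunk) > self.declared_size:
            raise ValueError("chunk exceeds declared size")
        self.received += len(chunk)

    def complete(self) -> int:
        # The guard: never publish a partial upload.
        if self.received != self.declared_size:
            raise IncompleteUploadError(
                f"received {self.received} of {self.declared_size} bytes"
            )
        return self.received
```

With a guard like this, an upload that stops after its first chunk fails loudly at completion time instead of being published as a truncated file.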

The patch that we think will fix the problem has been in production since (I think) May 23. I don't see anything obvious in the category https://commons.wikimedia.org/wiki/Category:Incomplete_files_(5_MB_interruption) from later than that. So perhaps it's fixed? Not sure how to verify.

PrimeHunter added a comment.EditedJun 4 2019, 3:54 PM

Search of exactly 5 MB uploads sorted by creation date descending. The most recent error is from 23 May. There were many daily errors before that so it looks like the issue is fixed.

Cparle added a comment.Jun 4 2019, 4:39 PM

Great. Ok to resolve the ticket?

greg added a comment.Jun 4 2019, 6:50 PM

Thanks both! (yes from my end, unless there is any needed follow-up, which should probably be in a separate task)

Jdforrester-WMF closed this task as Resolved.Jun 4 2019, 6:53 PM
Jdforrester-WMF assigned this task to matthiasmullie.