
Commons UploadWizard rejects all uploads with false error message after a certain number of uploads
Closed, Resolved · Public

Description

Original bug report

I am uploading PDFs to Wikimedia Commons using UploadWizard. Most of them worked, but for 27 files I get the error: "This file might be corrupt, or have the wrong extension." They open just fine on my computer, and I still get the error even after deleting the files and re-downloading them from my source to be safe.

Attached is one of the files that did not work. One of the files that did work: https://commons.wikimedia.org/wiki/File:NIOSH_Manual_of_Analytical_Methods_-_8304.pdf

Addendum

I am re-opening this bug now that I know it was not just a one-time fluke. It also applies to other file types, not just PDFs.

This is what happens: I am currently working on uploading hundreds of files to Wikimedia Commons. I can only upload 50 at a time, but since I am only uploading a few hundred files, this isn't that terrible. However, in a given session, after a certain number of uploads (around 100), UploadWizard starts rejecting all submissions with "This file might be corrupt, or have the wrong extension." I know for a fact that the files are not corrupt; in fact, UploadWizard even renders them correctly.

Yesterday (14 April 2016) I encountered this error, left the office, re-attempted the uploads around 90 minutes later, and they worked just fine. I worked on another batch of uploads and got the error again. Changing IP address does not fix the problem: I tried from a different WLAN and the same problem still occurred.

I did more uploads this morning (15 April 2016) and I managed to upload 150 or so files before encountering the error again. This appears to be a recurring problem, and I would like to see it investigated.

A commenter raised the possibility of a throttle; if such a throttle exists, it should be properly expressed instead of being hidden behind an inapplicable error message. If there is not supposed to be a throttle, the cause of the error message should be investigated.

Event Timeline

harej-NIOSH set the priority of this task to Needs Triage.
harej-NIOSH updated the task description.
harej-NIOSH added a project: UploadWizard.
harej-NIOSH added subscribers: harej-NIOSH, matmarex.
Restricted Application added a project: Multimedia. · Dec 18 2015, 6:28 PM
Restricted Application added subscribers: StudiesWorld, Steinsplitter, Aklapper.

Incidentally, the file seems to have uploaded to Phabricator just fine!

Restricted Application added a subscriber: Matanya. · Dec 18 2015, 6:29 PM

It magically works now. You can close this if you'd like, though it's worth looking into what causes a file to be rejected and subsequently accepted.

matmarex closed this task as Invalid. · Dec 18 2015, 8:19 PM
matmarex claimed this task.
matmarex edited projects, added MediaWiki-Uploading; removed UploadWizard.
matmarex set Security to None.

Uploading that file also works for me locally, and I see nothing that could be non-deterministic in these checks (although I haven't checked it that carefully). I blame server pixies. Maybe something somewhere got throttled if you were uploading many PDF files at once.

harej-NIOSH renamed this task from "UploadWizard rejecting valid PDFs" to "UploadWizard rejects all uploads with false error message after a certain number of uploads". · Apr 15 2016, 2:10 PM
harej-NIOSH reopened this task as Open.
harej-NIOSH updated the task description.
matmarex triaged this task as High priority. · Apr 15 2016, 3:42 PM

I tried to "stress test" the system and managed to get 114 uploads in before it started throwing the error message.

Note that the "this file might be corrupt" error can occur at any point in the process. Even if a file uploads successfully on the initial upload screen, it might still throw that error on the finalization screen.

Well, I uploaded around 1000 PDF files with UploadWizard to Commons in the last few hours and not a single one of them failed. Perhaps this only kicks in if you publish some of the uploads, which I didn't do (I only uploaded to the stash).

The API call can return more details than UploadWizard displays, but we don't log them anywhere (T130485). I'll try to look into that, or maybe upload some PDFs to testwiki.
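A minimal sketch of client-side handling that would keep those details instead of discarding them. It assumes the MediaWiki Action API's legacy JSON error shape (`{"error": {"code": ..., "info": ...}}`); the helper name `describe_api_error` and the fallback behaviour are hypothetical, and this is not UploadWizard's actual (JavaScript) code.

```python
import json
import logging
from typing import Any

# The generic text UploadWizard showed in this report, used here only as a fallback.
GENERIC_MESSAGE = "This file might be corrupt, or have the wrong extension."


def describe_api_error(raw_response: str) -> str:
    """Prefer the API's own error code/info over a generic message (hypothetical helper)."""
    try:
        data: Any = json.loads(raw_response)
    except json.JSONDecodeError:
        return GENERIC_MESSAGE  # not JSON at all: nothing more specific to report
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        return GENERIC_MESSAGE  # no structured error object in the response
    code = str(error.get("code", "unknown"))
    info = str(error.get("info", ""))
    # Log the full details so they can be inspected later instead of being lost.
    logging.getLogger("upload").warning("API error %s: %s", code, info)
    return f"{code}: {info}" if info else code
```

With a response such as `{"error": {"code": "some-code", "info": "Some explanation"}}`, the helper returns `"some-code: Some explanation"` rather than the misleading "might be corrupt" text.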

I found out that there is a throttle, sneakily implemented as an AbuseFilter filter (filter 140, which you indeed hit many times). UploadWizard is not very smart about reporting actions blocked by filters (T132866), and due to T87381 every UW upload counts towards the limit twice (once when uploading, and once when publishing the file description).

Non-autopatrolled users can upload at most 380 files per 4320 seconds (72 minutes). You should probably request that user right at Commons if you're planning any more big uploads (the error message that UW should be showing you, but doesn't, says so: https://commons.wikimedia.org/wiki/MediaWiki:Abusefilter-warning-ut). I have patroller rights on Commons and so didn't hit this when testing myself :/
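To see why the double counting from T87381 matters, here is a simplified sliding-window model of a count-based throttle. It is a hypothetical sketch, not AbuseFilter's implementation; it only assumes the numbers above (380 counted actions per 4320 seconds) and that each UploadWizard upload issues two counted actions (stash upload plus publish). The names `SlidingWindowThrottle` and `uploads_before_throttled` are illustrative.

```python
from collections import deque


class SlidingWindowThrottle:
    """Allow at most `max_actions` actions within any `window_seconds` span (simplified model)."""

    def __init__(self, max_actions: int, window_seconds: float) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of actions still inside the window

    def try_action(self, now: float) -> bool:
        # Forget actions that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_actions:
            return False  # throttled
        self._timestamps.append(now)
        return True


def uploads_before_throttled(actions_per_upload: int, max_actions: int, window_seconds: float,
                             seconds_between_uploads: float = 1.0, max_uploads: int = 10_000) -> int:
    """Count complete uploads before the first throttled action (capped at `max_uploads`)."""
    throttle = SlidingWindowThrottle(max_actions, window_seconds)
    uploads = 0
    now = 0.0
    while uploads < max_uploads:
        # Each upload issues `actions_per_upload` counted actions
        # (assumed 2 for UploadWizard: stash upload + publish).
        for _ in range(actions_per_upload):
            if not throttle.try_action(now):
                return uploads
        uploads += 1
        now += seconds_between_uploads
    return uploads
```

Under this model, `uploads_before_throttled(1, 380, 4320.0)` returns 380, but `uploads_before_throttled(2, 380, 4320.0)` returns 190: counting every upload twice halves the effective budget. If failed attempts also count, the real number can be lower still.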

I filed T132930 and T132920 about converting the filter to site configuration and making UploadWizard understand rate limits. (With filters, even if I make it display the message, it still won't know to stop uploading when it hits this error, and I think every failed upload attempt also counts towards the limit.)
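Because a failed attempt may itself count toward the limit, a batch uploader that understands rate limits should stop at the first throttling error rather than keep trying the remaining files. A minimal sketch, assuming a hypothetical `upload_one` callable that raises a hypothetical `RateLimitedError` when the server reports a rate limit; this is not UploadWizard's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class RateLimitedError(Exception):
    """Hypothetical exception raised by `upload_one` when the server reports a rate limit."""


@dataclass
class BatchResult:
    uploaded: List[str] = field(default_factory=list)
    remaining: List[str] = field(default_factory=list)  # includes the file that was throttled
    stopped_reason: Optional[str] = None


def upload_batch(files: List[str], upload_one: Callable[[str], None]) -> BatchResult:
    """Upload files in order, stopping at the first rate-limit error."""
    result = BatchResult()
    for index, name in enumerate(files):
        try:
            upload_one(name)
        except RateLimitedError as exc:
            # Stop immediately: later attempts would likely fail too and may count toward the limit.
            result.stopped_reason = str(exc)
            result.remaining = files[index:]
            return result
        result.uploaded.append(name)
    return result


def make_fake_uploader(limit: int) -> Callable[[str], None]:
    """Test double: succeeds `limit` times, then raises RateLimitedError."""
    count = 0

    def upload_one(name: str) -> None:
        nonlocal count
        if count >= limit:
            raise RateLimitedError("throttled")
        count += 1

    return upload_one
```

For example, `upload_batch(["a", "b", "c", "d", "e"], make_fake_uploader(3))` uploads `a`, `b`, `c` and reports `d` and `e` as remaining, so the user can resume later instead of burning more of the limit on doomed attempts.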

matmarex renamed this task from "UploadWizard rejects all uploads with false error message after a certain number of uploads" to "Commons UploadWizard rejects all uploads with false error message after a certain number of uploads". · Apr 27 2016, 6:32 PM
matmarex added a project: Commons.
Restricted Application added a subscriber: Poyekhali. · Apr 27 2016, 6:32 PM
matmarex closed this task as Resolved. · May 11 2016, 4:36 PM

This should no longer be a problem, with both of the subtasks fixed. There is still a throttle, but you'll get a useful error message about it.