Chunked upload fails with internal_api_error_UploadStashFileNotFoundException

bzimport set Reference to bz36587.
Nemo_bis created this task.Via LegacyMay 7 2012, 10:20 AM
Nemo_bis added a comment.Via ConduitMay 7 2012, 10:29 AM

Ah, and if I retry it says "completed!" after a few seconds, but it actually fails again, giving Unknown error: "internal_api_error_UploadStashFileNotFoundException".

Confirmed also by russavia with a 268 MB file on Firefox, Windows (because on Chrome it didn't work at all).

Nemo_bis added a comment.Via ConduitMay 7 2012, 10:36 AM

Sorry, russavia had Unknown error: "internal_api_error_UploadChunkFileException" at first.

Eloquence added a comment.Via ConduitMay 9 2012, 2:30 AM

At which step of the upload process is this occurring?

Eloquence added a comment.Via ConduitMay 9 2012, 6:30 AM

We've reproduced this as a first step error. For very large uploads (~100MB), the final chunk API POST request sometimes (not always) fails with a 504 error from Squid. It seems likely that we have a timing issue with the chunk re-assembly.

Note that this is distinct from bug 34785, which has similar symptoms but occurs in the last step of the upload and is independent of upload size.

bzimport added a comment.Via ConduitMay 10 2012, 9:14 AM

jgerber wrote:

could it also be that re-assembly + hashing of large files just takes too long and hits the PHP execution time limit?

Eloquence added a comment.Via ConduitMay 25 2012, 2:48 AM

I can confirm that this still happens for some very large uploads (just tried a 400MB file), even after we disabled client-side API timeouts. So this looks like a server-side timeout issue in the chunk re-assembly step as Jan suggests.

Fastily added a comment.Via ConduitJul 29 2012, 8:49 PM

Can something be done about this? I'm getting similar timeouts when I try using the API with a Java application.

Eloquence added a comment.Via ConduitAug 1 2012, 1:25 AM

Hi Fastily,

we're currently in the process of moving to a new media storage backend (Swift), which involves lots of changes on all levels (dev and ops), and is the reason we've not prioritized a fix for this yet (we're changing some of the relevant infrastructure, and the people with the right skills to fix this bug are working on the migration).

We may not have cycles to fully debug the issues with chunking and chunk assembly before September, but Rob should be able to give a better estimate soon (unless someone on CC beats us to it and actually does find time to get to the root of the issue).

Eloquence added a comment.Via ConduitAug 29 2012, 10:09 PM

OK, now that we're through most of the Swift migration, we should pick this one up again.

This still occurs and is easily reproducible by uploading a 300-400MB file to Commons via Upload Wizard with chunked uploading enabled.

Looking at the API responses in detail, what happens is that there's a final chunk API request which results in a "Wikimedia Error" webpage response like this:

Request: POST http://commons.wikimedia.org/w/api.php, from 208.80.154.134 via cp1002.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.125 (10.64.0.125)
Error: ERR_READ_TIMEOUT, errno [No Error] at Wed, 29 Aug 2012 21:46:36 GMT

Upload Wizard then seems to attempt to re-upload the same chunk again, with the following response code:

{"servedby":"mw65","error":{"code":"internal_api_error_UploadChunkFileException","info":"Exception Caught: error storing file in '\/tmp\/phpRWwfF6': backend-fail-alreadyexists; mwstore:\/\/local-swift\/local-temp\/5\/57\/10tlhjb3zs7o.4nm7i9.28.ogx.490","*":""}

This response code is then surfaced through the UI.

So it looks like the chunk re-assembly for large files is still timing out somewhere.

aaron added a comment.Via ConduitAug 30 2012, 10:31 PM

Since concatenation is rare, it doesn't show up usefully in profiling.

I've made a few optimizations:
https://gerrit.wikimedia.org/r/#/c/22063/
https://gerrit.wikimedia.org/r/#/c/22118/

...but I'm not sure how much faster the file operations can be without parallel downloading of local file copies. The slowness may not even be coming from here, I'd need more data to say.

In any case, rather than having the JS expect the whole assembly/upload to happen synchronously with the last chunk, it might help if the JS could fall back to polling the server for completion status. Unfortunately this would require a job queue, since you can't really give a reply, close the connection, and keep doing work in PHP.
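The polling fallback Aaron describes could look roughly like the following on the client side. Everything here (function names, the stage strings, the intervals) is a sketch of the idea, not actual UploadWizard code:

```javascript
// Decide the client's next move from the server's reported assembly
// stage and the time spent polling so far. Keeping this pure makes the
// give-up policy easy to tune and test.
function nextAction(stage, elapsedMs, timeoutMs) {
    if (stage === 'done') { return 'finish'; }
    if (stage === 'error') { return 'fail'; }
    if (elapsedMs > timeoutMs) { return 'give-up'; }
    return 'poll-again';
}

// Poll loop built on the decision function; checkStatus() stands in for
// an API request that reports the current assembly stage.
function pollUntilAssembled(checkStatus, onDone, onError, timeoutMs) {
    var start = Date.now();
    (function poll() {
        var action = nextAction(checkStatus(), Date.now() - start, timeoutMs);
        if (action === 'finish') { onDone(); }
        else if (action === 'poll-again') { setTimeout(poll, 3000); }
        else { onError(action); }
    }());
}
```

The point of splitting out `nextAction` is that the timeout policy (the part debated later in this thread) becomes a single tunable parameter rather than a constant buried in the transport code.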

Fastily added a comment.Via ConduitAug 31 2012, 1:15 AM

This isn't exclusively an UploadWizard issue; I still get the same timeout errors when performing the chunked upload of a 400mb file via API.

RobLa-WMF added a comment.Via ConduitAug 31 2012, 6:40 PM

The changes marked above should roll out Wednesday, September 5 with the 1.20wmf11 deployment.

Eloquence added a comment.Via ConduitSep 8 2012, 11:43 PM

It looks like chunked uploading is completely broken now, perhaps due to these changes; see bug 40048.

Eloquence added a comment.Via ConduitSep 10 2012, 8:05 AM

Basic chunked uploading is fixed now, thanks Aaron.

Large chunk uploads still fail. The last chunk still leads to a Squid timeout error:

Request: POST http://commons.wikimedia.org/w/api.php, from 208.80.154.134 via cp1015.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.125 (10.64.0.125)
Error: ERR_READ_TIMEOUT, errno [No Error] at Mon, 10 Sep 2012 07:38:26 GMT

Upload Wizard then tries again, and it now fails with a different API error:

{"servedby":"mw67","error":{"code":"stashfailed","info":"Could not read file \"mwstore:\/\/local-swift\/local-temp\/8\/86\/10ukfdjtth84.rb8zv4.28.ogx.0\"."}}

Eloquence added a comment.Via ConduitSep 24 2012, 11:25 PM

Aaron/Rob - This issue is still occurring; as soon as there are no higher priority Swift / storage issues remaining, it would be nice if we could dig into it some more.

Fastily added a comment.Via ConduitOct 2 2012, 12:10 AM

Any updates? I'm still getting the same error :(

aaron added a comment.Via ConduitOct 16 2012, 10:32 PM

I tried to upload a 423mb file (clone of AW_PT_2010_-_Sérgio_Nunes_-_Uso_da_Wikipédia_para_investigação_em_informática.ogv), but UW always fails with "Internal error: Something went wrong with processing your upload on the wiki." on one of the first chunks... so I can't even hit concatenate there.

Uploading ~150mb files seems to work fine, though.

bzimport added a comment.Via ConduitOct 18 2012, 11:09 AM

jgerber wrote:

how do you test large uploads on commons? I get a 100Mb upload size limit.

tomasz added a comment.Via ConduitOct 18 2012, 11:13 AM

You need to enable chunked uploads in your Preferences, Jan, and then try to upload the file using UploadWizard.

bzimport added a comment.Via ConduitOct 18 2012, 3:33 PM

jgerber wrote:

Aaron, is it possible that your failed upload is related to a non-ASCII filename? Have you tried with an ASCII-only filename and a large filesize?

Uploading 430mb file here fails with:

{"servedby":"srv297","error":{"code":"stashfailed","info":"Could not read file \"mwstore:\/\/local-swift\/local-temp\/8\/8c\/10xtcqa7qx6c.c4odgl.1731370.ogx.0\"."}}

That's 'backend-fail-read', so it fails in doConcatenate in includes/filebackend/FileBackendStore.php.

mwstore:// looks like a virtualSource, so it's the second loop. Can it be that tmp files are cleaned up between checking them out in the first loop and the second? Why is this done in 2 loops?

The file name ends in 0, so it's the first chunk that is missing. $wgUploadStashMaxAge is 6 hours, so it's unlikely that they get collected at this stage. Any other cleanup things happening that could be the issue?
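The race jgerber suspects can be sketched schematically: pass 1 checks that every chunk exists, pass 2 reads them, and anything that deletes a chunk between the passes turns into a read failure even though the existence check succeeded. This is an illustration of the described failure mode, not the actual FileBackendStore::doConcatenate() code:

```javascript
// Schematic two-pass concatenation. `store` is any object with
// has(name) and get(name); a Map works for the happy path.
function concatenateChunks(store, chunkNames) {
    // Pass 1: verify every chunk is present.
    chunkNames.forEach(function (name) {
        if (!store.has(name)) {
            throw new Error('backend-fail-notexists: ' + name);
        }
    });
    // (A cleanup job running at this point could delete chunks
    // that were just verified.)
    // Pass 2: read and join the chunk contents.
    return chunkNames.map(function (name) {
        var data = store.get(name);
        if (data === undefined) {
            throw new Error('backend-fail-read: ' + name);
        }
        return data;
    }).join('');
}
```

Collapsing the two passes into one (check and read in the same step), or re-checking at read time, narrows the window for this class of race.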

bzimport added a comment.Via ConduitOct 18 2012, 3:36 PM

jgerber wrote:

Follow-up on the name issue: renaming a file to Wikipédia_para_investigação_em_informática.ogv and trying to upload it, I get this error as the first response:

{"servedby":"mw73","error":{"code":"internal-error","info":"Invalid file title supplied"}}

aaron added a comment.Via ConduitOct 18 2012, 4:20 PM

(In reply to comment #21)

> Aaron, is it possible that your failed upload is related to a non ascii filename? Have you tried with an ascii only filename and large filesize?
>
> Uploading 430mb file here fails with:

So this is what happens *with* an ascii name, I assume?

Aklapper added a comment.Via ConduitOct 18 2012, 8:38 PM

Are bug 35354 and bug 40048 duplicates?

Fastily added a comment.Via ConduitOct 18 2012, 8:47 PM

(In reply to comment #24)

> Are bug 35354 and bug 40048 duplicates?

Not quite. 40048 fixed chunked uploading for files slightly exceeding 100mb. Chunked uploads still outright fail for files exceeding 250Mb.

aaron added a comment.Via ConduitOct 18 2012, 11:10 PM

I tried renaming the 430mb file and reuploading. I got "Internal error: Server failed to store temporary file.". I'm also not sure why the name makes a difference, since it's not really used until I try to publish (if I manage to get that far).

bzimport added a comment.Via ConduitOct 19 2012, 12:04 PM

jgerber wrote:

  1. Filenames: the upload request requires a filename to be passed; the filename needs to be unique and valid. https://gerrit.wikimedia.org/r/#/c/28673/ makes sure it's not sending any special characters.
  2. Chunk uploads: I changed the chunk size locally (in javascript) to 50Mb but I still get the same error, so the problem is not related to the number of chunks. Still get:

{"servedby":"srv254","error":{"code":"stashfailed","info":"Could not read file \"mwstore:\/\/local-swift\/local-temp\/d\/d6\/10xw8uke3gt4.kgux90.1731370.ogx.0\"."}}

  3. Using swift on a local vm this problem does not exist. Large uploads do not cause any errors. Don't have a local multiwrite setup, could that be the problem here?
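For reference, the chunk-size experiment above only changes how the file is sliced, not the reassembled result. A quick sketch of the offset arithmetic a chunked-upload client performs (illustrative only, not UploadWizard's code):

```javascript
// Plan the (offset, length) pairs a chunked-upload client POSTs, one
// request per chunk, each carrying its byte offset into the file.
function chunkPlan(totalSize, chunkSize) {
    var chunks = [];
    for (var offset = 0; offset < totalSize; offset += chunkSize) {
        chunks.push({
            offset: offset,
            length: Math.min(chunkSize, totalSize - offset)
        });
    }
    return chunks;
}
```

A 430 MB file in 50 MB chunks needs 9 requests instead of hundreds of smaller ones, but the server sees the same byte ranges either way, which is why changing the chunk size does not make the stash read error go away.
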
aaron added a comment.Via ConduitOct 19 2012, 5:26 PM

(In reply to comment #27)

> Using swift on a local vm this problem does not exist. Large uploads do not cause any errors. Dont have a local multiwrite setup, could that be the problem here?

I was using ceph rgw. I've tried a 90mb file again and it uploads ~10 chunks and then dies with "Exception Caught: path doesn't exist" (using firebug for inspecting errors). Some of the problems may have to do with http://tracker.newdream.net/issues/3365. But that doesn't quite explain why it uploads several chunks fine before failing.

I can upload 40mb-150mb or so files with swift. Sometimes it says it failed when it succeeded (and I can still publish it) and other times it just works normally.

I'm also running into a bug where most of the messages appear like "[mwe-upwiz-subhead-message][mwe-upwiz-subhead-translate]" on the wikis using sqlite, which is probably unrelated but an annoying problem. This affects my wikis that use ceph and swift.

I also have a wiki using the local FS and MySQL, which seems to give me much less trouble. I'll try MySQL + Swift and see what that does next.

aaron added a comment.Via ConduitOct 19 2012, 7:59 PM

Even when it succeeds with swift, I still see dangling HTTP requests that last seemingly forever in firebug. https://gerrit.wikimedia.org/r/#/c/28286/ might help, but I'm not sure where the hanging is internally.

aaron added a comment.Via ConduitOct 23 2012, 5:19 PM

Fix for ceph made in https://gerrit.wikimedia.org/r/#/c/29421/. It works on the level of swift now.

Prolineserver added a comment.Via ConduitNov 10 2012, 1:25 PM

Any updates here? I still get the "internal_api_error_UploadStashFileNotFoundException" :(

Aklapper added a comment.Via ConduitNov 19 2012, 1:21 PM

Can users somehow debug this or provide more info, if it's reproducible for them?

Yet another report in the feedback forum: https://commons.wikimedia.org/w/index.php?title=Commons:Upload_Wizard_feedback&oldid=83274547#Upload_error

Nemo_bis added a comment.Via ConduitNov 19 2012, 1:25 PM

(In reply to comment #32)

> Can users somehow debug this or provide more info, if it's reproducible for them?

I doubt it: it's hard to debug even for Aaron! ;-)
What users can see is that, whatever the upload method is, big chunked uploads are likely to fail.

aaron added a comment.Via ConduitNov 19 2012, 7:47 PM

More improvements in https://gerrit.wikimedia.org/r/#/c/33978/ (which will still be slow).

This also got worse last week with multiwriting of temp files to nas1 in addition to swift...

More radical changes proposed in https://gerrit.wikimedia.org/r/#/c/34062/ to eliminate slow HTTP requests entirely.

Some debug timing from test2wiki:
2012-11-18 09:30:20 srv293 test2wiki: Finished concat of 242 chunks in 33.490297794342 sec.
2012-11-18 09:31:07 srv293 test2wiki: Finished stash of 242 chunked file in 47.260545969009 sec.

2012-11-18 10:39:27 srv295 test2wiki: Finished concat of 403 chunks in 50.59726691246 sec.
2012-11-18 10:40:42 srv295 test2wiki: Finished stash of 403 chunked file in 75.486596107483 sec.

...this is not counting the massive and slow extra GET request eliminated in https://gerrit.wikimedia.org/r/#/c/33978/.

Aklapper added a comment.Via ConduitDec 17 2012, 4:05 PM

(In reply to comment #34)

> More radical changes proposed in https://gerrit.wikimedia.org/r/#/c/34062/ to eliminate slow HTTP requests entirely.

That code change was merged 10 days ago and deployed on December 12th.
I'm curious whether the situation has improved. Comments on this report are welcome from people who were previously affected!
Plus we probably have to watch https://commons.wikimedia.org/wiki/Commons:Upload_Wizard_feedback and see...

bzimport added a comment.Via ConduitDec 17 2012, 5:02 PM

jgerber wrote:

Note that the change in UploadWizard was only merged December 15 and is not deployed so far, so just testing UW right now will give the same result.

https://gerrit.wikimedia.org/r/#/c/34537/

In addition there is another change in gerrit that is not merged yet preventing timeouts in the upload-from-stash step:

core: https://gerrit.wikimedia.org/r/#/c/36697/
UW: https://gerrit.wikimedia.org/r/#/c/36768/

tomasz added a comment.Via ConduitJan 5 2013, 11:34 PM

Note that this bug is still in place; I am getting the very same error message as Nemo_bis for files above 150 MiB. I just tried uploading a 170 MiB video file without any success.

Can we please get this fixed? It's so annoying that even though there is the raised 500 MiB limit for files uploaded with the UploadWizard, one cannot actually make any use of it due to this bug.

Nemo_bis added a comment.Via ConduitJan 8 2013, 5:47 PM

After I8cfcb09d, some of us have been able to upload a big file for the first time: thanks!
In particular we uploaded two videos of 170 and 200 MB via UploadWizard:
https://commons.wikimedia.org/wiki/File:2013-01-05_President_Obama's_Weekly_Address.ogv
https://commons.wikimedia.org/wiki/File:Communication_issues_musings_of_a_dinosaur.ogv

Apart from bug 36599, the only problems I had were that it didn't manage to produce the thumbnail (neither in the first step nor later) and that the final publishing phase took way longer than usual.
I still have to try with bigger files.

Nemo_bis added a comment.Via ConduitJan 8 2013, 10:01 PM

I also managed to upload a 500 MB video: https://commons.wikimedia.org/wiki/File:Meet_John_Doe.ogv

I encountered bug 43746, then got "An unknown error occurred" as said there, then upon "retry failed uploads" an api-error-internal_api_error_UploadStashFileNotFoundException error message, but the file was actually uploaded.
(I discovered only now that I tried to reupload it and got "A file with this name exists already" in "Describe" step.)

aaron added a comment.Via ConduitJan 10 2013, 1:26 AM

(In reply to comment #36)

> note that the change in UploadWizard was only merged December 15 and is not deployed so far. so just testing UW right now will give the same result.
>
> https://gerrit.wikimedia.org/r/#/c/34537/
>
> In addition there is another change in gerrit that is not merged yet preventing timeouts in the upload-from-stash step:
>
> core: https://gerrit.wikimedia.org/r/#/c/36697/
> UW: https://gerrit.wikimedia.org/r/#/c/36768/

Merged and deployed now.

Eloquence added a comment.Via ConduitJan 10 2013, 2:18 AM

I just tried a 470MB test file and it failed with "Unknown error:unknown" on first step. I suspect it ended up hitting line 217 in mw.FormDataTransport.js:

// If concatenation takes longer than 3 minutes, give up
if ( ( ( new Date() ).getTime() - _this.firstPoll ) > 3 * 60 * 1000 ) {
    _this.transportedCb({
        code: 'server-error',
        info: 'unknown server error'
    });
}

Is three minutes sufficient time? Are there other things we should do to speed up the concatenation step?

aaron added a comment.Via ConduitJan 10 2013, 3:20 AM

(In reply to comment #41)

> Is three minutes sufficient time?

Not really, it should be increased, say to 5 (and more as needed, though at some point it will get kind of unreasonable without more UI feedback).

> Are there other things we should do to speed up the concatenation step?

Increasing the chunk size would help somewhat. Perhaps pipelining the chunks would help (though the db layout does not support that; they must come in order). Disabling multiwrite would speed up chunk storage and final file stashing by 1.5X or so.

Nemo_bis added a comment.Via ConduitJan 15 2013, 11:20 PM

To both me (Chromium) and odder (?), upload of https://archive.org/download/Plan_9_from_Outer_Space_1959/Plan_9_from_Outer_Space_1959.ogv (372 MiB) is failing in the "Upload" step with «Unknown error: "unknown".»

Eloquence added a comment.Via ConduitJan 16 2013, 5:18 AM

Gah, same here. :-( It's now aborting immediately at the first API request that returns the "queued" result. (Chrome 22.)

bzimport added a comment.Via ConduitJan 16 2013, 5:24 AM

jgerber wrote:

If you run it with the console open, what's the last response from the server in the network tab?

Eloquence added a comment.Via ConduitJan 16 2013, 5:33 AM

Second try, it fails as before with the last API requests all returning result:Poll,stage:queued (including the final response), until Upload Wizard reports the "unknown" error, presumably due to triggering the aforementioned timeout. This is with a 125MB file.

Not sure it really behaved differently before, will do some more testing.

Eloquence added a comment.Via ConduitJan 16 2013, 6:17 AM

Whatever is going wrong in the assembly stage, it doesn't look like the slowdown is linear. With a 22MB file the assembly succeeds almost instantaneously after the first API poll. With a 30MB file, it's two poll requests. With a 125MB file, I have more than 50 polls before it finally times out.

Fastily added a comment.Via ConduitFeb 3 2013, 11:16 PM

Any updates?

Eloquence added a comment.Via ConduitFeb 6 2013, 1:43 AM

The last uploads I tried all succeeded. Could others following this please try again and see if you can successfully upload >100MB files through Upload Wizard with the chunked upload preference enabled?

Nemo_bis added a comment.Via ConduitFeb 6 2013, 5:12 PM

Your file was 120 MB; I've tried a 370 MiB video and it failed again (I'm now retrying). Too bad, because it also seemed fast enough, averaging around 300-400 KiB/s and oscillating in the 100-800 KiB/s interval.

Eloquence added a comment.Via ConduitFeb 6 2013, 6:19 PM

Trying with a 344M file I get the good old Unknown error: "internal_api_error_UploadStashFileNotFoundException" again. Note that it doesn't appear to be doing the asynchronous polling any more -- the final chunk is uploaded and fails with an error 504 - gateway timeout response.

It looks like increasing chunk size to 5MB may have helped somewhat but not sufficiently for very large files.

Eloquence added a comment.Via ConduitFeb 6 2013, 8:36 PM

We (Jan/RobLa/Aaron/myself) connected about this earlier today. It looks like part of the problem is preserving the request context (user/IP) in a sane manner when shelling out for asynchronous assembly of the chunks / uploading the file from stash. Jan wants to take a first crack at resolving this w/ Aaron's help. In addition the server-side thumbnail generation for Ogg files currently doesn't scale for large files and needs to be re-implemented using range requests. (Jump in if I got any of that wrong.)

Hopefully we can make some further progress on this in the next couple of weeks.

bzimport added a comment.Via ConduitFeb 23 2013, 10:15 PM

M8R-udfkkf wrote:

I'm getting this error roughly once every several thousand files (~10-20MB) that are being chunk-uploaded via the commons API in 2MB-3MB chunks:

{"servedby":"mw1138","error":{"code":"internal_api_error_UploadChunkFileException","info":"Exception Caught: error storing file in '\/tmp\/php2BDowP': backend-fail-internal; local-swift","*":""}}

Looks like something isn't being allocated/locked properly, possibly a rare race condition. It's annoying.

Eloquence added a comment.Via ConduitFeb 26 2013, 5:41 PM

Am I right that this is mainly waiting for this changeset to be merged, or are there other dependencies at this point?

https://gerrit.wikimedia.org/r/#/c/48940/

Nischayn22 added a comment.Via ConduitMar 2 2013, 8:14 PM

Changeset merged, is this fixed now?

Nemo_bis added a comment.Via ConduitMar 3 2013, 10:53 PM

(In reply to comment #39)

> then upon "retry failed uploads" a api-error-internal_api_error_UploadStashFileNotFoundException error message, but the file was actually uploaded.

This happened again with http://commons.wikimedia.org/wiki/File:Scrooge_1935.ogv uploaded by Beria (300 MB in 12 min).

Eloquence added a comment.Via ConduitMar 12 2013, 1:41 AM

I'm having mixed success with the latest code. A 459M file seemed to work fine (I didn't go past stage 1). A 491M file I just tried resulted in the following API request sequence:

5MB chunk->ok
5MB chunk->ok
5MB chunk->ok
...
lots of chunks later
...
~500K (final) chunk->Error 504
Retry of ~500K final chunk->API error.

The final API error was:

{"servedby":"mw1194","error":{"code":"stashfailed","info":"Invalid chunk offset"}}

Surfaced to the user as "Internal error: Server failed to store temporary file".

aaron added a comment.Via ConduitMar 15 2013, 10:01 PM

(In reply to comment #57)

> I'm having mixed success with the latest code. A 459M file seemed to work fine (I didn't go past stage 1). A 491M file I just tried resulted in the following API request sequence:
>
> 5MB chunk->ok
> 5MB chunk->ok
> 5MB chunk->ok
> ...
> lots of chunks later
> ...
> ~500K (final) chunk->Error 504
> Retry of ~500K final chunk->API error.
>
> The final API error was:
>
> {"servedby":"mw1194","error":{"code":"stashfailed","info":"Invalid chunk offset"}}
>
> Surfaced to the user as "Internal error: Server failed to store temporary file".

No async upload was enabled at that time (it is behind a feature flag). Since all wikis were on wmf11, I deployed the new redis queue aggregator on Thursday, which worked fine. Async uploads were enabled again then. The existing high priority loop made via puppet config changes was already done and appears to work as desired. The new code to fix the IP logging issue was broken by CentralAuth, which caused the upload job to fail. This was fixed in https://gerrit.wikimedia.org/r/#/c/54084/. It can be tested at test2wiki (jobs on testwiki are broken due to srv193 being in pmtpa, so don't use that).

aaron added a comment.Via ConduitMar 25 2013, 6:27 PM

(In reply to comment #58)

> No async upload was enabled at that time (it is behind a feature flag).

Obviously meant "disabled".

Tbayer added a comment.Via ConduitJun 7 2013, 3:50 PM

I don't know whether it was caused by the exact same API error as in this bug, but I just got the above UploadWizard error message when trying to upload a 231MB file (twice, on Chromium and Firefox):

"Internal error: Server failed to store temporary file."

Clicking "Retry failed uploads" in Chromium resulted in "Unknown error: 'unknown'", but on Firefox it succeeded in completing the upload.

Fastily added a comment.Via ConduitJun 8 2013, 10:00 PM

(In reply to comment #60)

> I don't know whether it was caused by the exact same API error as in this bug, but I just got the above UploadWizard error message when trying to upload a 231MB file (twice, on Chromium and Firefox):
>
> "Internal error: Server failed to store temporary file."
>
> Clicking "Retry failed uploads" in Chromium resulted in "Unknown error: 'unknown'", but on Firefox it succeeded in completing the upload.

Confirmed, this is totally broken again. Why is this broken again?

Aklapper added a comment.Via ConduitJun 9 2013, 12:25 AM

Fastily: If you can confirm it, providing some basic info would be very welcome (file size, browser, etc.). Thanks!

Fastily added a comment.Via ConduitJun 17 2013, 10:46 PM

Certainly. Every few big uploads, I get a generic HTTP 500 error. Also, I'm not sure if it's related, but I also get the occasional error in which the server claims it can't reassemble the chunks. Neither of these errors really occurs when I'm editing on a corporate network with 60+ mbps upload speeds, but when I'm at home, I average 5mbps upload. That said, I suspect something is timing out server-side.

I used a variety of test files, ranging from 152-450 Mb, using my Java library to upload the files via the MediaWiki API.

Bawolff added a comment.Via ConduitJul 8 2013, 5:18 AM

When I tested on test2.wikipedia.org - I was able to upload a small file fine. However a large (200 mb range, don't remember the exact size) file split into about 400 chunks ended up with me just getting result: poll; stage: queued forever and ever (Well actually I gave up after about 2 and a half hours of waiting).

> I get a generic HTTP 500 error

Just for reference, wikimedia's 500 errors usually contain debugging information near the bottom (unless they've changed).

Fastily added a comment.Via ConduitJul 12 2013, 10:33 PM

Is anything being done to resolve this issue at the moment?

I'd suggest debugging with a 350mb file but throttling upload speed to ~0.5mbps. Each time I did this it failed without exception.

Bawolff added a comment.Via ConduitJul 14 2013, 4:07 AM

> I used a variety of test files, ranging from 152-450 Mb, using my Java library to upload the files via the MediaWiki API.

Is your java library using the async option when uploading these files?

What stage does the 500 error usually occur at? (While uploading a chunk, some point during the "assembling" stage, or some point during the "publish" stage? Or does it vary.)

Fastily added a comment.Via ConduitJul 22 2013, 8:37 PM

(In reply to comment #66)

> > I used a variety of test files, ranging from 152-450 Mb, using my Java library to upload the files via the MediaWiki API.
>
> Is your java library using the async option when uploading these files?
>
> What stage does the 500 error usually occur at? (While uploading a chunk, Some point during the "assembling" stage or some point during the "publish" stage? Or does it vary).

I believe we are using the async option when uploading.

The 500 error typically occurs at the publishing stage. I've had similar, but infrequent 500 errors at the assembling stage as well, but I'm not sure how related this is.
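For context, the request sequence a client like this makes is roughly the following parameter sets, sketched from the action=upload API behaviour discussed in this thread (token and the chunk payload itself are omitted; verify the exact parameter names against the current API documentation before relying on them):

```javascript
// Parameters for one chunk request in a chunked upload. The first chunk
// has no filekey; the server's response supplies one that every later
// chunk must echo back along with its byte offset.
function chunkRequestParams(filename, filesize, offset, filekey) {
    var params = {
        action: 'upload',
        stash: 1,            // keep the pieces in the upload stash
        filename: filename,
        filesize: filesize,
        offset: offset       // byte position of this chunk in the file
    };
    if (offset > 0) {
        params.filekey = filekey; // returned by the previous chunk request
    }
    return params;
}

// Final publish step; async: 1 is the option Bawolff asks about. It queues
// the stash-to-wiki move as a job instead of doing it inside the request,
// which is exactly the step where these 500s/504s were occurring.
function publishParams(filename, filekey) {
    return { action: 'upload', filename: filename, filekey: filekey, async: 1 };
}
```

With async publishing, the client is expected to poll for completion rather than wait on the publish request, which is why timeouts at this stage point at the job queue path rather than the chunk transfer itself.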

Kelson added a comment.Via ConduitNov 11 2013, 3:58 PM

Now that we have increased the UploadWizard limit to 1GB, the frequency of this error will probably increase. The last report I have read is about two consecutive uploads of an 800 MB video (with FF and Chrome), which both failed with a "stasherror": https://bugzilla.wikimedia.org/show_bug.cgi?id=52593#c9

Fastily added a comment.Via ConduitNov 12 2013, 2:22 AM

I do hope this is fixed soon. I commented about it here: https://bugzilla.wikimedia.org/show_bug.cgi?id=52593#c13

Fastily added a comment.Via ConduitNov 19 2013, 2:05 AM

New update -- It looks like big files which 'failed to upload' are visible at [[Special:UploadStash]]. I'm unable to download & verify the contents of those files however, because the system "Cannot serve a file larger than 1048576 bytes." Given this, it's hard to say what kind of issue this is (e.g. maybe the uploaded file is corrupt, i.e. file was not assembled properly server-side?)

Bawolff added a comment.Via ConduitNov 19 2013, 2:50 AM

(In reply to comment #70)

> New update -- It looks like big files which 'failed to upload' are visible at [[Special:UploadStash]]. I'm unable to download & verify the contents of those files however, because the system "Cannot serve a file larger than 1048576 bytes."
>
> Given this, it's hard to say what kind of issue this is (e.g. maybe the uploaded file is corrupt, i.e. file was not assembled properly server-side?)

Yes, we currently don't let people download things that are in the upload "stash" if they are bigger than 1 mb. If it is of interest, the reason given in the code for this is:

// Since we are directly writing the file to STDOUT,
// we should not be reading in really big files and serving them out.
//
// We also don't want people using this as a file drop, even if they
// share credentials.
//
// This service is really for thumbnails and other such previews while
// uploading.

You should be able to verify if the upload worked by requesting a thumbnail that would be smaller than 1 mb. If it was a jpeg file, with a stash name of 11oedl0sn7e4.aggjsr.1.jpg, then a url of Special:UploadStash/thumb/11oedl0sn7e4.aggjsr.1.jpg/120px-11oedl0sn7e4.aggjsr.1.jpg should work. If it's a video file named 11oedl0sn7e4.aggjsr.1.webm, then Special:UploadStash/thumb/11oedl0sn7e4.aggjsr.1.webm/100px--11oedl0sn7e4.aggjsr.1.webm.jpg would get you a thumbnail if the file is not corrupt (I think, haven't tested that for a video)

> Given this, it's hard to say what kind of issue this is (e.g. maybe the uploaded file is corrupt, i.e. file was not assembled properly server-side?)

I wonder if some sort of timeout/race condition happened with the screwy way we store data in the session, and maybe the file is uploaded fine, but the publish step (i.e. The step moving file from stash to actually on-wiki) never really happened due to timeout. If that was the case, it may be possible to do a further API request after the fact to finish the upload.
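Bawolff's thumbnail trick above can be sketched as a small URL builder. The two naming patterns are taken directly from his examples (images: `120px-<name>`; video: an extra dash and a `.jpg` suffix); treat the exact scheme as an assumption rather than a documented contract:

```javascript
// Build the Special:UploadStash thumbnail URL path used to verify a
// stashed file under the 1 MB download cap.
function stashThumbUrl(stashName, width) {
    var isVideo = /\.(webm|ogv|ogg)$/i.test(stashName);
    var thumbName = isVideo
        ? width + 'px--' + stashName + '.jpg'  // video: double dash, jpg frame
        : width + 'px-' + stashName;           // image: plain scaled copy
    return 'Special:UploadStash/thumb/' + stashName + '/' + thumbName;
}
```

If the thumbnail renders, the assembled file in the stash is at least readable and decodable, which is the verification the 1 MB cap otherwise prevents.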

Bawolff added a comment.Via ConduitNov 19 2013, 5:21 AM

> Given this, it's hard to say what kind of issue this is (e.g. maybe the uploaded file is corrupt, i.e. file was not assembled properly server-side?)

> I wonder if some sort of timeout/race condition happened with the screwy way we store data in the session, and maybe the file is uploaded fine, but the publish step (i.e. The step moving file from stash to actually on-wiki) never really happened due to timeout. If that was the case, it may be possible to do a further API request after the fact to finish the upload.

Meh, looks like the individual chunks get listed too, so it's hard to tell what that means.

Also, looks like the thumbnailing infrastructure around stashed uploads is totally broken on wmf wikis. Presumably it was forgotten about in the swift migration(?) Not that surprising, since I'm not sure if anyone


Because Special:Upload is kind of useless... I made some (very hacky) js that will add some additional links. It adds a (broken) link to a thumbnail. It adds a link to metadata, and it adds a publish link, to take a file out of the stash and on to the wiki.

In particular, the metadata link includes the file size in bytes, which you can use to verify that all the parts of the file made it. If you want to be more paranoid, it also returns an SHA1 sum of the file, so you can be sure it's really the right file on the server.

If that matches up, try the publish link and see what happens...

Anyhow, to sum up, add
importScript( 'User:Bawolff/stash.js' );
to [[commons:Special:MyPage/common.js]], and you should have the extra link on [[commons:Special:UploadStash]] which you can use to verify what file is in the stash.

Rillke added a comment.Via ConduitNov 20 2013, 1:13 AM

You should be able to verify whether the upload worked by requesting a thumbnail that would be smaller than 1 MB. If it was a JPEG file, with a stash name of
[...]

Or you can simply try

https://commons.wikimedia.org/wiki/Special:UploadStash?withJS=MediaWiki:EnhancedStash.js

Rillke added a comment.Via ConduitNov 20 2013, 1:28 AM

(In reply to comment #72)

importScript( 'User:Bawolff/stash.js' );

Ha! Didn't notice that. I always wanted to write something like that, and now we have two of them.

(In reply to comment #71)

I think, haven't tested that for a video

Video works *but* the generated "thumbnail" (for me https://commons.wikimedia.org/wiki/Special:UploadStash/thumb/11vmdxqgjy9o.2239xy.1173692.webm/120px--11vmdxqgjy9o.2239xy.1173692.webm.jpg) is in full video size (here 1920x1080px).

Bawolff added a comment.Via ConduitNov 20 2013, 2:31 AM

(In reply to comment #74)

(In reply to comment #72)
> importScript( 'User:Bawolff/stash.js' );

Ha! Didn't notice that. Ever wanted to write something like that and now we have 2 of them.

Cool. Yours is about a billion times better than my hack.

(In reply to comment #71)
> I think, haven't tested that for a video

Video works *but* the generated "thumbnail" (for me https://commons.wikimedia.org/wiki/Special:UploadStash/thumb/11vmdxqgjy9o.2239xy.1173692.webm/120px--11vmdxqgjy9o.2239xy.1173692.webm.jpg) is in full video size (here 1920x1080px).

Interesting. When I tried, I was getting Squid 503 errors all over the place (both for videos and normal images).

Gilles added a comment.Via ConduitJan 10 2014, 2:16 PM

(In reply to comment #67)

(In reply to comment #66)
> > I used a variety of test files, ranging from 152-450 MB, using my Java library to upload the files via the MediaWiki API.
>
> Is your Java library using the async option when uploading these files?
>
> What stage does the 500 error usually occur at? (While uploading a chunk, some point during the "assembling" stage, or some point during the "publish" stage? Or does it vary.)

I believe we are using the async option when uploading.

The 500 error typically occurs at the publishing stage. I've had similar, but infrequent, 500 errors at the assembling stage as well, but I'm not sure how related this is.

I'd like to clarify this a bit. Your main issue is a 500 that happens at the publishing stage, is that correct? I think that this ticket has actually talked about several different bugs over time, which makes things more confusing than they need to be. I'd like to treat the assembly stage errors separately, I'm more interested in the one that's causing you issues the most frequently.

Are there any more specific errors in the header or body of the 500 response?
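For reference, the three stages being distinguished here (chunk upload, assembly, publish) map onto the parameters of the MediaWiki action=upload API. The sketch below shows the client-side chunking step concretely; the helper name is illustrative, and the commented flow is a summary of the protocol rather than working upload code:

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Split an upload into (offset, chunk) pairs, mirroring the
    offset/chunk parameters of MediaWiki's action=upload API."""
    return [(off, data[off:off + chunk_size])
            for off in range(0, len(data), chunk_size)]

# Each pair is POSTed as action=upload with filename, filesize, offset,
# a CSRF token, and the chunk bytes; the first response returns a
# filekey that accompanies every subsequent chunk. After the final
# chunk the server assembles the pieces in the stash, and a separate
# action=upload request with filekey + filename (the "publish" stage)
# moves the file out of the stash -- the stage where the 500s discussed
# in this ticket are most often reported.
```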

Bawolff added a comment.Via ConduitJan 10 2014, 7:03 PM

As an aside, splitting comment 64 to bug 59917

greg added a comment.Via ConduitJun 24 2014, 5:23 PM

Gilles: I'm resetting assignee for now. Should the priority be lowered as well (there hasn't been any movement/communication (either direction) since January)?

Fastily, do you have a reply for Gilles's question below?

Gilles: you asked (In reply to Gilles Dubuc from comment #76)

> I'd like to clarify this a bit. Your main issue is a 500 that happens at the publishing stage, is that correct? I think that this ticket has actually talked about several different bugs over time, which makes things more confusing than they need to be. I'd like to treat the assembly stage errors separately; I'm more interested in the one that's causing you issues the most frequently.
>
> Are there any more specific errors in the header or body of the 500 response?

Pristurus added a comment.Via ConduitJul 10 2014, 7:50 AM

Using bigChunkedUpload.js to upload a new version (345,142 KB) of https://commons.wikimedia.org/wiki/File:Clusiodes_-_2014-07-06kl.AVI.webm I got the message: FAILED: {"servedby":"mw1190","error":{"code":"stasherror","info":"UploadStashFileNotFoundException: key '12fc8few9krk.lhij8x.957461.webm' not found in stash"}}. This error occurred after uploading 82 of 85 chunks. I have to use a 384 kbit/s connection, so bigger uploads need several hours.

Aklapper added a comment.Via ConduitJul 10 2014, 12:17 PM

Whatever "bigChunkedUpload.js" is, this bug report is about UploadWizard instead...

Rillke added a comment.Via ConduitJul 10 2014, 12:47 PM

(In reply to Andre Klapper from comment #80)

Whatever "bigChunkedUpload.js" is, this bug report is about UploadWizard
instead...

This bug is about an issue with chunked uploading and thus belongs to either Wikimedia or MediaWiki file management.

bigChunkedUpload.js is a standard-compliant script written by me, and the error message is what it got back from the API.

Rillke added a comment.Via ConduitJul 10 2014, 1:28 PM

(In reply to Sisa from comment #79)
Sisa, do you remember

  1. how long it took uploading the 82 chunks
  2. when you were attempting to upload (date+time+timezone or just in UTC)

Did you retry?

Pristurus added a comment.Via ConduitJul 10 2014, 2:03 PM

(In reply to Rainer Rillke @commons.wikimedia from comment #82)

(In reply to Sisa from comment #79)
Sisa, do you remember

  1. how long it took uploading the 82 chunks
  2. when you were attempting to upload (date+time+timezone or just in UTC)

    Did your re-try?

Sorry, I cannot answer your questions exactly. I started the upload yesterday at about 17:00 here in Germany (UTC+2) and went to bed at about 1:30 this morning. As far as I remember, about 70 chunks had been uploaded (without any error) by that time. I will try it again next night...

Pristurus added a comment.Via ConduitJul 11 2014, 11:08 AM

(In reply to Rainer Rillke @commons.wikimedia from comment #82)

The retry also ended unsuccessfully. I started it at 0:43 (UTC+2) and all chunks were uploaded (87 of 87, chunk size: 4096 KiB, duration: 36558 s). However, the server-side rebuilding of the new file is hanging ("44552: finalize/87> Still waiting for server to rebuild uploaded file" and so on...)
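The "still waiting for server to rebuild" loop described here is typically a client polling the API (the async upload flow exposes a checkstatus flag for this). A sketch of the capped exponential backoff schedule such a poller might use — the function name and constants are illustrative, not taken from bigChunkedUpload.js:

```python
import itertools


def poll_delays(base=2.0, cap=60.0):
    """Yield capped exponential backoff delays (in seconds) for polling
    an async assembly (e.g. action=upload&checkstatus=1&filekey=...)
    until the server reports success or an error."""
    for attempt in itertools.count():
        yield min(cap, base * (2 ** attempt))
```

Capping the delay matters for multi-hour assemblies like this one: without a cap the poll interval would quickly grow past the point where the client notices completion promptly.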

Rillke added a comment.Via ConduitJul 11 2014, 11:25 AM

(In reply to Sisa from comment #84)
If this particular file matters to you, you can try publishing it from your upload stash, if it's still there: https://commons.wikimedia.org/w/index.php?title=Special:UploadStash&withJS=MediaWiki:EnhancedStash.js

I notice that files are removed from the stash quite frequently now ... could this cause any harm?

Pristurus added a comment.Via ConduitJul 11 2014, 12:50 PM

How can I do this? Opening the file in a new tab of my browser (SeaMonkey 2.26.1) gives the message "Internal Server Error Cannot serve a file larger than 1048576 bytes."

Rillke added a comment.Via ConduitJul 11 2014, 12:53 PM

(In reply to Sisa from comment #86)
Is there a "publish" button? Try that. If it doesn't let you use the desired destination file name, we can move it later to where it should go.

Gilles added a project: Multimedia.Via WebNov 24 2014, 3:17 PM
Gilles moved this task to Current cycle on the Multimedia workboard.Via WebNov 24 2014, 3:55 PM
Gilles moved this task to Backlog on the Multimedia workboard.Via WebApr 6 2015, 9:23 AM
Fastily removed a subscriber: Fastily.Via WebMay 21 2015, 4:08 AM

Add Comment