Increase chunked upload size limit to support longer videos
Closed, Resolved (Public)

Description

Per request from users, especially those in the GLAM community.


Version: wmf-deployment
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=36587

bzimport set Reference to bz52593.
greg created this task. Via Legacy, Aug 7 2013, 2:40 AM
Kelson added a comment. Via Conduit, Nov 1 2013, 8:16 PM

Chapters organize conferences with presentations, discussions and workshops, which from time to time run longer than 90 minutes. In that case, it's almost impossible to upload video recordings of these events to Commons. For this reason, chapters often use commercial platforms (YouTube/Vimeo/Dailymotion/...) instead of Commons to share their videos. The most courageous ones manage to find someone with shell access on Commons, but this is really not user-friendly. What are the technical reasons for limiting chunked uploads to 500 MB (and not 1GB, for example)?

Eloquence added a comment. Via Conduit, Nov 1 2013, 10:30 PM

This is more of an ops question, so adding Mark, Faidon & Ken. Recap: Limit for chunked uploads (requires enabling an experimental feature in user preferences) is 500MB, without chunked uploads it's 100MB.

If we increased it from 500MB to 1GB while still keeping that feature obscure, it would probably have manageable impact. Mark/Faidon, can you give us a sense of whether this would be problematic given current storage capacity? Beyond total capacity, would an increase in the number of objects at 500MB-1GB size be a problem?

faidon added a comment. Via Conduit, Nov 4 2013, 12:09 PM

TL;DR: increasing the limit to at least 1GiB is fine from an ops perspective.

We're currently at 63.6T out of 96T (* 3 replicas * 2 for pmtpa/eqiad = 576T raw). Individual disks are as much as 70% full. About 5.5T of these are temp data that haven't been pruned because of #56401 and friends, so we'll regain some capacity from there. The thumb RFC can potentially shave off as much as 15.5T of thumbs (perhaps some number in between, depending on the solution we'll end up choosing).
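The capacity arithmetic in this comment can be checked with a short sketch. The figures are the ones quoted above; the variable and function names are illustrative only, and the "best case" assumes the full 15.5T of thumbs is reclaimed, which the comment itself presents only as an upper bound.

```python
# Figures quoted in the comment, in terabytes (T).
USABLE_T = 96.0        # usable Swift capacity
USED_T = 63.6          # currently used
REPLICAS_PER_DC = 3    # replicas per datacenter
DATACENTERS = 2        # pmtpa and eqiad
TEMP_DATA_T = 5.5      # unpruned temp data
THUMBS_MAX_T = 15.5    # upper bound of what the thumb RFC might free


def raw_capacity(usable_t, replicas, datacenters):
    """Raw disk space needed to hold usable_t with full replication."""
    return usable_t * replicas * datacenters


def utilization(used_t, usable_t):
    """Fraction of usable capacity currently in use."""
    return used_t / usable_t


raw_t = raw_capacity(USABLE_T, REPLICAS_PER_DC, DATACENTERS)
now = utilization(USED_T, USABLE_T)
after_prune = utilization(USED_T - TEMP_DATA_T, USABLE_T)
# Assumption: the whole 15.5T of thumbs is reclaimed (upper bound only).
best_case = utilization(USED_T - TEMP_DATA_T - THUMBS_MAX_T, USABLE_T)

print(f"raw capacity:           {raw_t:.0f}T")
print(f"utilization now:        {now:.1%}")
print(f"after pruning temp:     {after_prune:.1%}")
print(f"best case after thumbs: {best_case:.1%}")
```

Under these numbers the pool sits below the 75-80% comfort ceiling mentioned in the thread, with some headroom to be regained from pruning.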

Even at the current trend, estimates place us at 75-80% (max of our comfort zone, to be able to provide redundancy + lead time to procure hardware) by April/May:
http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&c=Swift+pmtpa&h=Swift+pmtpa+prod&jr=&js=&v=63597099281113&m=swift_bytes_count&vl=total+bytes&trend=1

There are some ideas of increasing the capacity much earlier than that by moving pmtpa hardware to eqiad at the end of the year but nothing's decided yet. I can say with certainty that we're not going to keep 6 replicas of everything with the new datacenter but use Swift 1.9's georeplication features to lower this to, likely, 4.

Varnish's caches are much smaller, obviously, but they're LRU, so unless we have tons of very popular large files, it shouldn't affect them much.

Large files aren't a big deal with Swift or its underlying filesystem (XFS) -- at least up to (the default of) 5G; after that, we'd need to explore segmented files in Swift itself ("large object support"). Large files are actually *much* more efficient to handle than really small files (filesystem overheads etc.).

Now, a large number of large files could have the potential of throwing us off planning, especially if you account for a multiplication factor because of transcoding and us keeping in Swift multiple versions of the same video file in different formats & resolutions.

However, I don't think it's even remotely plausible this would happen. All of our transcoded files account for a mere 1.1T. Additionally, the 21,251,977 objects in Commons (originals, does *not* include thumbs/transcoded) are distributed in size as follows:

0 bytes - 4.0KiB = 368841
4.0KiB - 8.0KiB = 275486
8.0KiB - 16.0KiB = 596394
16.0KiB - 32.0KiB = 972185
32.0KiB - 64.0KiB = 1528037
64.0KiB - 128.0KiB = 2466817
128.0KiB - 256.0KiB = 2294701
256.0KiB - 512.0KiB = 2247147
512.0KiB - 1.0MiB = 2453605
1.0MiB - 2.0MiB = 2746332
2.0MiB - 4.0MiB = 2931704
4.0MiB - 8.0MiB = 1832701
8.0MiB - 16.0MiB = 410738
16.0MiB - 32.0MiB = 88009
32.0MiB - 64.0MiB = 24599
64.0MiB - 128.0MiB = 13504
128.0MiB - 256.0MiB = 933
256.0MiB - 512.0MiB = 192
512.0MiB - 1.0GiB = 52

Files over 64MiB are a mere 0.07% of the total file count and account for under 2T in size in total. Files over 128MiB are less than one tenth of files between 64MiB-128MiB. I think it's safe to assume that files in the 512MiB-1.0GiB range will stay well below a 1TiB limit in the mid-term, which is more than fine given our current media storage pool.
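As a sanity check, the shares can be recomputed from the bucket counts listed above. This is a minimal sketch: the counts are copied from the list, while the names (`BUCKETS`, `count_at_least`) are made up for illustration. The size bound at the end assumes every file in the top bucket is a full 1 GiB, which overstates the real total.

```python
# (lower bound of the bucket in KiB, object count), copied from the list above.
BUCKETS = [
    (0, 368841), (4, 275486), (8, 596394), (16, 972185),
    (32, 1528037), (64, 2466817), (128, 2294701), (256, 2247147),
    (512, 2453605), (1024, 2746332), (2048, 2931704), (4096, 1832701),
    (8192, 410738), (16384, 88009), (32768, 24599), (65536, 13504),
    (131072, 933), (262144, 192), (524288, 52),
]

KIB_PER_MIB = 1024


def count_at_least(min_mib):
    """Number of objects in buckets starting at or above min_mib MiB."""
    return sum(n for lower_kib, n in BUCKETS if lower_kib >= min_mib * KIB_PER_MIB)


total = sum(n for _, n in BUCKETS)
over_64 = count_at_least(64)
over_128 = count_at_least(128)
between_64_128 = over_64 - over_128
share_over_64_pct = 100.0 * over_64 / total

# Worst case for the 512MiB-1GiB bucket: every file is a full 1 GiB.
top_bucket_max_gib = count_at_least(512) * 1

print(f"total objects:           {total}")
print(f"objects over 64MiB:      {over_64} ({share_over_64_pct:.3f}%)")
print(f"objects over 128MiB:     {over_128}")
print(f"objects 64MiB-128MiB:    {between_64_128}")
print(f"512MiB-1GiB bucket, max: {top_bucket_max_gib} GiB")
```

The bucket counts sum to the stated total of originals, and the top bucket currently holds at most 52 GiB, far below 1 TiB.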

Finally, a factor that should be considered is the resources needed from the videoscaler (TMH) infrastructure. Jan Gerber is the expert here, but I don't think going to 1GiB is going to make any big difference. Maybe silly things such as cgroup limits would need to be adjusted but it's not a pressing matter anyway as the process is asynchronous and we can course-correct as we go forward.

gerritbot added a comment. Via Conduit, Nov 6 2013, 12:26 AM

Change 93900 had a related patch set uploaded by Eloquence:
Increase upload size limit for chunked and URL uploads to 1000MB.

https://gerrit.wikimedia.org/r/93900

Eloquence added a comment. Via Conduit, Nov 6 2013, 12:28 AM

Adding Jan per above in case there's anything to be done from the TMH perspective.

Kelson added a comment. Via Conduit, Nov 6 2013, 11:00 AM

This is a great move.
Thank you so much for this.

gerritbot added a comment. Via Conduit, Nov 7 2013, 6:52 PM

Change 93900 merged by jenkins-bot:
Increase upload size limit for chunked and URL uploads to 1000MB.

https://gerrit.wikimedia.org/r/93900

McZusatz added a comment. Via Conduit, Nov 9 2013, 11:02 AM

Change got merged. Thus marking as resolved.

Tbayer added a comment. Via Conduit, Nov 11 2013, 12:02 PM

This is great news. However, I just tried to upload an 814MB file ("WMF Monthly Metrics Meeting November 7, 2013.ogv") without success. After starting the upload, the following status messages appear:

'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error: "unknown"'

(using UploadWizard, chunked uploads enabled, tried in Chromium and Firefox. Just noting this here for the moment, might file a separate bug later)

Tbayer added a comment. Via Conduit, Nov 11 2013, 12:04 PM

(In reply to comment #9)
> This is great news. However, I just tried to upload an 814MB file ("WMF
> Monthly Metrics Meeting November 7, 2013.ogv") without success. After starting
> the upload, the following status messages appear:
>
> 'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error:
> "unknown"'
>
> (using UploadWizard, chunked uploads enabled, tried in Chromium and Firefox.
> Just noting this here for the moment, might file a separate bug later)

PS: Clicking 'Retry failed uploads' results in 'Unknown error: "stasherror"'.

Bawolff added a comment. Via Conduit, Nov 11 2013, 3:18 PM

(In reply to comment #9)
> This is great news. However, I just tried to upload an 814MB file ("WMF
> Monthly Metrics Meeting November 7, 2013.ogv") without success. After starting
> the upload, the following status messages appear:
>
> 'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error:
> "unknown"'
>
> (using UploadWizard, chunked uploads enabled, tried in Chromium and Firefox.
> Just noting this here for the moment, might file a separate bug later)

There are various reports about stashed upload being unreliable, and that unreliability increasing with number of chunks. See bug 3658

Bawolff added a comment. Via Conduit, Nov 11 2013, 3:48 PM

(In reply to comment #11)
> (In reply to comment #9)
> > This is great news. However, I just tried to upload an 814MB file ("WMF
> > Monthly Metrics Meeting November 7, 2013.ogv") without success. After
> > starting the upload, the following status messages appear:
> >
> > 'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error:
> > "unknown"'
> >
> > (using UploadWizard, chunked uploads enabled, tried in Chromium and Firefox.
> > Just noting this here for the moment, might file a separate bug later)
>
> There are various reports about stashed upload being unreliable, and that
> unreliability increasing with number of chunks. See bug 3658

I mean bug 36587

Fastily added a comment. Via Conduit, Nov 12 2013, 2:20 AM

Chunked uploads (using 4 MB chunks) are no better over the API. The server consistently returns a 500 error when trying to upload 120 MB files. I find that this tends to be closely correlated with upload speed. For example, uploading a 120 MB file at 50 Mbps (using a corporate network) fails completely about 70% (7/10) of the time, whereas uploading at 2 Mbps (using a typical home network) fails 100% (10/10) of the time.
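For a sense of scale, here is a rough sketch of how many chunks the uploads mentioned in this thread would involve. It assumes binary units (1 MB = 1 MiB) and that the same 4 MB chunk size applies to every upload, neither of which the thread confirms; `chunk_count` and `chunk_ranges` are hypothetical helpers, not part of any MediaWiki client.

```python
MIB = 1024 * 1024


def chunk_count(file_size, chunk_size):
    """Number of chunks needed to cover file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


def chunk_ranges(file_size, chunk_size):
    """Yield (offset, length) pairs covering the file; the last chunk may be short."""
    for offset in range(0, file_size, chunk_size):
        yield offset, min(chunk_size, file_size - offset)


CHUNK = 4 * MIB  # assumption: "4mb" means 4 MiB

for label, size_mib in [("120 MB test file", 120),
                        ("814 MB video", 814),
                        ("new 1000 MB limit", 1000)]:
    print(f"{label}: {chunk_count(size_mib * MIB, CHUNK)} chunks")

# The final chunk of the 814 MB file is only half a chunk long.
last_offset, last_length = list(chunk_ranges(814 * MIB, CHUNK))[-1]
print(f"last chunk of 814 MB file: offset={last_offset}, length={last_length}")
```

If failures do grow with the number of chunks, as suggested earlier in the thread, a file near the new limit would go through several times as many chunk requests as the 120 MB files that already fail.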

Rillke added a comment. Via Conduit, Nov 18 2013, 11:21 PM

(In reply to comment #9)
> 'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error:
> "unknown"'

Does the file end up in [[Special:UploadStash]] after some time?

Fastily added a comment. Via Conduit, Nov 18 2013, 11:27 PM

(In reply to comment #14)
> (In reply to comment #9)
> > 'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error:
> > "unknown"'
>
> Does the file end up in [[Special:UploadStash]] after some time?

I didn't know about that special page :o I'm going to check it out asap. Thanks for sharing!

Tbayer added a comment. Via Conduit, Nov 18 2013, 11:31 PM

(In reply to comment #14)
> (In reply to comment #9)
> > 'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error:
> > "unknown"'
>
> Does the file end up in [[Special:UploadStash]] after some time?

https://commons.wikimedia.org/wiki/Special:UploadStash currently tells me "You have no stashed files". I didn't check earlier (the error occurred on November 11).

Fastily added a comment. Via Conduit, Nov 19 2013, 2:03 AM

(In reply to comment #14)
> (In reply to comment #9)
> > 'Uploading...' --> 'Queued...' (for several minutes) --> 'Unknown error:
> > "unknown"'
>
> Does the file end up in [[Special:UploadStash]] after some time?

I did a few test uploads, and it looks like the failed uploads do end up in [[Special:UploadStash]], but I'm unable to download & verify the contents of those files because the system "Cannot serve a file larger than 1048576 bytes." Given this, it's hard to say what kind of issue this is (e.g. maybe the uploaded file is corrupt, i.e. file was not assembled properly server-side?)
