I also discussed this with @tstarling recently: if a single long-running job doesn't work, we could split it into multiple jobs using range requests, similar to how chunked uploads work.
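To make the range-request idea concrete, here is a minimal sketch of how a download of known size could be split into chunk-sized HTTP `Range` header values, one per job. The function name `byte_ranges` and the chunk size are hypothetical, chosen only for illustration; this is not taken from any MediaWiki patch, and it assumes the remote server reports the file size and honours `Range` requests.

```python
from typing import Iterator


def byte_ranges(total_size: int, chunk_size: int) -> Iterator[str]:
    """Yield HTTP Range header values that together cover total_size bytes.

    Hypothetical helper: each yielded value could be handed to one job,
    which would fetch only that slice of the remote file.
    """
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    start = 0
    while start < total_size:
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + chunk_size, total_size) - 1
        yield f"bytes={start}-{end}"
        start = end + 1


if __name__ == "__main__":
    # Example: a 10-byte file fetched in 4-byte chunks.
    for header_value in byte_ranges(10, 4):
        print(header_value)
```

Each job would send its value in a `Range` request header and append the response body to the partially assembled file, much like the chunks of a chunked upload.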
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T118887 Upload by URL doesn't work well for large files: HTTP request timed out.
Open | | None | T295007 Upload by URL should use the job queue, possibly chunked with range requests
Event Timeline
We intend to take a stab at this during next week's MediaWiki CodeJam.
If anything, it will be a chance for me to refresh my knowledge of MediaWiki internals and for others to mock my ability to interact with a web frontend :P
Change 982757 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):
[mediawiki/core@master] Add job for upload from UploadFromUrl.
Change 983196 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):
[mediawiki/core@master] Allow async upload by url via the Api
Change #1007344 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):
[mediawiki/core@master] Switch Special:Upload to use async upload-by-url
Hi, sorry for not replying earlier, but this work is kind of a side hustle for me - as you might have noticed, this stuff is far from my area of expertise :)
I have completed the basic work to get this going both in Special:Upload and in the API. While I think we can merge the patches that add asynchronous behaviour to the API as soon as I find a reviewer, the Special:Upload part will need someone to help me polish the UI.
Frontend web development is really neither my job nor my area of expertise.
With the API change, you should be able to upload large files by URL (up to 4 GB, IIRC) without hitting timeouts; that will also allow us to move file processing to Shellbox, making our infrastructure more secure.
Here is an example of a script that works with the async API: P58902
Would this be enough to unblock people who want to upload large files, while I try to polish the Special:Upload patch?
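For readers who cannot open the paste, the general shape of a client for an asynchronous upload is "submit, then poll until the job finishes". The sketch below shows only the polling half and is not the script from P58902. The helper `poll_until_done`, the status strings, and the injected `check_status` callable are all hypothetical; a real client would implement `check_status` with whatever status request the upload API actually provides.

```python
import time
from typing import Callable


def poll_until_done(
    check_status: Callable[[], str],
    interval: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call check_status() until it returns something other than "pending".

    Hypothetical helper: "pending" is an assumed placeholder value, not a
    status string taken from the MediaWiki API.
    """
    for _ in range(max_attempts):
        status = check_status()
        if status != "pending":
            return status
        # Wait before asking again so the server is not hammered.
        sleep(interval)
    raise TimeoutError(f"job still pending after {max_attempts} checks")


if __name__ == "__main__":
    # Simulated job that finishes on the third status check.
    fake_statuses = iter(["pending", "pending", "success"])
    print(poll_until_done(lambda: next(fake_statuses), interval=0.0))
```

Passing the status check in as a callable keeps the loop independent of any particular HTTP client or API response format.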
@Joe: How about 5 GiB per https://gerrit.wikimedia.org/r/1002813 ? See https://phabricator.wikimedia.org/T191804#9363066 for a discussion about the capacity implications of this change.
Once we've gone completely async and we've confirmed it works, it might be a good idea to increase the allowed size for files that are uploaded by URL.
As-is, the synchronous process doesn't allow you to upload files larger than 2-3 GB, because it often times out before a larger file is processed. So even reaching the current official limit of 4 GB and making it work reliably seems like a good first step.
Change #982757 merged by jenkins-bot:
[mediawiki/core@master] Add job for upload from UploadFromUrl
Change #983196 merged by jenkins-bot:
[mediawiki/core@master] Allow async upload by url via the Api
Change #1007344 merged by jenkins-bot:
[mediawiki/core@master] Switch Special:Upload to use async upload-by-url