
Special:UploadWizard should have option for uploading from a URL like Special:Upload
Open, Low · Public · Feature

Description

What's the problem?

On Commons there are two upload pages: Special:Upload, which supports uploading a file from a URL, and Special:UploadWizard, which (currently) does not.

Using Special:Upload to upload a file larger than 100MB from a URL repeatedly fails.

I was advised that Special:UploadWizard undertook different approaches to support the upload of larger files compared to Special:Upload. However, Special:UploadWizard does not currently have an option to upload directly from a URL. (In one use case this would be a URL to a PDF file at Internet Archive or Hathi Trust.) This means having to download the file from the origin site to local storage, and then re-upload it to Commons using Special:UploadWizard. This is wasteful of bandwidth on the user side.

What's the functionality you would like?

An option in Special:UploadWizard to provide a URL to the appropriate file, rather than a local file, and to have that file load reliably, potentially by splitting the transfer into parts if the file is greater than 50MB, with additional integrity checks being made to ensure it uploads completely.
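As a rough illustration of the kind of integrity check meant here, the parts could be checked against the expected size and a digest of the complete file once they have all arrived. This is only a sketch: the function names are hypothetical, SHA-256 is an arbitrary choice, and nothing here describes how UploadWizard or MediaWiki behaves today.

```python
import hashlib


def digest_of_parts(parts):
    """Return the SHA-256 hex digest of the parts concatenated in order."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return h.hexdigest()


def transfer_is_complete(parts, expected_size, expected_digest):
    """True only if the parts add up to the expected size and digest."""
    total = sum(len(part) for part in parts)
    return total == expected_size and digest_of_parts(parts) == expected_digest
```

A missing or truncated part changes both the total size and the digest, so either check would flag an incomplete transfer.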

Event Timeline

Restricted Application added a project: Internet-Archive. Jun 14 2020, 7:02 AM
Restricted Application added a subscriber: Aklapper.

Adding UploadWizard as this seems to be about UploadWizard.

Reedy added a subscriber: Reedy. Jun 14 2020, 2:06 PM

I don't think this necessarily works like you think it will.

I was advised Special:UploadWizard undertook different approaches to support upload of larger files by means of 'chunked uploads', which can also be done using a manually installed script.

Chunked upload means uploading in "chunks" (parts) from the local machine, using JavaScript to talk to the MW API.
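To make the idea concrete, here is a minimal sketch of how a client might plan the chunks of a local file. It only computes offsets and lengths and sends nothing. The helper names and the chunk size are hypothetical; the parameter names in the dictionaries (action, stash, filename, filesize, offset) are, as far as I know, the ones the MW upload API uses for chunked uploads, but treat that as an assumption and check the API documentation. The chunk bytes themselves and the file key returned by the server after the first chunk are left out.

```python
def plan_chunks(file_size, chunk_size):
    """Yield (offset, length) pairs that cover file_size bytes in order."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length


def chunk_request_params(filename, file_size, chunk_size):
    """Build one parameter dict per chunk (illustrative only, nothing is sent)."""
    return [
        {
            "action": "upload",
            "stash": 1,
            "filename": filename,
            "filesize": file_size,
            "offset": offset,
        }
        for offset, _length in plan_chunks(file_size, chunk_size)
    ]
```

The point is that every chunk is read from a file the client already holds, which is why this approach cannot be applied directly to a file that only exists at a remote URL.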

This doesn't work for upload by URL, which is a direct request from the Wikimedia servers to download the file; it is done server-side, not client-side.

If that is not working, it's probably hitting a timeout, because the source is too slow or the file is just too big (though 100MB shouldn't be too big).

So making UploadWizard fire off the same API request isn't going to make it work any better, and unless it downloads the file locally first, chunked upload isn't going to work either.

Have tasks been filed for upload by URL failing? The proper fix is either to make upload by URL multi-streamed, or to do it as some sort of background job that won't suffer from the timeout (or to increase the timeout).

I have not filed a specific Phabricator ticket for Special:Upload upload by URL failing (in respect of large uploads), as I was told this was a known issue already.

Can you clarify further what you mean by 'multi-streamed'? Do you mean something like the approach some FTP clients use, by having multiple connections?

Reedy added a comment. Jun 14 2020, 6:47 PM

Can you clarify further what you mean by 'multi-streamed'? Do you mean something like the approach some FTP clients use, by having multiple connections?

Yes, basically. HTTP downloads can do it too; download managers have been doing it for decades. I don't know exactly what ends up being done in the MW abstractions when the request hits the underlying curl/guzzle library, and as such, how the file is grabbed, etc.
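For illustration, this is roughly how a download manager splits one HTTP download across several connections: each connection asks for a different byte range using the standard Range request header. The sketch below only builds the headers and makes no requests. The function name is hypothetical, it assumes the total size is already known (for example from Content-Length) and that the source server honours range requests, and it says nothing about what MW does internally.

```python
def range_headers(total_size, connections):
    """Return one Range header dict per connection, covering every byte once."""
    if total_size <= 0 or connections <= 0:
        raise ValueError("total_size and connections must be positive")
    # Never use more connections than there are bytes to fetch.
    connections = min(connections, total_size)
    base, extra = divmod(total_size, connections)
    headers = []
    start = 0
    for i in range(connections):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1  # HTTP byte ranges are inclusive
        headers.append({"Range": f"bytes={start}-{end}"})
        start = end + 1
    return headers
```

Each range can then be fetched independently and retried on its own if it fails, so one slow or dropped connection does not force the whole download to restart.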

I have not filed a specific Phabricator ticket for Special:Upload upload by URL failing (in respect of large uploads), as I was told this was a known issue already.

As always, cross-linking tickets helps.

ShakespeareFan00 added a comment. Edited Jun 14 2020, 6:49 PM

I wasn't sure what tickets had been filed regarding the large file upload issues from Special:Upload.

T254459 and T255238 may also need to be solved.

MarkTraceur triaged this task as Low priority. Jun 22 2020, 4:18 PM
MarkTraceur added a subscriber: MarkTraceur.

Marking as low priority given a need for more clarity around the request and the maintenance status of the extension right now.