
Improve download speed from archive.org on appservers
Open, Needs Triage, Public

Description

I tried again to upload a 491 MB TIFF file from https://archive.org/details/clevelandart-1916.1044-gardener-s-house-at, and it failed.
Direct link: https://archive.org/download/clevelandart-1916.1044-gardener-s-house-at/1916.1044_full.tif
(upload-by-url on Commons)

Request from 90.112.25.87 via cp3064 cp3064, Varnish XID 452004418
Error: 503, Backend fetch failed at Sun, 31 Oct 2021 20:15:12 GMT

MediaWiki has a 180-second (3-minute) timeout for downloading the file. It took me ~4 minutes to download the file at home, and AFAICT the bandwidth limitation is not on my side. On a production appserver it's even slower, currently estimating 15 minutes (I wonder if they're rate limiting us?) - see P17649. Now, even if the file does download in 180 seconds, there's a 200-second overall MediaWiki timeout. So it would need to upload to Swift, extract metadata, and update the databases in the remaining 20 seconds, which is cutting it pretty close. Unfortunately there's no chunked-upload-by-url system I'm aware of. In the meantime, I think the best solution is to download the file locally or to Toolforge and then chunked upload to Commons.
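To put numbers on the timeouts above, here is a rough back-of-the-envelope sketch. It uses only the figures quoted in this task (491 MB, 180 s, 200 s, ~4 minutes at home, ~15 minutes estimated on an appserver); treating MB as a plain unit and assuming a constant transfer rate are simplifications of mine, and the function name is made up for illustration.

```python
# Figures quoted in the task description; a constant transfer rate is assumed.
FILE_SIZE_MB = 491
DOWNLOAD_TIMEOUT_S = 180   # MediaWiki download timeout, per the description
REQUEST_TIMEOUT_S = 200    # overall MediaWiki timeout, per the description


def rate_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average rate (MB/s) needed to move size_mb in the given number of seconds."""
    return size_mb / seconds


# Minimum sustained rate needed to fit inside the download timeout.
needed = rate_mb_per_s(FILE_SIZE_MB, DOWNLOAD_TIMEOUT_S)

# Rates implied by the observed/estimated download times.
home_rate = rate_mb_per_s(FILE_SIZE_MB, 4 * 60)        # ~4 minutes at home
appserver_rate = rate_mb_per_s(FILE_SIZE_MB, 15 * 60)  # ~15 minutes estimated

# Time left for Swift upload, metadata extraction and database updates.
remaining_s = REQUEST_TIMEOUT_S - DOWNLOAD_TIMEOUT_S

print(f"needed:    {needed:.2f} MB/s")
print(f"home:      {home_rate:.2f} MB/s")
print(f"appserver: {appserver_rate:.2f} MB/s")
print(f"time left after a just-in-time download: {remaining_s} s")
```

So under these assumptions the appserver would need to be roughly five times faster than the current estimate just to finish the download in time.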

in the meantime I think the best solution is to download the file locally or to Toolforge and then chunked upload to Commons.

https://wikisource-bot.toolforge.org/ssu_request/1916.1044_full.tif

@Yann that should download much faster.
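For anyone scripting the download-then-chunked-upload workaround themselves, a minimal sketch of the chunk planning might look like the following. It does not talk to the network. The parameter names (`stash`, `filesize`, `offset`, `filekey`) are the ones I understand `action=upload` to use for chunked uploads, but please check them against the API documentation; the chunk size and the helper names are my own assumptions, not anything Commons prescribes.

```python
from typing import Dict, Iterator, Optional, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # hypothetical chunk size in bytes


def plan_chunks(filesize: int, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover the file exactly once."""
    if filesize <= 0 or chunk_size <= 0:
        raise ValueError("filesize and chunk_size must be positive")
    offset = 0
    while offset < filesize:
        length = min(chunk_size, filesize - offset)
        yield offset, length
        offset += length


def chunk_params(filename: str, filesize: int, offset: int,
                 filekey: Optional[str], token: str) -> Dict[str, str]:
    """Build the form fields for one stashed chunk (the chunk bytes are sent separately)."""
    params = {
        "action": "upload",
        "format": "json",
        "stash": "1",
        "filename": filename,
        "filesize": str(filesize),
        "offset": str(offset),
        "token": token,
    }
    # The first chunk has no filekey yet; later chunks reuse the one the API returned.
    if filekey is not None:
        params["filekey"] = filekey
    return params


if __name__ == "__main__":
    for offset, length in plan_chunks(12, 5):
        print(offset, length)
```

Each chunk is then a separate, short request, so no single request has to fit the whole file inside the timeouts discussed above. Once every chunk is stashed, a final `action=upload` request with the `filekey` publishes the file.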

The IA downloads are incredibly slow for me: sometimes it takes an hour to get a sub-1GB ZIP.

There are other similar reports: T286976: HTTP 503 error trying to replace file via URL on Commons with a 330MB file, and T280048: Uploading ~160MB DjVu by URL results in 503 error at Commons and failed upload. I believe they share a root cause: downloading files from archive.org on appservers is slow.


We should see if it's possible to get better speeds from archive.org. Some people have already reached out via back channels.

Event Timeline

Slow bandwidth from IA does indeed seem to be the issue. I expected that upload-by-url (i.e. direct transfer from IA servers to WM servers) would easily outperform any transfer to and from a personal connection. This is weird, as uploading to IA is quite fast. This issue defeats the whole point of upload-by-url. Thanks for taking care of that. Downloading to a local PC and then uploading to WM servers can only be a temporary workaround.

I managed to upload https://commons.wikimedia.org/wiki/File:Milton_-_Paradise_Lost,_1699.pdf (178.34 MB), so I thought this was fixed. But now I got this error again while uploading https://archive.org/download/ParadiseRegained1680_465/ParadiseRegained1680.pdf (67.6 MB).

Request from 90.112.34.87 via cp6012 cp6012, Varnish XID 422429028
Error: 503, Backend fetch failed at Sun, 21 May 2023 20:17:24 GMT