IA-Upload: retry failed commons uploads (504 errors)
Open, Needs Triage · Public · Bug Report

Description

Sometimes, Commons chokes on a file during upload from IA-Upload:

[2021-07-08T21:44:24.126200+00:00] LOG.INFO: Uploading to /var/www/tool/jobqueue/lajangadagoog/lajangadagoog.djvu to Commons La Jangada (1882).djvu [] []
[2021-07-08T21:44:24.132088+00:00] LOG.DEBUG: Getting fresh token {"type":"csrf"} []
[2021-07-08T21:47:21.400361+00:00] LOG.CRITICAL: Server error: `POST https://commons.wikimedia.org/w/api.php` resulted in a `504 Gateway Timeout` response: upstream request timeout  [] []

However, these uploads can still succeed sometimes, so it might be worth retrying the upload before declaring the job failed. Alternatively, provide a "retry upload" button on the log entry so the user can try again later.

Event Timeline

I think not exactly - retrying is more of a hail-mary mitigation than a solution to the actual problem.

So this would be adding a single immediate retry? That does sound simpler than adding async support to chunked uploading (in Addwiki).

I think so. I'm unclear how often that would be successful, but it probably wouldn't hurt?
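For what it's worth, a single immediate retry could look roughly like the sketch below. It is written in Python purely for illustration and is not the tool's actual code; `GatewayTimeout` and `upload_with_retry` are hypothetical names, and the real upload call would be whatever the upload library exposes.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class GatewayTimeout(Exception):
    """Hypothetical stand-in for the HTTP 504 error raised by the upload client."""


def upload_with_retry(upload: Callable[[], T], retries: int = 1) -> T:
    """Call upload(); on a 504, try again up to `retries` more times.

    Any other exception propagates immediately, so only gateway
    timeouts are treated as retryable.
    """
    attempt = 0
    while True:
        try:
            return upload()
        except GatewayTimeout:
            if attempt >= retries:
                raise  # out of retries: let the job be marked as failed
            attempt += 1
```

With `retries=1` this is exactly one extra attempt. A 504 only means the gateway gave up waiting, so it may be worth checking whether the file already exists on Commons before the second attempt.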

Async chunked upload in Addwiki is still not a complete solution because you can still run into Commons failures during stash publish - I failed to help someone at COM:VPT upload very large TIFFs due to that. That said, I haven't had a single failure on a DjVu or PDF since PWB learned async chunked uploading.

Also, a combination of T287241 (allow direct URL upload from the tool) and T286702 (provide access to the description string) would allow users to circumvent the problem if it failed again, since they could then do their own retries.

Why not just use direct URL upload to begin with? Let Commons pull the file from IA Upload. Then we do not have to worry about teaching Addwiki async chunked uploading, since IA Upload would only have to serve the file for download (and from the perspective of IA Upload, another request is inherently asynchronous). This has the added benefit of transparency, as we would have to provide the media URL and file description metadata anyway.

It seems the default method for getting files from IA Upload to Commons ought to be for Commons to download them from IA Upload. This should solve the issue of long uploads failing, and retrying would be as simple as reinitiating the URL-based upload.
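To make the URL-based approach concrete, here is a rough Python sketch of the request parameters for an `action=upload` call that uses the MediaWiki API's `url` parameter instead of sending the file body. The helper name `build_url_upload_params` and the example URL are hypothetical, and this assumes Commons would accept the tool's domain as a source for upload-by-URL and that the uploading account is allowed to use it.

```python
def build_url_upload_params(
    filename: str,
    source_url: str,
    description: str,
    token: str,
    comment: str = "",
) -> dict:
    """Build POST parameters for a MediaWiki upload-by-URL request.

    The wiki fetches the file from `source_url` itself, so the tool
    only sends metadata rather than the file contents.
    """
    return {
        "action": "upload",
        "format": "json",
        "filename": filename,   # target name on the wiki
        "url": source_url,      # where the wiki should fetch the file from
        "text": description,    # initial page text (the file description)
        "comment": comment,     # upload summary
        "token": token,         # CSRF token, as in the log above
    }


# Hypothetical usage; the URL is a placeholder, not a real endpoint.
params = build_url_upload_params(
    filename="La Jangada (1882).djvu",
    source_url="https://example.org/lajangadagoog.djvu",
    description="(file description wikitext)",
    token="(csrf token)",
)
```

Retrying would then just mean sending the same parameters again, and a user holding the URL and description could do the same by hand.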

TheresNoTime changed the subtype of this task from "Task" to "Bug Report". Aug 4 2022, 6:14 PM