
http-bad-status - 403 Forbidden when trying to upload file from tools.wikimedia.pl by URL to Commons
Closed, ResolvedPublic

Description

I just tried to upload

http://tools.wikimedia.pl/~odder/whitehouse/41d5129de542285b6c62.webm

using https://commons.wikimedia.org/wiki/Special:Upload as well as via API

(I have "upload_by_url" userright enabled).

Request URL:

https://commons.wikimedia.org/w/api.php?action=upload&format=json

POST data from API sandbox:

filename=Test%20test.webm&comment=Test%20bug%2072897&url=http%3A%2F%2Ftools.wikimedia.pl%2F~odder%2Fwhitehouse%2F41d5129de542285b6c62.webm&token=d27ea41fb633146835d1a2d1e962f37e545f9bd9%2B%5C
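For readability, the POST body above can be rebuilt from its individual parameters; a minimal sketch in Python (the token is the sample value captured from the API sandbox, so a real request would need a fresh CSRF token from action=query&meta=tokens):

```python
from urllib.parse import urlencode, quote

# The parameters from the API sandbox dump above. The token shown is the
# sample value from the report; a real request needs a fresh CSRF token.
params = {
    "filename": "Test test.webm",
    "comment": "Test bug 72897",
    "url": "http://tools.wikimedia.pl/~odder/whitehouse/41d5129de542285b6c62.webm",
    "token": "d27ea41fb633146835d1a2d1e962f37e545f9bd9+\\",
}

# quote_via=quote encodes spaces as %20 (as in the dump) instead of '+'.
body = urlencode(params, quote_via=quote)
```

Percent-decoding the resulting string recovers exactly the POST data shown above.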

Result:

{
    "servedby": "mw1139",
    "error": {
        "code": "http-bad-status",
        "info": "Error fetching file from remote source",
        "0": "403",
        "1": "Forbidden",
        "*": "See https://commons.wikimedia.org/w/api.php for API usage"
    }
}
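For clients handling this response, the error code lives in error["code"], while the HTTP status relayed from the remote fetch ends up in the positional "0"/"1" members; a small sketch using the payload above:

```python
import json

# The error payload from the report, as a client would receive it.
raw = '''{"servedby": "mw1139",
          "error": {"code": "http-bad-status",
                    "info": "Error fetching file from remote source",
                    "0": "403", "1": "Forbidden",
                    "*": "See https://commons.wikimedia.org/w/api.php for API usage"}}'''

response = json.loads(raw)
err = response.get("error", {})

# The relayed HTTP status is the string member "0".
status = int(err.get("0", 0))
```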

Log entry on the tools.wikimedia.pl server:

tools.wikimedia.pl 208.80.xxx.yyy - - [09/Nov/2014:18:03:30 +0100] "GET /~odder/whitehouse/41d5129de542285b6c62.webm HTTP/1.1" 200 932365538 "-" "MediaWiki/1.25wmf6"

so MediaWiki gets a 200 response from the origin server.


Version: 1.25-git
Severity: normal
URL: https://commons.wikimedia.org/w/api.php?action=upload&format=json

Details

Reference
bz73200

Event Timeline

bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz73200.
bzimport added a subscriber: Unknown Object (MLST).
saper created this task.Nov 9 2014, 5:05 PM

Maybe check to see if it works with a small test file, just to rule out the size of the file triggering something. (Yeah, I know, 403 would be the wrong error code for a file that's too big, but just in case.)

I'm idly wondering whether you're setting a User-Agent and whether you can perform other actions such as edits.

From the url-downloader config:

reply_body_max_size 534773760 allow all

Looks like upload by url has a max file size of 510 MB. (Also, I feel like even without that, things might time out before a 900 MB video got uploaded.)
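A quick sanity check of the figures in this comment, using the byte count from the squid config and the file size from the access log earlier in the report:

```python
# The squid limit from the url-downloader config, in bytes, versus the
# size of the file from the tools.wikimedia.pl access log above.
reply_body_max_size = 534773760   # from reply_body_max_size
file_size = 932365538             # bytes, from the GET log entry

limit_mib = reply_body_max_size // (1024 * 1024)  # exactly 510 MiB
over_limit = file_size > reply_body_max_size
```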

(In reply to MZMcBride from comment #2)

Ignore me! I mis-read the bug report and Bawolff set me straight.

(In reply to Bawolff (Brian Wolff) from comment #3)

From the url-downloader config:

reply_body_max_size 534773760 allow all

Looks like upload by url has a max file size of 510 MB. (Also, I feel like
even without that, things might time out before a 900 MB video got uploaded.)

https://git.wikimedia.org/blob/operations%2Fpuppet.git/df4b24132abb856323e69a0decf0ebcde65b67fa/modules%2Furl_downloader%2Ftemplates%2Fsquid.conf.erb#L81

Yeah, there are a few references to 510 MB in there.

Change 172120 had a related patch set uploaded by Brian Wolff:
Increase max file size of url downloader proxy to 1010mb

https://gerrit.wikimedia.org/r/172120

Change 172120 merged by Faidon Liambotis:
Increase max file size of url downloader proxy to 1010mb

https://gerrit.wikimedia.org/r/172120

hoo added a comment.Nov 10 2014, 10:41 AM

Puppet patch has been merged.

Tgr added a comment.Nov 10 2014, 12:48 PM

If the problem was indeed the file being too large, the API should have a less misleading way of reporting it, so I don't think this is fully fixed.

Marcin, can you verify whether the upload works now?

saper added a comment.Nov 10 2014, 5:29 PM

Right now I get "HTTP request timed out.", so we are one stage ahead. It didn't wait very long before the timeout, a few seconds maybe.

(In reply to Tisza Gergő from comment #9)

If the problem was indeed the file being too large, the API should have a
less misleading way of reporting it, so I don't think this is fully fixed.

It's a squid proxy server between the API and the rest of the internet. I don't think MediaWiki has any way of knowing it hit a file-too-big limit instead of a legitimate 403.

(In reply to Marcin Cieślak from comment #10)

Right now I get "HTTP request timed out.", so we are one stage ahead. It
didn't wait very long before the timeout, a few seconds maybe.

I think the timeout is currently 30 seconds(?). I'm not surprised you hit it. You would have to have some pretty good connectivity to transfer a 900 MB file in 30 seconds.
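A rough check of this comment: the sustained throughput needed to move the ~932 MB file from the access log inside an assumed 30-second proxy timeout.

```python
# Back-of-the-envelope throughput requirement for beating the timeout.
file_size_bytes = 932365538   # from the access log earlier in the report
timeout_s = 30                # assumed proxy timeout from this comment

# Required sustained rate in megabits per second (~249 Mbit/s).
required_mbit_per_s = file_size_bytes * 8 / timeout_s / 1e6
```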

Change 172437 had a related patch set uploaded by saper:
More debug diagnostics for upload by URL

https://gerrit.wikimedia.org/r/172437

The only place where squid (or the target website) tells us more details is the body of the error response.

Gerrit change 172437 adds the remote server's output to the debug log to help troubleshoot such issues.

Tgr added a comment.Nov 12 2014, 10:43 AM

(In reply to Bawolff (Brian Wolff) from comment #11)

It's a squid proxy server between the API and the rest of the internet. I
don't think MediaWiki has any way of knowing it hit a file-too-big limit
instead of a legitimate 403.

We are talking about a squid operated by WM-PL, right? AFAIK wikimedia.pl is not WMF-hosted.

That would make this a bug in the squid configuration of WM-PL, presumably. Is there a tracker where this could be upstreamed then?

I think the timeout is currently 30 seconds(?). I'm not surprised you hit it.
You would have to have some pretty good connectivity to transfer a 900 MB file
in 30 seconds.

The deeper issue is, then, that we don't have a method for upload-by-url that's not limited by the timeout (unlike normal uploads, where we have the chunked upload API)?
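To illustrate why chunked uploads sidestep the timeout: the file is split into fixed-size pieces, each sent as its own request tagged with a byte offset, so no single request has to finish inside the proxy timeout. The following is a sketch of the splitting step only, with illustrative names, not the actual MediaWiki client code:

```python
# Split a payload into (offset, chunk) pairs, one per upload request.
def iter_chunks(data: bytes, chunk_size: int):
    for offset in range(0, len(data), chunk_size):
        yield offset, data[offset:offset + chunk_size]

blob = b"x" * (5 * 1024 * 1024)                 # stand-in for a 5 MB file
chunks = list(iter_chunks(blob, 1024 * 1024))   # 1 MB per request
```

Reassembling the chunks in offset order recovers the original payload, which is what the server-side stash does before publishing the file.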

(In reply to Tisza Gergő from comment #14)

(In reply to Bawolff (Brian Wolff) from comment #11)
> It's a squid proxy server between the API and the rest of the internet. I
> don't think MediaWiki has any way of knowing it hit a file-too-big limit
> instead of a legitimate 403.

We are talking about a squid operated by WM-PL, right? AFAIK wikimedia.pl is
not WMF-hosted.

No, we are talking about the url-downloader.wikimedia.org proxy. I believe the proxy is used as a security measure (not 100% sure against what, but isolating user-triggerable HTTP downloads is probably just a good idea in general).

The deeper issue is, then, that we don't have a method for upload-by-url
that's not limited by the timeout (unlike normal uploads, where we have the
chunked upload API)?

Yes. We used to have an async mode, but it's disabled and possibly has issues (not sure what exactly). We may be able to ease the issue by increasing various timeouts or removing the restriction against using Tool Labs (which AFAIK has a really fast link to production, being in the same data centre and all), but ultimately, to truly fix this, we would need to resurrect async mode. (OTOH, I have no idea how that would tie in with the current UI. But the current UI is pretty bad, so I suppose it wouldn't matter much.)

Steinsplitter set Security to None.
Steinsplitter moved this task from Incoming to Uploading on the Commons board.

What's the status on this, by the way? Did the upload eventually succeed?

saper added a comment.Mar 31 2015, 9:41 PM

The last time I tried was when I commented on this bug; it just timed out, probably due to the transfer time required.

I don't even have the original video now.

Thanks @saper. I'm not sure there's anything else needed from operations; @Tgr? Perhaps if upload-by-URL times out, the regular upload can be suggested to the user, or something like that.

Steinsplitter changed the task status from Open to Stalled.EditedApr 2 2015, 11:50 AM
Steinsplitter added a subscriber: Steinsplitter.

I tested uploading:

from pl toolserver. Works fine.

The original file (http://tools.wikimedia.pl/~odder/whitehouse/41d5129de542285b6c62.webm) has been deleted. Can't reproduce.

tomasz added a subscriber: tomasz.Jun 5 2015, 9:04 PM

The original file (http://tools.wikimedia.pl/~odder/whitehouse/41d5129de542285b6c62.webm) has been deleted. Can't reproduce.

Sorry about this, guys; I wasn't aware that a video that I transcoded was being used to test a bug.

I have absolutely no idea what it was (the file name suggests it might have been a Barack Obama speech), and cannot find it right now, but I can upload a new one of around the same size for you to test on if that helps.

saper closed this task as Resolved.Sep 1 2015, 4:18 AM
Restricted Application added a subscriber: Matanya. Sep 1 2015, 4:18 AM
saper added a comment.Sep 1 2015, 4:22 AM

Timeout issues (if any) belong to a different bug, I think.

An attempt to upload https://archive.org/download/HansMoserUmEineNasenlaenge/Um_eine_Nasenlaenge_1949_Hans_Moser.ogv (356843452 bytes) results in:

{
    "servedby": "mw1136",
    "error": {
        "code": "http-timed-out",
        "info": "Error fetching file from remote source",
        "0": "http://tools.wikimedia.pl/~saper/Um_eine_Nasenlaenge_1949_Hans_Moser.ogv",
        "*": "See https://commons.wikimedia.org/w/api.php for API usage"
    }
}