
Pywikibot should support async chunked uploading
Open, Medium, Public

Description

Async chunked uploading may be a workaround for T128358: Uploading 1.2GB ogv results in 503, but Site.upload does not yet support it.

MwJSBot.js (used by bigChunkedUpload.js) implements this and is known to succeed on larger files than Pywikibot can handle (T128591#2085330 succeeded on a 1.7 GB file).
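
For reference, the underlying MediaWiki API flow is: POST each chunk to action=upload with stash=1 (passing the filekey returned by the previous chunk), commit the stashed file with async=1 so the server assembles it in a background job, and poll with checkstatus until that job finishes. Below is a minimal sketch of that flow using plain requests rather than Pywikibot; the session is assumed to be already logged in, and the chunk size, function name, and comment are illustrative only:

import os
import time
import requests

API = 'https://commons.wikimedia.org/w/api.php'
CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk; illustrative, not a recommendation

def upload_chunked_async(session, csrf_token, path, filename):
    """Stash `path` chunk by chunk, then publish it with async=1."""
    filesize = os.path.getsize(path)
    filekey, offset = None, 0
    with open(path, 'rb') as f:
        while offset < filesize:
            chunk = f.read(CHUNK_SIZE)
            data = {'action': 'upload', 'format': 'json', 'stash': 1,
                    'filename': filename, 'filesize': filesize,
                    'offset': offset, 'token': csrf_token}
            if filekey:
                data['filekey'] = filekey  # continue the stashed upload
            reply = session.post(API, data=data,
                                 files={'chunk': (filename, chunk)}).json()
            filekey = reply['upload']['filekey']
            offset += len(chunk)
    # async=1 asks the server to assemble the stashed chunks in a background
    # job instead of blocking this request (and possibly timing out).
    reply = session.post(API, data={
        'action': 'upload', 'format': 'json', 'filename': filename,
        'filekey': filekey, 'comment': 'chunked async upload',
        'async': 1, 'token': csrf_token}).json()
    # Poll until the background job finishes.
    while reply['upload'].get('result') == 'Poll':
        time.sleep(5)
        reply = session.post(API, data={
            'action': 'upload', 'format': 'json', 'checkstatus': 1,
            'filekey': filekey, 'token': csrf_token}).json()
    return reply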

Event Timeline

jayvdb added a subscriber: XZise. Mar 9 2016, 12:38 AM
zhuyifei1999 triaged this task as High priority.

Change 277060 had a related patch set uploaded (by Zhuyifei1999):
[WIP] site: Support async uploads

https://gerrit.wikimedia.org/r/277060

Change 277060 abandoned by Zhuyifei1999:
[WIP] site: Support async uploads

Reason:
Uploading needs an overhaul (will do later). This logic will create too much duplicated code

https://gerrit.wikimedia.org/r/277060

zhuyifei1999 lowered the priority of this task from High to Medium. Apr 16 2016, 7:41 AM

Change 277060 restored by Zhuyifei1999:
[WIP] site: Support async uploads

Reason:
I'm working on this again

https://gerrit.wikimedia.org/r/277060

Change 277060 had a related patch set uploaded (by Zhuyifei1999):
site: Support async chunked uploads (T129216)

https://gerrit.wikimedia.org/r/277060

Fae awarded a token. May 15 2017, 4:05 PM
Fae added a subscriber: Fae.
Jeff_G added a subscriber: Jeff_G.
xSavitar moved this task from Backlog to Needs Review on the Pywikibot board. Nov 5 2018, 11:32 AM

When attempting to upload to Wikimedia Commons using chunked upload, I receive a whole series of warnings and errors: (1) "WARNING: Unexpected offset.", (2) a large traceback from a read timeout (T253236), (3) a series of internal_api_error_DBQueryError errors with retries, and finally (4) a stashfailed error that ends the upload attempt.

Is this all related to the async issue, or is something else going on as well? I have tried to upload about 100 large PDFs using Pywikibot with chunked upload enabled, and every one failed in this way (with varying numbers of retries). Assuming this is the cause, it seems that all large-file uploads via Pywikibot are blocked, unless anyone knows a different way.
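
For reference, my calls are essentially the standard Site.upload with chunked mode enabled, roughly like this (the file title, local path, and chunk size below are placeholders, not my real values):

import pywikibot

site = pywikibot.Site('commons', 'commons')
page = pywikibot.FilePage(site, 'File:Example large document.pdf')
# A non-zero chunk_size switches Site.upload to the chunked upload protocol.
site.upload(page,
            source_filename='/path/to/example-large-document.pdf',
            comment='Upload large PDF',
            chunk_size=1024 * 1024)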

As this task seems stalled, it would be great if someone else could give it a look.

WARNING: Unexpected offset.
ERROR: An error occurred for uri https://commons.wikimedia.org/w/api.php
ERROR: Traceback (most recent call last):
  File "/srv/paws/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/srv/paws/lib/python3.6/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.6/ssl.py", line 1012, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.6/ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/paws/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/srv/paws/lib/python3.6/site-packages/urllib3/connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/srv/paws/lib/python3.6/site-packages/urllib3/util/retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/srv/paws/lib/python3.6/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/srv/paws/lib/python3.6/site-packages/urllib3/connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "/srv/paws/lib/python3.6/site-packages/urllib3/connectionpool.py", line 389, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/srv/paws/lib/python3.6/site-packages/urllib3/connectionpool.py", line 307, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='commons.wikimedia.org', port=443): Read timed out. (read timeout=45)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/paws/pwb/pywikibot/data/api.py", line 1732, in _http_request
    body=body, headers=headers)
  File "/srv/paws/pwb/pywikibot/tools/__init__.py", line 1797, in wrapper
    return obj(*__args, **__kw)
  File "/srv/paws/pwb/pywikibot/comms/http.py", line 315, in request
    r = fetch(baseuri, method, params, body, headers, **kwargs)
  File "/srv/paws/pwb/pywikibot/comms/http.py", line 519, in fetch
    error_handling_callback(request)
  File "/srv/paws/pwb/pywikibot/comms/http.py", line 404, in error_handling_callback
    raise request.data
  File "/srv/paws/pwb/pywikibot/comms/http.py", line 382, in _http_process
    **http_request.kwargs)
  File "/srv/paws/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/srv/paws/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/srv/paws/lib/python3.6/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='commons.wikimedia.org', port=443): Read timed out. (read timeout=45)

WARNING: Waiting 5 seconds before retrying.
WARNING: API error internal_api_error_DBQueryError: [e5c4842c-a45a-4833-bdc4-98e16aec1b83] Caught exception of type Wikimedia\Rdbms\DBQueryError
ERROR: Detected MediaWiki API exception internal_api_error_DBQueryError: [e5c4842c-a45a-4833-bdc4-98e16aec1b83] Caught exception of type Wikimedia\Rdbms\DBQueryError
[errorclass: Wikimedia\Rdbms\DBQueryError;
 servedby: mw1282]; retrying
WARNING: Waiting 10 seconds before retrying.
WARNING: API error internal_api_error_DBQueryError: [5ec5e5bf-6348-4d70-a064-05c5c39841b5] Caught exception of type Wikimedia\Rdbms\DBQueryError
ERROR: Detected MediaWiki API exception internal_api_error_DBQueryError: [5ec5e5bf-6348-4d70-a064-05c5c39841b5] Caught exception of type Wikimedia\Rdbms\DBQueryError
[errorclass: Wikimedia\Rdbms\DBQueryError;
 servedby: mw1360]; retrying
WARNING: Waiting 20 seconds before retrying.
WARNING: API error internal_api_error_DBQueryError: [2f59bd72-a7ee-4a3a-af93-49ea565264a3] Caught exception of type Wikimedia\Rdbms\DBQueryError
ERROR: Detected MediaWiki API exception internal_api_error_DBQueryError: [2f59bd72-a7ee-4a3a-af93-49ea565264a3] Caught exception of type Wikimedia\Rdbms\DBQueryError
[errorclass: Wikimedia\Rdbms\DBQueryError;
 servedby: mw1362]; retrying
WARNING: Waiting 40 seconds before retrying.
WARNING: API error internal_api_error_DBQueryError: [b0fb2efd-e878-4984-8236-79e4947e41c0] Caught exception of type Wikimedia\Rdbms\DBQueryError
ERROR: Detected MediaWiki API exception internal_api_error_DBQueryError: [b0fb2efd-e878-4984-8236-79e4947e41c0] Caught exception of type Wikimedia\Rdbms\DBQueryError
[errorclass: Wikimedia\Rdbms\DBQueryError;
 servedby: mw1363]; retrying
WARNING: Waiting 80 seconds before retrying.
WARNING: API error stashfailed: Could not read file "mwstore://local-swift-eqiad/local-temp/b/ba/17hxez96zikk.tgvq6.8609812.pdf.0".
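
The doubling wait times in the log (5, 10, 20, 40, then 80 seconds) follow the usual exponential-backoff retry pattern before the final stashfailed error ends the attempt. Schematically, with illustrative names rather than the actual Pywikibot internals:

import time

def retry_with_backoff(call, max_retries=5, wait=5):
    """Retry `call`, doubling the wait after each failed attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # e.g. a read timeout or internal_api_error_*
            if attempt == max_retries - 1:
                raise  # out of retries; surfaces as the final failure
            print('WARNING: Waiting %d seconds before retrying.' % wait)
            time.sleep(wait)
            wait *= 2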