
error with httplib2 and unicode_literals: UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128)
Closed, Declined · Public

Description

On an upload of "File:Scanwings, OH-BCX, Beechcraft C90 King Air (17512831479).jpg" to Commons:

Using HEAD, the upload fails with an endless retry loop:

ERROR: Traceback (most recent call last):
  File "/data/project/yifeibot/pywikibot/pywikibot/data/api.py", line 1555, in submit
    body=body, headers=headers)
  File "/data/project/yifeibot/pywikibot/pywikibot/tools/__init__.py", line 950, in wrapper
    return obj(*__args, **__kw)
  File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 269, in request
    r = fetch(baseuri, method, body, headers, **kwargs)
  File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 370, in fetch
    error_handling_callback(request)
  File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 287, in error_handling_callback
    raise request.data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128)

WARNING: Waiting 5 seconds before retrying.

After debugging, I found that all versions after 1e54a7d6886d56a21101900025038e25bab5ad03 have the same error, while versions before it raise an (expected) exception that is handled in the bot script:

KeyError: u'bad-prefix'

(raised in https://github.com/wikimedia/mediawiki/blob/84899f991ff44f36b7359f0bbf8de2955b81a904/includes/upload/UploadBase.php#L1834)

I agree that some files should be prevented from being uploaded. But why does this surface as a UnicodeDecodeError that puts the bot into an endless retry loop?
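
For context, the "Waiting … before retrying" behaviour in the traceback suggests a pattern roughly like the following (a simplified, hypothetical sketch of what api.py's submit()/wait() appear to do, not the actual Pywikibot code):

# Simplified sketch (hypothetical) of the retry pattern visible in the
# traceback above: any exception coming out of the HTTP layer leads to a
# wait followed by another attempt.
import time

def submit(send_request, max_retries=25, retry_wait=5):
    while True:
        try:
            return send_request()    # raises UnicodeDecodeError deep in the HTTP layer
        except Exception:
            max_retries -= 1
            if max_retries < 0:
                raise
            print('WARNING: Waiting %d seconds before retrying.' % retry_wait)
            time.sleep(retry_wait)
            retry_wait = min(retry_wait * 2, 120)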

Event Timeline

zhuyifei1999 raised the priority of this task to Needs Triage.
zhuyifei1999 updated the task description.
zhuyifei1999 added subscribers: zhuyifei1999, XZise.
Restricted Application added subscribers: Aklapper, Unknown Object (MLST). May 16 2015, 2:39 PM

Unfortunately, HEAD is not very specific. The two most recent versions I've checked so far have different line numbers. What does python pwb.py version tell you?

EDIT: Great, I just checked the history one commit further and it's as of 1e54a7d6. Seems sensible.

Okay, what command did you use to get the exception? And would it be possible to run it in Python 3? The traceback itself isn't really helpful, but as far as I can see the UnicodeDecodeError happened somewhere in the HTTP layer, and I don't think it's related to the MediaWiki server itself. The problem is that I can't see where the error originally came from; Python 3 usually shows you the full chain of exceptions that led to the final one, so that might help.

Python 3.4.3 (default, Mar 10 2015, 13:05:12) 
[GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> try:
...   int('a')
... except Exception as e:
...   ex = e
... 
>>> raise ex
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in <module>
ValueError: invalid literal for int() with base 10: 'a'
Python 2.7.8 (default, Aug 28 2014, 16:47:22) 
[GCC 4.8.3 20140624 (Red Hat 4.8.3-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> try:
...   int('a')
... except Exception as e:
...   ex = e
... 
>>> raise ex
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'a'
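
(As a side note, the "full chain" mentioned above is Python 3's implicit exception chaining; a minimal illustration of my own, not from the original session:)

# Python 3 keeps the first exception as __context__ of the second, so the
# traceback prints both, joined by "During handling of the above exception,
# another exception occurred".
try:
    int('a')
except ValueError:
    raise RuntimeError('conversion failed')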

The command is site.upload(imagePage, source_filename=path, comment=self.summary, ignore_warnings=ignore)
where imagePage is a pywikibot.Page,
path is an ASCII string,
self.summary is an ASCII string,
and ignore is False.

A KeyboardInterrupt during the wait shows:

WARNING: Waiting 40 seconds before retrying.
^CTraceback (most recent call last):
  File "flickrAZ.py", line 194, in <module>
    main()
  File "flickrAZ.py", line 190, in main
    bot.run()
  File "flickrAZ.py", line 178, in run
    self.parsePages()
  File "flickrAZ.py", line 72, in parsePages
    self.parsePhoto(photo)
  File "flickrAZ.py", line 133, in parsePhoto
    self.upload(imagePage, fileURL, name, False)
  File "flickrAZ.py", line 153, in upload
    site.upload(imagePage, source_filename=path, comment=self.summary, ignore_warnings=ignore)
  File "/data/project/yifeibot/pywikibot/pywikibot/tools/__init__.py", line 950, in wrapper
    return obj(*__args, **__kw)
  File "/data/project/yifeibot/pywikibot/pywikibot/site.py", line 4969, in upload
    result = req.submit()
  File "/data/project/yifeibot/pywikibot/pywikibot/data/api.py", line 1579, in submit
    self.wait()
  File "/data/project/yifeibot/pywikibot/pywikibot/data/api.py", line 1710, in wait
    time.sleep(self.retry_wait)
KeyboardInterrupt

As for Python 3, it gives "ImportError: No module named 'urllib2'". I'm not sure I can run it there.

EDIT: The urllib2 issue is fixed, but the Flickr library I am using has too many syntax errors in Python 3.

zhuyifei1999 claimed this task.
This comment was removed by zhuyifei1999.

Uh … great that Python 3 doesn't work :( By "ASCII string", do you mean that the_string.encode('ascii') does not fail? Your other traceback isn't very helpful either, as there is no HTTP layer present (it runs in another thread) :(

And I don't think d798da349 fixed your issue (at least I couldn't explain why it would).

Sorry, I was wrong again. Running after "git checkout d798da349087d231d9dad203cd6b48a1aea03875" produces the error again.

As for "ASCII string" I meant all characters are valid ascii characters, and the_string.encode('ascii') does not fail, and there is no u flag before the string.

OK, I added a traceback.print_exc() call to https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/comms/threadedhttp.py#L225 and got this:

Traceback (most recent call last):
  File "/data/project/yifeibot/pywikibot/pywikibot/comms/threadedhttp.py", line 221, in request
    max_redirects, connection_type
  File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1569, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1316, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1252, in _conn_request
    conn.request(method, request_uri, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 973, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 827, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128)
ERROR: Traceback (most recent call last):
  File "/data/project/yifeibot/pywikibot/pywikibot/data/api.py", line 1556, in submit
    body=body, headers=headers)
  File "/data/project/yifeibot/pywikibot/pywikibot/tools/__init__.py", line 1094, in wrapper
    return obj(*__args, **__kw)
  File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 279, in request
    r = fetch(baseuri, method, body, headers, **kwargs)
  File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 381, in fetch
    error_handling_callback(request)
  File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 297, in error_handling_callback
    raise request.data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128)

WARNING: Waiting 5 seconds before retrying.

Hopefully this helps.
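
The failing statement in httplib (msg += message_body) can be reproduced on its own; a minimal sketch, assuming the joined header block has already been promoted to unicode:

# Python 2 sketch of the failure in httplib's _send_output(): the header block
# has become a unicode string, and appending the binary multipart body (which
# starts with the JPEG marker byte 0xff) forces an implicit ascii decode.
msg = u'POST /w/api.php HTTP/1.1\r\nHost: commons.wikimedia.org\r\n\r\n'
message_body = b'\xff\xd8\xff\xe0'    # first bytes of a JPEG file
msg += message_body                   # UnicodeDecodeError: 'ascii' codec can't decode byte 0xff ...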

@zhuyifei1999, please don't remove comments. A later clarification that the issue wasn't resolved is clear enough.

Could you try adding

import httplib2
httplib2.debuglevel = 4

before the site.upload() call? That should make httplib2 output some more detailed information.

@valhallasw, do you mean there's a log file or something? I see no change to stdout or stderr with that code added.

It should log to stdout, yes. For example:

>>> import httplib2
>>> httplib2.debuglevel = 4
>>> import pywikibot
>>> pywikibot.Page(pywikibot.Site(), 'test')
connect: (en.wikipedia.org, 443)
send: u'GET /w/api.php?maxlag=5&continue=&format=json&meta=siteinfo%7Cuserinfo&action=query&siprop=namespaces%7Cnamespacealiases%7Cgeneral&uiprop=blockinfo%7Chasmsg HTTP/1.1\r\nHost: en.wikipedia.org\r\nconnection: keep-alive\r\ncontent-type: application/x-www-form-urlencoded\r\naccept-encoding: gzip, deflate\r\nuser-agent: (wikipedia:en; User:Valhallasw) Pywikibot/2.0b3 (g5524) httplib2/0.9 Python/2.7.6.final.0\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
...
etc.

but it's possible the exception occurs earlier.

Can you provide the exact values you pass to

site.upload(imagePage, source_filename=path, comment=self.summary, ignore_warnings=ignore)

and the original file? That could help in reproducing the issue. Please check that, if you start

python pwb.py shell

and run

import pywikibot
site = pywikibot.Site('commons', 'commons')
site.upload(<your parameters>)

the same issue occurs.

connect: (commons.wikimedia.org, 443)
send: u'GET /w/api.php?maxlag=5&format=json&rawcontinue=&meta=userinfo&action=query&uiprop=blockinfo%7Cgroups%7Chasmsg%7Crights HTTP/1.1\r\nHost: commons.wikimedia.org\r\ncontent-type: application/x-www-form-urlencoded\r\nconnection: keep-alive\r\ncookie: centralauth_Token=(redacted); centralauth_User=YiFeiBot; forceHTTPS=1; commonswikiUserID=2915654; forceHTTPS=true; commonswikiUserName=YiFeiBot\r\naccept-encoding: gzip, deflate\r\nuser-agent: flickrAZ (commons:commons; User:YiFeiBot) Pywikibot/2.0b3 (g5523) httplib2/0.8 Python/2.7.6.final.0\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx/1.6.2
header: Date: Sun, 17 May 2015 01:11:33 GMT
header: Content-Type: application/json; charset=utf-8
header: Content-Length: 465
header: Connection: keep-alive
header: X-Powered-By: HHVM/3.6.1
header: X-Content-Type-Options: nosniff
header: Cache-control: private, must-revalidate, max-age=0
header: X-Frame-Options: SAMEORIGIN
header: Content-Encoding: gzip
header: Vary: Accept-Encoding
header: Set-Cookie: commonswikiSession=(redacted); path=/; secure; httponly
header: X-Varnish: 968174735, 3348394954
header: Via: 1.1 varnish, 1.1 varnish
header: Accept-Ranges: bytes
header: Age: 0
header: X-Cache: cp1054 miss (0), cp1068 frontend miss (0)
header: Set-Cookie: GeoIP=::::v4; Path=/; Domain=.wikimedia.org
header: X-Analytics: https=1
header: Set-Cookie: WMF-Last-Access=17-May-2015;Path=/;HttpOnly;Expires=Thu, 18 Jun 2015 00:00:00 GMT
Page 1
Ryanair, EI-DWP, Boeing 737-8AS (17737230072).jpg
send: u'GET /w/api.php?inprop=protection&titles=File%3ARyanair%2C+EI-DWP%2C+Boeing+737-8AS+%2817737230072%29.jpg&continue=&format=json&prop=info&meta=userinfo&indexpageids=&action=query&maxlag=5&uiprop=blockinfo%7Chasmsg HTTP/1.1\r\nHost: commons.wikimedia.org\r\ncontent-type: application/x-www-form-urlencoded\r\nconnection: keep-alive\r\ncookie: GeoIP=::::v4; centralauth_Token=(redacted); centralauth_User=YiFeiBot; forceHTTPS=1; commonswikiSession=(redacted); commonswikiUserID=2915654; forceHTTPS=true; WMF-Last-Access=17-May-2015; commonswikiUserName=YiFeiBot\r\naccept-encoding: gzip, deflate\r\nuser-agent: flickrAZ (commons:commons; User:YiFeiBot) Pywikibot/2.0b3 (g5523) httplib2/0.8 Python/2.7.6.final.0\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx/1.6.2
header: Date: Sun, 17 May 2015 01:11:33 GMT
header: Content-Type: application/json; charset=utf-8
header: Content-Length: 248
header: Connection: keep-alive
header: X-Powered-By: HHVM/3.6.1
header: X-Content-Type-Options: nosniff
header: Cache-control: private, must-revalidate, max-age=0
header: X-Frame-Options: SAMEORIGIN
header: Content-Encoding: gzip
header: Vary: Accept-Encoding
header: X-Varnish: 66566441, 3348396553
header: Via: 1.1 varnish, 1.1 varnish
header: Accept-Ranges: bytes
header: Age: 0
header: X-Cache: cp1065 miss (0), cp1068 frontend miss (0)
header: X-Analytics: https=1;WMF-Last-Access=17-May-2015
send: u'GET /w/api.php?maxlag=5&format=json&rawcontinue=&meta=tokens%7Cuserinfo&action=query&type=csrf%7Ccsrf%7Cdeleteglobalaccount%7Cpatrol%7Crollback%7Csetglobalaccountstatus%7Cuserrights%7Cwatch&uiprop=blockinfo%7Chasmsg HTTP/1.1\r\nHost: commons.wikimedia.org\r\ncontent-type: application/x-www-form-urlencoded\r\nconnection: keep-alive\r\ncookie: GeoIP=::::v4; centralauth_Token=(redacted); centralauth_User=YiFeiBot; forceHTTPS=1; commonswikiSession=(redacted); commonswikiUserID=2915654; forceHTTPS=true; WMF-Last-Access=17-May-2015; commonswikiUserName=YiFeiBot\r\naccept-encoding: gzip, deflate\r\nuser-agent: flickrAZ (commons:commons; User:YiFeiBot) Pywikibot/2.0b3 (g5523) httplib2/0.8 Python/2.7.6.final.0\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server: nginx/1.6.2
header: Date: Sun, 17 May 2015 01:11:34 GMT
header: Content-Type: application/json; charset=utf-8
header: Content-Length: 321
header: Connection: keep-alive
header: X-Powered-By: HHVM/3.6.1
header: X-Content-Type-Options: nosniff
header: Cache-control: private, must-revalidate, max-age=0
header: X-Frame-Options: SAMEORIGIN
header: Content-Encoding: gzip
header: Vary: Accept-Encoding
header: X-Varnish: 968176035, 3348398412
header: Via: 1.1 varnish, 1.1 varnish
header: Accept-Ranges: bytes
header: Age: 0
header: X-Cache: cp1054 miss (0), cp1068 frontend miss (0)
header: X-Analytics: https=1;WMF-Last-Access=17-May-2015
Sleeping for 7.5 seconds, 2015-05-17 01:11:34
ERROR: Traceback (most recent call last):
  File "/data/project/yifeibot/pywikibot/pywikibot/data/api.py", line 1556, in submit
    body=body, headers=headers)
  File "/data/project/yifeibot/pywikibot/pywikibot/tools/__init__.py", line 1094, in wrapper
    return obj(*__args, **__kw)
  File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 279, in request
    r = fetch(baseuri, method, body, headers, **kwargs)
  File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 381, in fetch
    error_handling_callback(request)
  File "/data/project/yifeibot/pywikibot/pywikibot/comms/http.py", line 297, in error_handling_callback
    raise request.data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1850: ordinal not in range(128)

I believe the UnicodeDecodeError happens before the upload request is actually sent.

The exact values are:

site.upload(pywikibot.FilePage(site, "File:Ryanair, EI-DWP, Boeing 737-8AS (17737230072).jpg"), source_filename="temp/fli.tmp", comment="[[Commons:Bots|Bot]]: Uploading files from Flickr", ignore_warnings=False)

git diff shows nothing, and git status shows "HEAD detached at d798da3".

I tried the following...

# this image is https://flic.kr/p/t2nWxA
wget https://farm9.staticflickr.com/8880/17737230072_d678b9f089_o_d.jpg

# then
python pwb.py shell

and then in the python shell

import pywikibot
site = pywikibot.Site('commons', 'commons')
site.login() # site.upload does not do this itself for some reason
site.upload(pywikibot.FilePage(site, "File:Ryanair, EI-DWP, Boeing 737-8AS (17737230072).jpg"), source_filename="17737230072_d678b9f089_o_d.jpg", comment="Testing [[phab:T99366]] / [[Commons:Bots|Bot]]: Uploading files from Flickr", ignore_warnings=False)

which allows me to reproduce the issue -- in Python 2, at least. In Python 3, the upload magically works: https://commons.wikimedia.org/wiki/File:Ryanair,_EI-DWP,_Boeing_737-8AS_(17737230072).jpg

There are several issues. The request has several unicode parts, which causes the entire request to be cast to a unicode string. Of course, this fails once we want to send binary data (namely, the file contents). At least the following request lines are wrong:

  • u'POST /w/api.php HTTP/1.1'
  • 'Host: commons.wikimedia.org'
  • 'Content-Length: 743405'
  • 'accept-encoding: gzip, deflate'
  • u'user-agent: test (commons:commons; User:Valhallasw) Pywikibot/2.0b3 (g5524) httplib2/0.9 Python/2.7.6.final.0'
  • u'connection: keep-alive'
  • 'cookie: (...)'
  • 'content-type: multipart/form-data; boundary="===============1715999916774184062=="'
  • 'mime-version: 1.0'

and this may have something to do with the introduction of unicode_literals somewhere in the code.
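
In other words, a single unicode element is enough to turn the whole joined request into unicode; a rough illustration of the mechanism (not the httplib2 code itself):

# Python 2: joining a list that contains even one unicode element yields a
# unicode result, so the complete header block becomes unicode and can no
# longer be concatenated with binary body data without an implicit decode.
headers = [
    u'POST /w/api.php HTTP/1.1',      # unicode, e.g. because of unicode_literals
    'Host: commons.wikimedia.org',    # native str
    'Content-Length: 743405',
]
msg = '\r\n'.join(headers)
print(type(msg))                      # <type 'unicode'>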

I'll take a closer look at this later, but it seems we should encode all header entries to ASCII before sending them. I guess in Python 3 httplib2 does that for us.
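
A sketch of that idea (a hypothetical Python 2 helper, not the patch that was later uploaded):

# Hypothetical helper: force every header name and value back to a native
# ASCII str before handing the request to httplib2, so no unicode leaks into
# a request that later has to carry binary body data.
def encode_headers(headers):
    return dict(
        (key.encode('ascii') if isinstance(key, unicode) else key,
         value.encode('ascii') if isinstance(value, unicode) else value)
        for key, value in headers.items()
    )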

Change 211522 had a related patch set uploaded (by Merlijn van Deen):
Make sure all comms code works with native str

https://gerrit.wikimedia.org/r/211522

Is this fixed with the switch to requests?

We have upload tests which are not run by travis; do they work? Do we need to add more tests to cover this scenario?

I haven't looked at the script for a while, but it seems to work with the latest version.

This is fixed for the master (requests) branch, but this issue still persists in the 2.0 branch.

jayvdb renamed this task from "Possible bug/error with unicode_literals: UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128)" to "error with httplib2 and unicode_literals: UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128)". Jul 22 2015, 2:38 AM

For some reason, the master branch also fails on the Tool Labs grid engine with a very similar exception (a UnicodeDecodeError raised in error_handling_callback). The exception does not happen on the bastion hosts, so I didn't observe it earlier. @yuvipanda, is there a console encoding or something set to ASCII on the grid?
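
A quick (hypothetical) diagnostic to compare on the grid and on a bastion host:

# Hypothetical diagnostic: run on both the grid and a bastion host to see
# whether the grid environment falls back to ASCII anywhere.
import locale
import sys

print(sys.getdefaultencoding())        # 'ascii' on Python 2 regardless of locale
print(sys.stdout.encoding)             # often None when stdout is not a terminal (grid jobs)
print(locale.getpreferredencoding())   # depends on LANG / LC_ALL in the job environment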

Are you sure you are using the master branch with requests? And in any case, do you have a stack trace?

I added traceback.print_exc() and got:

Traceback (most recent call last):
  File "/data/project/yifeibot/panrb/pywikibot/comms/http.py", line 258, in _http_process
    verify=True, timeout=timeout)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 357, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 460, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 319, in send
    timeout=timeout
  File "/usr/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 542, in urlopen
    body=body, headers=headers)
  File "/usr/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 367, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python2.7/httplib.py", line 966, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1000, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 962, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 820, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1993: ordinal not in range(128)
ERROR: Traceback (most recent call last):
  File "/data/project/yifeibot/panrb/pywikibot/data/api.py", line 1699, in submit
    body=body, headers=headers)
  File "/data/project/yifeibot/panrb/pywikibot/tools/__init__.py", line 1227, in wrapper
    return obj(*__args, **__kw)
  File "/data/project/yifeibot/panrb/pywikibot/comms/http.py", line 246, in request
    r = fetch(baseuri, method, body, headers, **kwargs)
  File "/data/project/yifeibot/panrb/pywikibot/comms/http.py", line 363, in fetch
    error_handling_callback(request)
  File "/data/project/yifeibot/panrb/pywikibot/comms/http.py", line 280, in error_handling_callback
    raise request.data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1993: ordinal not in range(128)

WARNING: Waiting 40 seconds before retrying.

requests is installed globally on the Tool Labs grid engines. It should be up to date with the apt repo. (It has been a while since I read the Puppet files, though.)

Change 211522 abandoned by Merlijn van Deen:
Make sure all comms code works with native str

Reason:
Or not.

https://gerrit.wikimedia.org/r/211522

Peter601980 renamed this task from "error with httplib2 and unicode_literals: UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128)" to "a". Dec 28 2015, 8:13 AM
Peter601980 closed this task as Invalid.
Peter601980 removed zhuyifei1999 as the assignee of this task.
Peter601980 triaged this task as Lowest priority.
Peter601980 updated the task description.
Peter601980 changed Security from None to Access Request.
Peter601980 removed subscribers: JeanFred, Mpaa, jayvdb and 6 others.
Aklapper renamed this task from "a" to "error with httplib2 and unicode_literals: UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 1872: ordinal not in range(128)". Dec 28 2015, 9:28 AM
Aklapper reopened this task as Open.
Aklapper raised the priority of this task from Lowest to Needs Triage.
Aklapper updated the task description.
Aklapper changed Security from Access Request to None.
Aklapper added subscribers: JeanFred, Mpaa, jayvdb and 6 others.
Xqt triaged this task as Medium priority. Jun 28 2017, 8:17 AM

requests is now required for Pywikibot instead of httplib2.