UnicodeDecodeError when using pywikibot category.py
Closed, Resolved (Public)

Description

After running category.py move with the most recent version (Release version: 2.0b3; httplib2 version: 0.9), I get:

WARNING: Type of 'console_encoding' changed
         Was: <type 'str'>
         Now: <type 'unicode'>

and if I try to move a category named "Sabotáže" (notice the "ž"), the script ends with:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
<type 'exceptions.UnicodeDecodeError'>

Event Timeline

Wesalius set the priority of this task to Needs Triage.
Wesalius updated the task description.
Wesalius added a subscriber: Wesalius.
Restricted Application added subscribers: Aklapper, Unknown Object (MLST). Apr 10 2015, 10:33 AM

Could you provide the complete traceback? Posting the output of python pwb.py version would also be helpful (you don't have to include the usernames). We recently made a change that could cause this error, and Gerrit 203051 might fix it (which is why I need a more specific version, such as the git hash).

version.py

WARNING: Type of 'transliteration_target' changed
         Was: <type 'unicode'>
         Now: <type 'str'>
Pywikibot: [https] r-pywikibot-core.git (e31b1d4, g5329, 2015/04/10, 10:18:01, ok)
Release version: 2.0b3
httplib2 version: 0.9
  cacerts: 
    certificate test: ok
Python: 2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)]
  unicode test: ok
PYWIKIBOT2_DIR: Not set
PYWIKIBOT2_DIR_PWB:
PYWIKIBOT2_NO_USER_CONFIG: Not set
Config base dir: 
Usernames for family "wikisource":
        cs: HypoBOT (no sysop configured)
Usernames for family "wikipedia":
        cs: HypoBOT (no sysop configured)
Usernames for family "wiktionary":
        cs: HypoBOT (no sysop configured)

traceback

$ python pwb.py category.py move
WARNING: Type of 'console_encoding' changed
         Was: <type 'str'>
         Now: <type 'unicode'>
WARNING: Type of 'transliteration_target' changed
         Was: <type 'unicode'>
         Now: <type 'str'>
Please enter the old name of the category: Sabotáže
Traceback (most recent call last):
  File "pwb.py", line 215, in <module>
    run_python_file(filename, argv, argvu, file_package)
  File "pwb.py", line 84, in run_python_file
    main_mod.__dict__)
  File ".\scripts\category.py", line 1248, in <module>
    main()
  File ".\scripts\category.py", line 1191, in main
    u'Please enter the old name of the category:')
  File "core\pywikibot\bot.py", line 548, in input
    data = ui.input(question, password)
  File "core\pywikibot\userinterfaces\terminal_interface_base.py", line 222, in input
    text = self._raw_input()
  File "core\pywikibot\userinterfaces\terminal_interface_win32.py", line 105, in _raw_input
    if '\x1a' in data:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
<type 'exceptions.UnicodeDecodeError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort

Okay, I should have a patch uploaded shortly that you can test.

Ladsgroup triaged this task as Unbreak Now! priority. Apr 10 2015, 3:05 PM

Change 203346 had a related patch set uploaded (by XZise):
[FIX] Win32 UI: Explicitly use bytes in Python 2

https://gerrit.wikimedia.org/r/203346

Change 203346 merged by jenkins-bot:
[FIX] Win32 UI: Explicitly use bytes in Python 2

https://gerrit.wikimedia.org/r/203346
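
For anyone hitting the same error: the traceback and the patch title suggest that a unicode literal was being tested against raw console bytes, which makes Python 2 decode those bytes with the ASCII codec. The following is only a rough sketch of that failure mode, not the code from the patch. It is written for Python 3, where the implicit decode has to be spelled out, and it assumes the console delivered the category name as UTF-8 bytes; the variable names are made up for illustration.

```python
# Illustrative sketch only (Python 3); the original report was on Python 2.7.
# Assumption: the console delivered the category name as UTF-8 bytes.
raw = "Sabotáže".encode("utf-8")  # b'Sabot\xc3\xa1\xc5\xbee'

# Python 2 decoded byte strings with the ASCII codec whenever they were mixed
# with unicode objects. Doing that decode explicitly raises the same error.
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    message = str(err)
    failing_offset = err.start
else:
    message = None
    failing_offset = None

print(message)

# Comparing bytes with bytes needs no decoding, so a check for the
# Ctrl-Z character (0x1a) on the raw input cannot fail this way.
has_eof_marker = b"\x1a" in raw
print(has_eof_marker)
```

The failing offset is 5 because "Sabot" takes up bytes 0 to 4 and the first byte of the UTF-8 encoding of "á" is 0xc3, which matches the position reported in the traceback.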

After updating today to

Pywikibot: [https] r-pywikibot-core.git (e9db1f9, g5334, 2015/04/11, 00:17:48, ok)
Release version: 2.0b3
httplib2 version: 0.9

I still get the WARNING message, but the script works as it should, so it is SOLVED. Thank you.

jayvdb added a subscriber: jayvdb.

The warning is a separate and very minor bug -- we need to hide/remove that warning, as it is not something the user can resolve.

@Wesalius: You can avoid the warning about the transliteration target by adding a "u" prefix to its value. If the line in your user-config is transliteration_target = 'foo', changing it to transliteration_target = u'foo' should make the warning disappear. You don't have to worry much about the warning, though. Regarding console_encoding, it would be nice if you could comment on that in T95810.

As I stated in T95810, I am already using the u' prefix.