
UnicodeDecodeError when using pywikibot delete.py
Closed, Resolved · Public

Description

After running delete.py with the most recent version (Pywikibot: [https] r-pywikibot-core.git (e9db1f9, g5334, 2015/04/11, 00:17:48, ok) Release version: 2.0b3 httplib2 version: 0.9) and a -summary containing non-ASCII characters (Czech letters), I get:

  File "pwb.py", line 161, in <module>
    import pywikibot  # noqa
  File "\core\pywikibot\__init__.py", line 32, in <module>
    from pywikibot import config2 as config
  File "\core\pywikibot\config2.py", line 280, in <module>
    _base_dir = get_base_dir()
  File "\core\pywikibot\config2.py", line 222, in get_base_dir
    if arg.startswith("-dir:"):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 22: ordinal not in range(128)

This task is probably related to the recently resolved https://phabricator.wikimedia.org/T95671

Event Timeline

Wesalius raised the priority of this task to Needs Triage.
Wesalius updated the task description. (Show Details)
Wesalius added a project: Pywikibot.
Wesalius subscribed.
Restricted Application added subscribers: Aklapper, Unknown Object (MLST). · Apr 11 2015, 6:21 AM

Now I got the same error when using replace.py.

T95671 probably does not resolve the issue. I guess arg is bytes and not unicode:

>>> 'ä'.startswith('ä')
True
>>> 'ä'.startswith(u'ä')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

As “-dir” is now implicitly unicode, the second case applies: when arg is bytes, Python must first decode it to unicode to check whether it starts with that prefix. I guess handle_args needs to do some work to return unicode strings or so. The problem is when a script expects handle_args to return a list of bytes:

>>> u'ä'.startswith('ä')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
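
The implicit conversion behind both tracebacks can be reproduced explicitly. This is a minimal sketch, assuming that Python 2 falls back to the ASCII codec when byte and unicode strings are mixed (which is what the error message suggests). The sample argument and the helper name `implicit_decode` are made up for illustration:

```python
# Hypothetical command-line argument as raw bytes. The byte 0xe1 stands in
# for a non-ASCII letter (for example "á" in a single-byte Czech code page).
raw_arg = b"-summary:smaz\xe1no"


def implicit_decode(value):
    """Mimic the implicit bytes -> unicode coercion using the ASCII codec."""
    return value.decode("ascii")


try:
    implicit_decode(raw_arg)
    error = None
except UnicodeDecodeError as exc:
    # Any byte >= 0x80 cannot be decoded as ASCII.
    error = exc

print(error)
```

Pure ASCII arguments such as `-dir:x` pass through this conversion unharmed, which would explain why the problem only shows up with non-ASCII summaries.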

This is in config2.py, way before handle_args is actually called. @jayvdb pointed out on IRC that:

12:50 <jayvdb> nod. its sys.argv . so '-dir:' probably needs to be str('-argv') in order to avoid decoding

handle_args uses argvu (which is the decoded form of argv provided by the user interface), and should work as expected.
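
As a rough illustration of what a "decoded form of argv" means, here is a sketch that is not pywikibot's code. The function name `decode_args` and the UTF-8 default are assumptions; the real user interface may pick the encoding differently:

```python
def decode_args(raw_args, encoding="utf-8"):
    """Return text versions of command-line arguments.

    Byte strings are decoded with the given encoding; values that are
    already text are passed through unchanged.
    """
    decoded = []
    for arg in raw_args:
        if isinstance(arg, bytes):
            arg = arg.decode(encoding)
        decoded.append(arg)
    return decoded


# Hypothetical input: one raw UTF-8 byte string and one text string.
args = decode_args([b"-summary:smaz\xc3\xa1no", "-dir:/tmp/bot"])
```

Once every argument is text, comparing it against a unicode prefix no longer needs an implicit conversion.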

Oh, that makes sense. I overlooked in the traceback that it happened in config2. Then the fix should be as @jayvdb described (although with str('-dir') instead of str('-argv'), but I guess that was just a typo).
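
A sketch of that idea, not the merged patch: keep the prefix in the interpreter's native string type so that comparing it against a raw sys.argv entry never triggers an implicit decode. The function name `find_base_dir_arg` and the sample arguments are hypothetical:

```python
def find_base_dir_arg(argv):
    """Return the value of the first -dir: argument, or None if absent."""
    # str() yields the native string type: a byte string on Python 2 (even
    # when unicode_literals is in effect) and a text string on Python 3.
    # That matches the type of sys.argv entries, so startswith() compares
    # like with like.
    prefix = str("-dir:")
    for arg in argv[1:]:
        if arg.startswith(prefix):
            return arg[len(prefix):]
    return None


# Hypothetical argument list in the shape of sys.argv.
example_argv = ["pwb.py", "-summary:smaz\u00e1no", "-dir:/tmp/bot"]
base_dir = find_base_dir_arg(example_argv)
```

In a real script the list would be `sys.argv`; the example list only keeps the sketch self-contained.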

Change 203652 had a related patch set uploaded (by XZise):
[FIX] config2: Support unicode args

https://gerrit.wikimedia.org/r/203652

Change 203652 merged by jenkins-bot:
[FIX] config2: Support unicode args

https://gerrit.wikimedia.org/r/203652

Change 204872 had a related patch set uploaded (by Mpaa):
config2.py: Support unicode args (another case)

https://gerrit.wikimedia.org/r/204872

Change 204872 merged by jenkins-bot:
config2.py: Support unicode args (another case)

https://gerrit.wikimedia.org/r/204872

jayvdb claimed this task.