Argument processing chokes on Python 2 when an argument contains non-ASCII characters
Closed, Resolved, Public

Description

@.avgas encountered this error while using replace.py:

tools.irishbot@tools-bastion-03:~$ python /data/project/shared/pywikipedia/core/scripts/replace.py -regex 'scade alle 23.59 del giorno martedì 3 luglio 2018' 'scade alle 23.59 del giorno giovedì 5 luglio 2018' -page:"Wikipedia:Pagine_da_cancellare/Step In Fluid" -page:"Wikipedia:Pagine_da_cancellare/Harun Demiraslan" llare/Step In Fluid" -page:"Wikipedia:Pagine_da_cancellare/Harun Demirasla" -page:"Wikipedia:Pagine_da_cancellare/Codici ICAO" -page:"Wikipedia:Pagine_da_cancellare/Ingegneria logistica e della produzione" -page:"Wikipedia:Pagine_da_cancellare/Summer Cummings" -page:"Wikipedia:Pagine_da_cancellare/Milos Rechtorik" -page:"Wikipedia:Pagine_da_cancellare/Aymen Krouma" -page:"Wikipedia:Pagine_da_cancellare/AMSN2" -page:"Wikipedia:Pagine_da_cancellare/Róbert Matejka" -page:"Wikipedia:Pagine_da_cancellare/Sylwester Lusiusz" -page:"Wikipedia:Pagine_da_cancellare/Radosław Kanach" -page:"Wikipedia:Pagine_da_cancellare/Aleš Mandous" -page:"Wikipedia:Pagine_da_cancellare/Paolo Martinelli (vescovo)" -page:"Wikipedia:Pagine_da_cancellare/Massimo Russo (batterista)" -page:"Wikipedia:Pagine_da_cancellare/Simone Venturi" -summary:"Prorogo PDC in seguito ad oscuramento"
Traceback (most recent call last):
  File "/data/project/shared/pywikipedia/core/scripts/replace.py", line 1198, in <module>
    main()
  File "/data/project/shared/pywikipedia/core/scripts/replace.py", line 924, in main
    if genFactory.handleArg(arg):
  File "/data/project/shared/pywikipedia/core/pywikibot/pagegenerators.py", line 1179, in handleArg
    handler = getattr(self, '_handle_' + arg[1:], None)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xec' in position 41: ordinal not in range(128)
<type 'exceptions.UnicodeEncodeError'>
CRITICAL: Closing network session.

I was able to reproduce it with far fewer arguments:

zhuyifei1999@zhuyifei1999-ThinkPad-X260:~/mw-dev/pywikibot-core$ python pwb.py replace .ì
family and mylang are not set.
Defaulting to family='test' and mylang='test'.
Traceback (most recent call last):
  File "pwb.py", line 251, in <module>
    if not main():
  File "pwb.py", line 244, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "pwb.py", line 115, in run_python_file
    main_mod.__dict__)
  File "./scripts/replace.py", line 1198, in <module>
    main()
  File "./scripts/replace.py", line 924, in main
    if genFactory.handleArg(arg):
  File "/home/zhuyifei1999/mw-dev/pywikibot-core/pywikibot/pagegenerators.py", line 1179, in handleArg
    handler = getattr(self, '_handle_' + arg[1:], None)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xec' in position 8: ordinal not in range(128)
<type 'exceptions.UnicodeEncodeError'>
CRITICAL: Closing network session.
zhuyifei1999@zhuyifei1999-ThinkPad-X260:~/mw-dev/pywikibot-core$ python pwb.py shell ì
family and mylang are not set.
Defaulting to family='test' and mylang='test'.
WARNING: ./scripts/shell.py:30: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if '-noimport' in args:

Traceback (most recent call last):
  File "pwb.py", line 251, in <module>
    if not main():
  File "pwb.py", line 244, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "pwb.py", line 115, in run_python_file
    main_mod.__dict__)
  File "./scripts/shell.py", line 61, in <module>
    main(*args)
  File "./scripts/shell.py", line 36, in main
    args = pywikibot.handle_args(args)
  File "/home/zhuyifei1999/mw-dev/pywikibot-core/pywikibot/bot.py", line 927, in handle_args
    option, sep, value = arg.partition(':')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
<type 'exceptions.UnicodeDecodeError'>
CRITICAL: Closing network session.

I applied this diff to inspect the arguments:

diff --git a/pywikibot/bot.py b/pywikibot/bot.py
index 51ad1bc8..549fbe6d 100644
--- a/pywikibot/bot.py
+++ b/pywikibot/bot.py
@@ -913,6 +913,7 @@ def handle_args(args=None, do_help=True):
         # it's the version in pywikibot.__init__ that is changed by scripts,
         # not the one in pywikibot.bot.
         args = pywikibot.argvu[1:]
+    print(pywikibot.argvu)
     # get the name of the module calling this function. This is
     # required because the -help option loads the module's docstring and
     # because the module name will be used for the filename of the log.

But they are indeed unicode ([u'shell', u'\xec']), not str as I had guessed, so that is not the cause.

@Multichill found that the error was not present in these previous versions:

Pywikibot: [https] r-pywikibot-core.git (7e33658, g9185, 2018/03/18, 13:01:38, OUTDATED)
Pywikibot: [https] r-pywikibot-core.git (3c79c5d, g9374, 2018/04/26, 13:59:16, OUTDATED)

The problem does exist in the Pywikibot nightly (Toolforge) and in the latest master.

Event Timeline

shell.py uses sys.argv instead of pywikibot.argvu. I'll ignore that for now, since there isn't really a reason to pass non-ASCII arguments to shell.py.
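For reference, the shell.py failure at bot.py line 927 comes from mixing byte strings with text: on Python 2, sys.argv holds byte strings, and partitioning b'\xc3\xac' on a unicode ':' (presumably a unicode literal in bot.py) forces an implicit ASCII decode. A minimal sketch of decoding the arguments up front, assuming UTF-8 input; decode_argv is a hypothetical helper, not how pywikibot actually builds argvu.

```python
import sys


def decode_argv(argv, encoding=None):
    """Return argv with every byte-string item decoded to text.

    Hypothetical helper. On Python 2, sys.argv holds byte strings (str, which
    is the same type as bytes); on Python 3 it already holds text, so items
    that are not bytes pass through unchanged.
    """
    if encoding is None:
        # Assumption: the terminal encoding matches the filesystem encoding.
        encoding = sys.getfilesystemencoding() or 'utf-8'
    decoded = []
    for arg in argv:
        if isinstance(arg, bytes):
            arg = arg.decode(encoding)
        decoded.append(arg)
    return decoded


# The two arguments from the shell.py run, as raw UTF-8 bytes.
raw = [b'shell', u'\xec'.encode('utf-8')]
args = decode_argv(raw, 'utf-8')
# Text partitioned on a text separator: no implicit ASCII decode happens.
option, sep, value = args[1].partition(u':')
```

Once every argument is text, string operations such as partition() and comparisons like `'-noimport' in args` no longer depend on the ASCII default codec.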

I then applied this diff:

zhuyifei1999@zhuyifei1999-ThinkPad-X260:~/mw-dev/pywikibot-core$ git diff
diff --git a/pywikibot/pagegenerators.py b/pywikibot/pagegenerators.py
index ec06f313..95cf67dd 100644
--- a/pywikibot/pagegenerators.py
+++ b/pywikibot/pagegenerators.py
@@ -1176,6 +1176,7 @@ class GeneratorFactory(object):
         if value == '':
             value = None
 
+        print(repr(('_handle_', arg[1:], '_handle_' + arg[1:])))
         handler = getattr(self, '_handle_' + arg[1:], None)
         if handler:
             handler_result = handler(value)

This confirms that everything is unicode:

zhuyifei1999@zhuyifei1999-ThinkPad-X260:~/mw-dev/pywikibot-core$ python pwb.py replace .ì
family and mylang are not set.
Defaulting to family='test' and mylang='test'.
(u'_handle_', u'\xec', u'_handle_\xec')
Traceback (most recent call last):
  File "pwb.py", line 251, in <module>
    if not main():
  File "pwb.py", line 244, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "pwb.py", line 115, in run_python_file
    main_mod.__dict__)
  File "./scripts/replace.py", line 1198, in <module>
    main()
  File "./scripts/replace.py", line 924, in main
    if genFactory.handleArg(arg):
  File "/home/zhuyifei1999/mw-dev/pywikibot-core/pywikibot/pagegenerators.py", line 1180, in handleArg
    handler = getattr(self, '_handle_' + arg[1:], None)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xec' in position 8: ordinal not in range(128)
<type 'exceptions.UnicodeEncodeError'>
CRITICAL: Closing network session.

So the problem is that on Python 2, getattr() cannot handle a non-ASCII unicode attribute name, which I can confirm:

zhuyifei1999@zhuyifei1999-ThinkPad-X260:~/mw-dev/pywikibot-core$ python
Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> type(b'footype', (object,), {})()
<__main__.footype object at 0x7efc8e62c110>
>>> getattr(type(b'footype', (object,), {})(), 'ì')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'footype' object has no attribute 'ì'
>>> getattr(type(b'footype', (object,), {})(), u'ì')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xec' in position 0: ordinal not in range(128)
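The session above can be turned into a small script that runs unchanged on Python 2 and 3. Going by the tracebacks in this task, it should report UnicodeEncodeError on Python 2, even though a default is passed to getattr(), because the default only covers AttributeError. On Python 3, where attribute names are text, getattr() simply falls back to the default. FooType and probe_getattr are illustrative names.

```python
from __future__ import print_function

import sys


class FooType(object):
    """Empty class, like type(b'footype', (object,), {}) in the session above."""


def probe_getattr(name):
    """Look up a missing attribute with a default and report what happened.

    Returns 'default' when getattr() falls back to None, or the name of the
    exception class when the lookup itself fails.
    """
    try:
        getattr(FooType(), name, None)
    except UnicodeEncodeError as exc:
        # Python 2 encodes a unicode name with the ASCII codec first.
        return type(exc).__name__
    return 'default'


if __name__ == '__main__':
    print(sys.version_info[0], probe_getattr(u'_handle_\xec'))
```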

So this is a regression from d8ffde8. CC @Dalba

Change 443957 had a related patch set uploaded (by Zhuyifei1999; owner: Zhuyifei1999):
[pywikibot/core@master] pagegenerators: try..except UnicodeError on getattr()

https://gerrit.wikimedia.org/r/443957

Change 443957 merged by jenkins-bot:
[pywikibot/core@master] pagegenerators: try..except UnicodeEncodeError on getattr()

https://gerrit.wikimedia.org/r/443957
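The merged change's title names the approach: wrap the getattr() call in try..except UnicodeEncodeError. The sketch below shows that pattern in a much-reduced, hypothetical stand-in for GeneratorFactory.handleArg(); it is not the merged code (see the Gerrit link for that), and only a '-page' handler is modelled.

```python
class HandlerDispatchSketch(object):
    """Hypothetical stand-in for GeneratorFactory's '_handle_' dispatch.

    Only the getattr()-based lookup is modelled; the real method does more.
    """

    def _handle_page(self, value):
        return [u'page', value]

    def handle_arg(self, arg):
        arg, _, value = arg.partition(u':')
        if value == u'':
            value = None
        try:
            handler = getattr(self, u'_handle_' + arg[1:], None)
        except UnicodeEncodeError:
            # Python 2 cannot turn a non-ASCII unicode name into an attribute
            # name, so no handler can match; treat it as an unknown option.
            handler = None
        if handler:
            return handler(value)
        # Not a generator option: the caller handles the argument itself.
        return None
```

Catching only UnicodeEncodeError, as the merged change's title indicates, keeps unrelated errors visible instead of silently treating them as "unknown option".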

@.avgas The fix should land in the Toolforge nightly at 01:00 UTC.