terminal_interface_base.UI.input() crashes with UnicodeDecodeError after processing some pages sometimes
Closed, Resolved · Public · BUG REPORT

Description

Steps to Reproduce:

Collect a list of pages containing unicode text
Run the following command (replace filename and fix tag name from user-fixes.py as per your configuration)
pwb.py replace -fix:vebug1 -file:pages-ve-bug.txt

Actual Results:
The script crashes with the following log after processing some pages.

Example crash log
>>> బొరుగులు <<<
@@ -10 +10 @@
- #[[జల్లెడ]] పట్టి [[ఇసుక]]<nowiki/>ని తీసివెయ్యండి
+ #[[జల్లెడ]] పట్టి [[ఇసుక]]ని తీసివెయ్యండి

Do you want to accept these changes? ([y]es, [N]o, [e]dit original, edit
[l]atest, open in [b]rowser, [a]ll, [q]uit):a
Traceback (most recent call last):
  File "/srv/paws/pwb/pwb.py", line 250, in <module>
    if not main():
  File "/srv/paws/pwb/pwb.py", line 243, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "/srv/paws/pwb/pwb.py", line 95, in run_python_file
    main_mod.__dict__)
  File "replace.py", line 1191, in <module>
    main()
  File "replace.py", line 1182, in main
    bot.run()
  File "replace.py", line 773, in run
    default='N')
  File "/srv/paws/pwb/pywikibot/bot.py", line 502, in input_choice
    automatic_quit=automatic_quit, force=force)
  File "/srv/paws/pwb/pywikibot/userinterfaces/terminal_interface_base.py", line 381, in input_choice
    answer = self.input(output) or default
  File "/srv/paws/pwb/pywikibot/userinterfaces/terminal_interface_base.py", line 293, in input
    text = self._input_reraise_cntl_c(password)
  File "/srv/paws/pwb/pywikibot/userinterfaces/terminal_interface_base.py", line 309, in _input_reraise_cntl_c
    text = self._raw_input()
  File "/srv/paws/pwb/pywikibot/userinterfaces/terminal_interface_base.py", line 248, in _raw_input
    return input()
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
CRITICAL: Exiting due to uncaught exception <class 'UnicodeDecodeError'>

Running the script on the affected page individually sometimes works.

Expected Results:
All pages processed without error

Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, jeblad, Aklapper.

user-fixes.py containing sample replace scripts.

The logs mentioned are from PAWS 1.2

Arjunaraoc renamed this task from "pwb replace.py working on a list of pages of more than 500 crashes with UnicodeDecodeError after processing some pages" to "pwb replace.py working on a list of pages of more than 500 crashes with UnicodeDecodeError after processing some pages sometimes". Jul 16 2020, 10:21 AM

As per https://phabricator.wikimedia.org/T258142, the above report is based on an older version that had not been updated in PAWS. I will report back if it happens with the latest version once I use it.

Xqt changed the task status from Open to Stalled. Aug 4 2020, 3:57 AM
Xqt changed the task status from Stalled to Open. Aug 4 2020, 4:07 AM
Xqt subscribed.

It looks like you typed some Unicode characters at the prompt, used the delete key to remove them, and then typed "a" over them. This is a known Python bug. We should add exception handling for this case.

I do not think so, as the crash also happened after some pages had already been processed.

The script is waiting for any input from the console:

Do you want to accept these changes? ([y]es, [N]o, [e]dit original, edit
[l]atest, open in [b]rowser, [a]ll, [q]uit):a

The problem can occur in the way described above; see https://stackoverflow.com/questions/11386747/python-input-unicodedecodeerror
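For illustration only, here is a small sketch of how a partially deleted multi-byte character could produce the error seen in the crash log. It assumes (this is not confirmed in the report) that the terminal removed only the last byte of a three-byte UTF-8 character before "a" was typed. The helper names `simulate_partial_delete` and `try_decode` are hypothetical and are not part of Pywikibot.

```python
def simulate_partial_delete(char: str, typed_after: str) -> bytes:
    """Drop the last UTF-8 byte of `char`, then append what was typed next.

    Assumption: this imitates a terminal that deletes one byte rather than
    one whole character when the delete key is pressed.
    """
    raw = char.encode("utf-8")
    return raw[:-1] + typed_after.encode("utf-8")


def try_decode(data: bytes):
    """Return (text, None) on success or (None, exception) on failure."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


# "\u0c1c" is the Telugu letter JA; it takes three bytes in UTF-8.
broken = simulate_partial_delete("\u0c1c", "a")
text, error = try_decode(broken)

if error is not None:
    # The two leftover bytes are followed by "a", which is not a
    # continuation byte, so decoding fails.
    print(f"decode failed: {error}")
```

The leftover bytes form the start of a three-byte sequence, and the ASCII "a" that follows cannot complete it, which is why the decoder reports an invalid continuation byte at the start of the input.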

Change 638503 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] Ignore UnicodeDecodeError on input

https://gerrit.wikimedia.org/r/638503

Xqt triaged this task as Lowest priority.
Xqt removed a project: Pywikibot-replace.py.
Xqt renamed this task from "pwb replace.py working on a list of pages of more than 500 crashes with UnicodeDecodeError after processing some pages sometimes" to "terminal_interface_base.UI.input() crashes with UnicodeDecodeError after processing some pages sometimes". Nov 3 2020, 4:51 PM

Change 638503 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Ignore UnicodeDecodeError on input

https://gerrit.wikimedia.org/r/638503