reflinks.py crash - TypeError: startswith first arg must be bytes or a tuple of bytes, not str
Closed, Resolved · Public · BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

tools.rubin16@tools-sgebastion-08:~$ venv/bin/python3 $HOME/pywikibot-core/pwb.py reflinks -start:Моррисон,_Наташа -v -debug -limit:1
Found 1  processes running, including this one.

What happens?:

Python 3.5.3 (default, Nov  4 2021, 15:29:10) 
[GCC 6.3.0 20170516]
Found 2 wikipedia:ru processes running, including this one.
Retrieving 50 pages from wikipedia:ru.
Http response does not contain a charset.
WARNING : media : http://dt9guucc6nuua.cloudfront.net/competitiondocuments/pdf/5676/AT-4X1-W-h----.RS6.pdf?v=-92722794 
Http response does not contain a charset.
WARNING : media : http://dt9guucc6nuua.cloudfront.net/competitiondocuments/pdf/5676/AT-4X1-W-f----.RS6.pdf?v=-1737224027 
Http response does not contain a charset.
Reading PDF file...
Dropped throttle(s).

0 pages read
0 pages written
0 pages skipped
Execution time: 1 seconds
Script terminated by exception:

ERROR: TypeError: startswith first arg must be bytes or a tuple of bytes, not str
Traceback (most recent call last):
  File "/data/project/rubin16/pywikibot-core/pwb.py", line 496, in <module>
    main()
  File "/data/project/rubin16/pywikibot-core/pwb.py", line 480, in main
    if not execute():
  File "/data/project/rubin16/pywikibot-core/pwb.py", line 463, in execute
    run_python_file(filename, script_args, module)
  File "/data/project/rubin16/pywikibot-core/pwb.py", line 144, in run_python_file
    main_mod.__dict__)
  File "/data/project/rubin16/pywikibot-core/scripts/reflinks.py", line 801, in <module>
    main()
  File "/data/project/rubin16/pywikibot-core/scripts/reflinks.py", line 797, in main
    bot.run()
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/bot.py", line 1570, in run
    self.treat(page)
  File "/data/project/rubin16/pywikibot-core/scripts/reflinks.py", line 579, in treat
    self.getPDFTitle(ref, r)
  File "/data/project/rubin16/pywikibot-core/scripts/reflinks.py", line 525, in getPDFTitle
    if aline.lower().startswith('title'):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
Dropped throttle(s).
Closing network session.
CRITICAL: Exiting due to uncaught exception <class 'TypeError'>
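
The failing check in the last traceback frame can be reproduced in isolation. The sketch below is not taken from reflinks.py; `sample_line` and `starts_with_title` are hypothetical names, and the sample content is invented. It assumes the line being tested is a `bytes` object, as the error message indicates, and shows that calling `startswith` with a `str` argument on it raises the same `TypeError`.

```python
# Hypothetical sample of pdfinfo-style output held as bytes.
# In Python 3, data read from a subprocess pipe is bytes unless decoded.
sample_line = b"Title:          Race results\n"


def starts_with_title(line):
    """Mimic the check from the traceback: compare against a str prefix."""
    return line.lower().startswith('title')


try:
    starts_with_title(sample_line)
    error_message = None
except TypeError as exc:
    # bytes.startswith() only accepts bytes (or a tuple of bytes).
    error_message = str(exc)

print(error_message)
```

The same check succeeds when both operands have the same type, for example `line.lower().startswith(b'title')` on bytes, or `'title'` on a decoded string.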

What should have happened instead?:
The script should run without crashing.

Event Timeline

Change 770560 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [bugfix] Decode pdfinfo if it is bytes

https://gerrit.wikimedia.org/r/770560

Xqt triaged this task as Medium priority. (Mar 14 2022, 5:36 PM)

Change 770560 merged by jenkins-bot:

[pywikibot/core@master] [bugfix] Decode pdfinfo if it is bytes

https://gerrit.wikimedia.org/r/770560
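
The patch title says the pdfinfo output is decoded when it is bytes. A minimal sketch of that idea follows; it is not the code from change 770560. The function name `extract_pdf_title`, the `encoding` parameter, and the UTF-8 default are assumptions made for illustration.

```python
def extract_pdf_title(pdfinfo_output, encoding='utf-8'):
    """Return the Title field from pdfinfo-style output, or None if absent.

    Hypothetical helper: accepts either bytes or str and normalizes to str
    before doing any string comparisons.
    """
    if isinstance(pdfinfo_output, bytes):
        # Decode once up front; undecodable bytes are replaced, not fatal.
        pdfinfo_output = pdfinfo_output.decode(encoding, errors='replace')

    for line in pdfinfo_output.splitlines():
        if line.lower().startswith('title'):
            # Keep everything after the first colon as the title value.
            _, _, value = line.partition(':')
            return value.strip() or None
    return None


# Works for both input types.
print(extract_pdf_title(b"Title:          Race results\nPages: 2\n"))
print(extract_pdf_title("Title:          Race results\nPages: 2\n"))
```

Decoding at the boundary where the external tool's output is read keeps all later comparisons in `str`, so a check like `startswith('title')` no longer depends on how the output was captured.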