
reflinks crashing / string pattern or bytes-like object
Closed, Resolved (Public)

Description

@PAWS:~$ pwb.py reflinks -start:Mark_of_the_Unicorn -verbose
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0]
LOADING SITE wikipedia:ru VERSION: 1.36.0-wmf.10
LOADING SITE wikipedia:ru VERSION: 1.36.0-wmf.10
Found 1 wikipedia:ru processes running, including this one.
WARNING: /srv/paws/pwb/scripts/reflinks.py:499: ResourceWarning: unclosed file <_io.BufferedReader name='404-links.txt'>
  dead_links = codecs.open(listof404pages, 'r', 'latin_1').read()

Retrieving 50 pages from wikipedia:ru.
No changes were needed on [[Mark of the Unicorn]]
Traceback (most recent call last):
  File "/srv/paws/pwb/pwb.py", line 360, in <module>
    if not main():
  File "/srv/paws/pwb/pwb.py", line 355, in main
    file_package)
  File "/srv/paws/pwb/pwb.py", line 74, in run_python_file
    main_mod.__dict__)
  File "/srv/paws/pwb/scripts/reflinks.py", line 818, in <module>
    main()
  File "/srv/paws/pwb/scripts/reflinks.py", line 814, in main
    bot.run()
  File "/srv/paws/pwb/scripts/reflinks.py", line 654, in run
    elif not self.MIME.search(content_type):
TypeError: cannot use a string pattern on a bytes-like object
Dropped throttle(s).
Closing network session.
CRITICAL: Exiting due to uncaught exception <class 'TypeError'>
Network session closed.
@PAWS:~$
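The first traceback can be reproduced outside the bot. A regular expression compiled from a str cannot search a bytes value, and the traceback implies that content_type arrives as bytes. The minimal sketch below uses a MIME pattern and header value that are hypothetical stand-ins, not copied from reflinks.py. It also shows one way out in the spirit of the first patch set ("use byte-like regex for bytes"): compile the same pattern as bytes.

```python
import re

# Hypothetical stand-in for reflinks.py's self.MIME pattern; the real
# pattern in the script may differ.
MIME_STR = re.compile(r'application/(?:xhtml\+xml|xml)|text/(?:ht|x)ml')

# Header value as raw bytes (assumed, as the traceback implies).
content_type = b'text/html; charset=utf-8'


def matches_with_str_pattern(value):
    """Reproduce the crash: a str pattern cannot search bytes."""
    try:
        return MIME_STR.search(value) is not None
    except TypeError as exc:
        return str(exc)


# One fix: compile the same pattern as bytes so it can search bytes.
MIME_BYTES = re.compile(MIME_STR.pattern.encode('ascii'))


def matches_with_bytes_pattern(value):
    return MIME_BYTES.search(value) is not None


print(matches_with_str_pattern(content_type))
# cannot use a string pattern on a bytes-like object
print(matches_with_bytes_pattern(content_type))
# True
```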

Event Timeline

Change 632195 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] use byte-like regex for bytes

https://gerrit.wikimedia.org/r/632195

Xqt triaged this task as Lowest priority.

@Rubin16: Can you review this patch?

@Xqt: the patched script, saved as ~/reflinks2.py, still crashes, now at a different line:

@PAWS:~$ pwb.py ~/reflinks2.py -start:Mark_of_the_Unicorn -verbose
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0]
LOADING SITE wikipedia:ru VERSION: 1.36.0-wmf.10
LOADING SITE wikipedia:ru VERSION: 1.36.0-wmf.10
Found 2 wikipedia:ru processes running, including this one.
WARNING: /home/paws/reflinks2.py:499: ResourceWarning: unclosed file <_io.BufferedReader name='404-links.txt'>
  dead_links = codecs.open(listof404pages, 'r', 'latin_1').read()

Retrieving 50 pages from wikipedia:ru.
No changes were needed on [[Mark of the Unicorn]]
Traceback (most recent call last):
  File "/srv/paws/pwb/pwb.py", line 360, in <module>
    if not main():
  File "/srv/paws/pwb/pwb.py", line 355, in main
    file_package)
  File "/srv/paws/pwb/pwb.py", line 74, in run_python_file
    main_mod.__dict__)
  File "/home/paws/reflinks2.py", line 818, in <module>
    main()
  File "/home/paws/reflinks2.py", line 814, in main
    bot.run()
  File "/home/paws/reflinks2.py", line 637, in run
    tmp = s.group('enc').strip("\"' ").lower()
TypeError: a bytes-like object is required, not 'str'
Dropped throttle(s).
Closing network session.
CRITICAL: Exiting due to uncaught exception <class 'TypeError'>
Network session closed.
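The second traceback follows from switching to bytes patterns. When both the pattern and the searched text are bytes, match groups are bytes too, so calling .strip() with a str argument, as line 637 does, raises a TypeError. The minimal reproduction below uses a charset regex and a sample <meta> tag that are hypothetical stand-ins for the script's own.

```python
import re

# Hypothetical stand-in for the charset regex in reflinks.py, compiled as a
# bytes pattern so that it can search raw (bytes) page content.
CHARSET = re.compile(rb'charset\s*=\s*(?P<enc>[^\s;>]+)', re.IGNORECASE)

# Hypothetical sample of the bytes the bot would be searching.
meta_content = b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'

s = CHARSET.search(meta_content)
enc = s.group('enc')  # bytes, because pattern and subject are bytes
print(type(enc).__name__, enc)  # bytes b'UTF-8"'

# Stripping a bytes value with a str argument fails the same way line 637 did.
error_message = None
try:
    enc.strip("\"' ").lower()
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # a bytes-like object is required, not 'str'
```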

@Rubin16: Could you please check again?

Seems to be working. Let's merge it, and I will test it over a longer period of time.

Thanks a lot, @Xqt!

Merged to master. The merge to stable (PAWS) is coming soon.

Change 632195 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] decode byte-like object meta_content.group() in reflinks.py

https://gerrit.wikimedia.org/r/632195
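The merged change decodes the result of meta_content.group() before the str operations run. The sketch below illustrates that idea under stated assumptions. The bytes charset regex and the helper name extract_charset are hypothetical. Decoding with latin-1 is also an assumption: charset names are ASCII, and latin-1 maps every byte to a character, so decoding cannot fail. The actual patch may choose a different codec or structure.

```python
import re

# Hypothetical bytes charset regex; not copied from reflinks.py.
CHARSET = re.compile(rb'charset\s*=\s*(?P<enc>[^\s;>]+)', re.IGNORECASE)


def extract_charset(meta_content):
    """Return the declared charset as a lower-case str, or None if absent.

    The bytes match group is decoded first, so the str-based strip() and
    lower() calls work. latin-1 is an assumed codec that never fails.
    """
    s = CHARSET.search(meta_content)
    if s is None:
        return None
    tmp = s.group('enc').decode('latin_1')
    return tmp.strip("\"' ").lower()


print(extract_charset(b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'))
# utf-8
print(extract_charset(b"<meta charset='windows-1251'>"))
# windows-1251
```

Decoding at the point of use lets the regex keep matching on bytes while everything downstream keeps working with str.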