reflinks.py: unknown encoding
Closed, Resolved | Public | BUG REPORT

Description

== Pywikibot framework v6.0.0.dev0 -- Logging header ===
COMMAND: ['./scripts/reflinks.py', '-always', '-start:Социально-экономический_институт_СГТУ', '-v', 'debug']
DATE: 2021-03-07 16:05:35.276676 UTC
VERSION: [https] r-pywikibot-core (bc28c3b, g14368, 2021/03/07, 13:36:15, n/a)
SYSTEM: posix.uname_result(sysname='Linux', nodename='tools-sgebastion-08', release='4.19.0-0.bpo.14-amd64', version='#1 SMP Debian 4.19.171-2~deb9u1 (2021-02-08)', machine='x86_64')
CONFIG FILE DIR: /mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core

=========================================================
Python 3.5.3 (default, Nov 18 2020, 21:09:16) 
[GCC 6.3.0 20170516]
Found 1 wikipedia:ru processes running, including this one.
Retrieving 50 pages from wikipedia:ru.
WARNING: Http response status 404
HTTP error (404) for http://hakasia.roskazna.ru/page/10586 on [[Социально-экономический институт СГТУ им. Гагарина Ю. А.]]
Dropped throttle(s).

0 pages read
0 pages written
0 pages skipped
Execution time: 3 seconds
Script terminated by exception:

ERROR: LookupError: unknown encoding: Win-1251
Traceback (most recent call last):
  File "pwb.py", line 363, in <module>
    if not main():
  File "pwb.py", line 358, in main
    file_package)
  File "pwb.py", line 75, in run_python_file
    main_mod.__dict__)
  File "./scripts/reflinks.py", line 767, in <module>
    main()
  File "./scripts/reflinks.py", line 763, in main
    bot.run()
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/bot.py", line 1484, in run
    self.treat(page)
  File "./scripts/reflinks.py", line 535, in treat
    ref.url, use_fake_user_agent=self._use_fake_user_agent)
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/tools/__init__.py", line 1478, in wrapper
    return obj(*__args, **__kw)
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/comms/http.py", line 417, in fetch
    response.encoding = _decide_encoding(response, charset)
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/comms/http.py", line 472, in _decide_encoding
    return _try_decode(response.content, header_encoding)
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/comms/http.py", line 456, in _try_decode
    content.decode(encoding)
LookupError: unknown encoding: Win-1251
Dropped throttle(s).
Closing network session.
CRITICAL: Exiting due to uncaught exception <class 'LookupError'>
Network session closed.

Event Timeline

Change 668809 had a related patch set uploaded (by Rubin; owner: Rubin):
[pywikibot/core@master] fixing T276715

https://gerrit.wikimedia.org/r/668809

Change 668809 abandoned by Rubin:
[pywikibot/core@master] fixing T276715

Reason:

https://gerrit.wikimedia.org/r/668809

Rubin16 renamed this task from "http.py: unknown encoding: Win-1251" to "reflinks.py: unknown encoding". Mar 7 2021, 4:44 PM

It is not only Win-1251; the same error occurs with other encoding names:

Page [[Список кораблей Военно-морского флота Российской Федерации]] saved

2473 pages read
13 pages written
0 pages skipped
Execution time: 366 seconds
Read operation time: 0.1 seconds
Write operation time: 28.2 seconds
Script terminated by exception:

ERROR: LookupError: unknown encoding: binary
Traceback (most recent call last):
  File "pwb.py", line 363, in <module>
    if not main():
  File "pwb.py", line 358, in main
    file_package)
  File "pwb.py", line 75, in run_python_file
    main_mod.__dict__)
  File "./scripts/reflinks.py", line 767, in <module>
    main()
  File "./scripts/reflinks.py", line 763, in main
    bot.run()
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/bot.py", line 1484, in run
    self.treat(page)
  File "./scripts/reflinks.py", line 535, in treat
    ref.url, use_fake_user_agent=self._use_fake_user_agent)
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/tools/__init__.py", line 1478, in wrapper
    return obj(*__args, **__kw)
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/comms/http.py", line 417, in fetch
    response.encoding = _decide_encoding(response, charset)
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/comms/http.py", line 472, in _decide_encoding
    return _try_decode(response.content, header_encoding)
  File "/mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/comms/http.py", line 456, in _try_decode
    content.decode(encoding)
LookupError: unknown encoding: binary
CRITICAL: Exiting due to uncaught except
Xqt triaged this task as High priority. Mar 7 2021, 5:02 PM
Xqt added a subscriber: Mpaa.

Win-1251 is not a valid alias in Python's standard encodings. It must be windows-1251.
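
Python's codec registry can be queried directly to see which spellings it accepts. The snippet below is a minimal sketch using only the standard library; the helper name is_known_encoding is made up for illustration and is not part of Pywikibot.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python's codec registry recognizes `name`.

    Hypothetical helper for illustration only; not Pywikibot code.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# "Win-1251" and "binary" are the two names reported in this task.
for name in ("Win-1251", "windows-1251", "cp1251", "binary"):
    print(name, is_known_encoding(name))
```

A server may send any string as the charset in its Content-Type header, so a name taken from a response cannot be assumed to be one Python knows.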

Change 669725 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] check for LookupError exception in _try_decode

https://gerrit.wikimedia.org/r/669725
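
The patch title suggests guarding the decode call against LookupError. What follows is only a sketch of that idea under stated assumptions, not the code from the Gerrit change: the function names try_decode and decide_encoding, the fallbacks parameter, and the UTF-8 fallback are all hypothetical.

```python
def try_decode(content, encoding):
    """Return `encoding` if `content` decodes with it, otherwise None.

    Hypothetical helper, not the Pywikibot implementation. An unknown
    codec name (LookupError) is treated the same way as undecodable
    bytes (UnicodeDecodeError): the candidate is rejected.
    """
    if encoding is None:
        return None
    try:
        content.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        return None
    return encoding


def decide_encoding(content, header_encoding, fallbacks=("utf-8",)):
    """Pick the first candidate encoding that decodes `content`.

    The header value is tried first, then the assumed fallbacks.
    Returns None if no candidate works.
    """
    for candidate in (header_encoding,) + tuple(fallbacks):
        if try_decode(content, candidate) is not None:
            return candidate
    return None


# Bytes in Windows-1251, labelled with the bad alias from this task.
data = "Привет".encode("cp1251")
print(decide_encoding(data, "Win-1251"))      # bad alias is skipped
print(decide_encoding(data, "windows-1251"))  # valid alias is accepted
```

With a guard like this, a bad charset name in a response header means the candidate is skipped and the script keeps running.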

@Rubin16: Are you able to test this patch above?

This comment was removed by Rubin16.

It stopped crashing, but it stopped editing too. It is hard to explain the problem, as I can't see the pages where the problem happens.
I just see this:

Retrieving 50 pages from wikipedia:ru.
Adding references section before Ссылки section...

So I suppose some page should have been edited by now, but there are no changes in the log of contributions: I ran the bot for 10-15 minutes and it certainly should have edited some pages.

I've also noticed some new errors in the trace:

Can't retrieve page http://www.vpc.org/studies/amroul2006.pdf : HTTPSConnectionPool(host='www.vpc.org', port=443): Max retries exceeded with url: /studies/amroul2006.pdf (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))
WARNING: /mnt/nfs/labstore-secondary-tools-project/rubin16/pywikibot-core/pywikibot/data/api.py:2014: ResourceWarning: unclosed <socket.socket fd=7, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('172.16.3.190', 59808), raddr=('72.10.32.113', 443)>
  uniquedescr, self._data, self._cachetime = pickle.load(f)
WARNING: /usr/lib/python3.5/socket.py:647: ResourceWarning: unclosed <socket.socket fd=14, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('172.16.3.190', 33590), raddr=('128.65.195.88', 443)>
  self._sock = None

Change 669725 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] check for LookupError exception in _try_decode

https://gerrit.wikimedia.org/r/669725

Change 669918 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [bugfix] Fix the issue that there are no changes in the log of contributions

https://gerrit.wikimedia.org/r/669918

"I suppose some page should have been edited now, but there are no changes in the log of contributions"

Should be solved with https://gerrit.wikimedia.org/r/669918

Change 669918 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Fix the issue that there are no changes in the log of contributions

https://gerrit.wikimedia.org/r/669918