
reflinks.py: unknown encoding utf8-r
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Run reflinks.py with the command below. The exact page is unknown; it can't be traced from the logs.
toolforge-jobs run reflinks --command "$HOME/pwbvenv/bin/python3.11 $HOME/pywikibot-core/pwb.py reflinks -start:! -always" --image python3.11 --mem 2Gi --continuous

What happens?:

WARNING: Unknown or invalid encoding 'utf8-r'

164637 read operations
Execution time: 43 minutes, 42 seconds
Read operation time: 0.0 seconds
Script terminated by exception:

ERROR: unknown encoding: utf8-r (LookupError)
Traceback (most recent call last):
  File "/data/project/rubin16/pywikibot-core/pwb.py", line 40, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/project/rubin16/pywikibot-core/pwb.py", line 36, in main
    runpy.run_path(str(path), run_name='__main__')
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/data/project/rubin16/pywikibot-core/pywikibot/scripts/wrapper.py", lin>
    main()
  File "/data/project/rubin16/pywikibot-core/pywikibot/scripts/wrapper.py", lin>
    if not execute():
           ^^^^^^^^^
  File "/data/project/rubin16/pywikibot-core/pywikibot/scripts/wrapper.py", lin>
    run_python_file(filename, script_args, module)
  File "/data/project/rubin16/pywikibot-core/pywikibot/scripts/wrapper.py", lin>
    exec(compile(source, filename, 'exec', dont_inherit=True),
  File "/data/project/rubin16/pywikibot-core/scripts/reflinks.py", line 788, in>
    main()
  File "/data/project/rubin16/pywikibot-core/scripts/reflinks.py", line 784, in>
    bot.run()
  File "/data/project/rubin16/pywikibot-core/pywikibot/bot.py", line 1581, in r>
    self.treat(page)
  File "/data/project/rubin16/pywikibot-core/scripts/reflinks.py", line 657, in>
    tag = meta_content.group().decode(enc)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
LookupError: unknown encoding: utf8-r
CRITICAL: Exiting due to uncaught exception LookupError: unknown encoding: utf8>

What should have happened instead?:
The script should not crash with an error.
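
For reference, the failure can be reproduced outside Pywikibot. This is a minimal sketch, not the reflinks.py code: the `raw` bytes are a made-up stand-in for the meta tag content, and only the encoding name `utf8-r` is taken from the log above. It shows that an unknown codec name raises `LookupError`, which is not a subclass of `UnicodeError`, so a handler written only for decode errors would not catch it.

```python
# Hypothetical stand-in for the matched meta tag bytes; the real page
# content is not known from the logs.
raw = b'<meta charset="utf8-r">'
enc = 'utf8-r'  # encoding name reported in the traceback

try:
    raw.decode(enc)
except UnicodeError:
    # Not reached: the codec lookup fails before any bytes are decoded.
    error_message = 'decode error'
except LookupError as exc:
    # Reached: Python has no codec registered under this name.
    error_message = str(exc)

print(error_message)
print(issubclass(LookupError, UnicodeError))
```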

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

(pwbvenv) tools.rubin16@shell-1722598567:~$ pwb version
Pywikibot: pywikibot/__init__.py (, -1 (unknown), 2024/08/01, 12:49:30, UNKNOWN)
Release version: 9.3.0
packaging version: 24.1
mwparserfromhell version: 0.6.6
wikitextparser version: n/a
requests version: 2.32.3
  cacerts: /data/project/rubin16/pwbvenv/lib/python3.11/site-packages/certifi/cacert.pem
    certificate test: ok
Python: 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0]
PYWIKIBOT_DIR: Not set
PYWIKIBOT_DIR_PWB: /data/project/rubin16/pwbvenv/lib/python3.11/site-packages/pywikibot/scripts
PYWIKIBOT_NO_USER_CONFIG: Not set
Config base dir: /data/project/rubin16
Usernames for family 'wikipedia':
	ru: Rubinbot
Usernames for family 'wikibooks':
	ru: Rubinbot
Usernames for family 'commons':
	commons: Rubin16

Event Timeline

Just thinking: instead of adding manual exceptions/substitutions (like T307760 or T312230), maybe we could make the bot skip pages where an "unknown encoding" is encountered?
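
A rough sketch of the skip idea, assuming nothing about how reflinks.py is structured: `decode_or_skip` is a hypothetical helper, not an existing Pywikibot function, and the sample inputs are invented. It returns `None` when the declared encoding is unknown to Python, so the caller can skip that item instead of crashing.

```python
from typing import Optional


def decode_or_skip(data: bytes, enc: str) -> Optional[str]:
    """Decode data with enc, or return None if enc is not a known codec.

    Hypothetical helper for illustration only. It handles just the
    unknown-codec case (LookupError); invalid byte sequences would
    still raise UnicodeDecodeError.
    """
    try:
        return data.decode(enc)
    except LookupError:
        return None


# Invented sample inputs: one valid encoding name, one unknown name.
samples = [(b'title', 'utf-8'), (b'title', 'utf8-r')]
results = [decode_or_skip(data, enc) for data, enc in samples]

for (data, enc), text in zip(samples, results):
    if text is None:
        print(f'skipping: unknown encoding {enc!r}')
    else:
        print(f'decoded with {enc!r}: {text}')
```

Compared with a table of manual substitutions, this needs no maintenance when a new misspelled encoding name turns up, at the cost of leaving those pages untreated.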

Xqt triaged this task as Medium priority.

The page was The Walt Disney Company CIS.

Change #1059454 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [fix] Ignore LookupError when decoding meta content

https://gerrit.wikimedia.org/r/1059454

Xqt changed the task status from Open to In Progress. Aug 3 2024, 3:55 PM
Xqt moved this task from Backlog to Needs Review on the Pywikibot board.

Change #1059454 merged by jenkins-bot:

[pywikibot/core@master] [fix] Ignore LookupError when decoding meta content

https://gerrit.wikimedia.org/r/1059454