
Unicode error in html2unicode
Closed, Resolved (Public)

Description

Traceback:

Periodic table -> corresponding page is ?????
Traceback (most recent call last):

File "C:\pwb\core\pwb.py", line 143, in <module>
  run_python_file(fn, argv, argvu)
File "C:\pwb\core\pwb.py", line 67, in run_python_file
  exec(compile(source, filename, "exec"), main_mod.__dict__)
File "C:\pwb\core\scripts\featured.py", line 616, in <module>
  main()
File "C:\pwb\core\scripts\featured.py", line 609, in main
  bot.run()
File "C:\pwb\core\scripts\featured.py", line 323, in run
  self.run_task(task)
File "C:\pwb\core\scripts\featured.py", line 335, in run_task
  self.treat(site, task)
File "C:\pwb\core\scripts\featured.py", line 343, in treat
  self.featuredWithInterwiki(fromsite, task)
File "C:\pwb\core\scripts\featured.py", line 584, in featuredWithInterwiki
  atrans.put(text, comment)
File "C:\pwb\core\pywikibot\page.py", line 937, in put
  **kwargs)
File "C:\pwb\core\pywikibot\page.py", line 858, in save
  **kwargs)
File "C:\pwb\core\pywikibot\page.py", line 865, in _save
  comment = self._cosmetic_changes_hook(comment) or comment
File "C:\pwb\core\pywikibot\page.py", line 915, in _cosmetic_changes_hook
  self.text = ccToolkit.change(old)
File "C:\pwb\core\scripts\cosmetic_changes.py", line 175, in change
  text = self.cleanUpLinks(text)
File "C:\pwb\core\scripts\cosmetic_changes.py", line 512, in cleanUpLinks
  'startspace'])
File "C:\pwb\core\pywikibot\textlib.py", line 210, in replaceExcept
  replacement = new(match)
File "C:\pwb\core\scripts\cosmetic_changes.py", line 398, in handleOneLink
  if not self.site.isInterwikiLink(titleWithSection):
File "C:\pwb\core\pywikibot\site.py", line 370, in isInterwikiLink
  linkfam, linkcode = pywikibot.Link(text, self).parse_site()
File "C:\pwb\core\pywikibot\page.py", line 3168, in __init__
  t = html2unicode(self._text)
File "C:\pwb\core\pywikibot\page.py", line 3555, in html2unicode
  result += unichr(unicodeCodepoint)

ValueError: unichr() arg not in range(0x10000) (narrow Python build)
<type 'exceptions.ValueError'>
CRITICAL: Waiting for 1 network thread(s) to finish. Press ctrl-c to abort


Version: core-(2.0)
Severity: normal

Details

Reference
bz66345

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:09 AM
bzimport set Reference to bz66345.

The unicodeCodepoint is 166336 (0x289C0) here, which is above 0xFFFF, the largest value unichr() accepts on a narrow Python build.
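
For illustration, here is a minimal sketch of one way to handle such a code point on a narrow build: split it into a UTF-16 surrogate pair when it exceeds sys.maxunicode. The helper names to_surrogate_pair and codepoint_to_text are hypothetical and not part of pywikibot, and this is not necessarily what change 138311 does.

```python
import sys

try:
    unichr  # Python 2: builds unicode characters
except NameError:
    unichr = chr  # Python 3: chr() already covers all code points


def to_surrogate_pair(cp):
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the supplementary planes")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def codepoint_to_text(cp):
    """Return the text for cp without raising on a narrow build."""
    if cp <= sys.maxunicode:
        # Wide builds and Python 3.3+ take this branch for every code point.
        return unichr(cp)
    # Narrow build: sys.maxunicode is 0xFFFF, so emit two code units.
    high, low = to_surrogate_pair(cp)
    return unichr(high) + unichr(low)
```

For the value in this report, to_surrogate_pair(166336) gives the pair (0xD862, 0xDDC0). On a narrow build sys.maxunicode is 0xFFFF, which is why the plain unichr() call in the traceback raises ValueError.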

Change 138311 had a related patch set uploaded by Xqt:
(bug 66345) solve unicode error in html2unicode

https://gerrit.wikimedia.org/r/138311

Change 138311 merged by jenkins-bot:
(bug 66345) solve unicode error in html2unicode

https://gerrit.wikimedia.org/r/138311