
Cosmetic_changes: coercing to Unicode: need string or buffer
Open, Medium, Public

Description

I:\py\rewrite>pwb.py cosmetic_changes.py -family:wiktionary -start:-ůw

Retrieving 50 pages from wiktionary:cs.
...

>>> Addis Abeba <<<
@@ -62 +62 @@
- [[sv:Addis Abeba]]
+ [[sv:Addis Abeba]]

Edit summary: Robot: kosmetické úpravy
Page [[Addis Abeba]] saved


>>> Adelaide <<<

227 pages read
197 pages written
Execution time: 148 seconds
Read operation time: 0 seconds
Write operation time: 0 seconds
Script terminated by exception:

ERROR: TypeError: coercing to Unicode: need string or buffer, NoneType found
Traceback (most recent call last):
  File "I:\py\rewrite\pwb.py", line 239, in <module>
    if not main():
  File "I:\py\rewrite\pwb.py", line 233, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "I:\py\rewrite\pwb.py", line 111, in run_python_file
    main_mod.__dict__)
  File ".\scripts\cosmetic_changes.py", line 146, in <module>
    main()
  File ".\scripts\cosmetic_changes.py", line 139, in main
    bot.run()
  File "I:\py\rewrite\pywikibot\bot.py", line 1805, in run
    super(MultipleSitesBot, self).run()
  File "I:\py\rewrite\pywikibot\bot.py", line 1619, in run
    self.treat(page)
  File "I:\py\rewrite\pywikibot\bot.py", line 1906, in treat
    super(ExistingPageBot, self).treat(page)
  File "I:\py\rewrite\pywikibot\bot.py", line 1970, in treat
    super(NoRedirectPageBot, self).treat(page)
  File "I:\py\rewrite\pywikibot\bot.py", line 1833, in treat
    self.treat_page()
  File ".\scripts\cosmetic_changes.py", line 76, in treat_page
    changedText = ccToolkit.change(self.current_page.get())
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 279, in change
    new_text = self._change(text)
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 273, in _change
    text = self.safe_execute(method, text)
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 260, in safe_execute
    result = method(text)
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 601, in cleanUpLinks
    'startspace'])
  File "I:\py\rewrite\pywikibot\textlib.py", line 292, in replaceExcept
    replacement = new(match)
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 488, in handleOneLink

    if not self.site.isInterwikiLink(titleWithSection):
  File "I:\py\rewrite\pywikibot\site.py", line 1096, in isInterwikiLink
    linkfam, linkcode = pywikibot.Link(text, self).parse_site()
  File "I:\py\rewrite\pywikibot\page.py", line 4769, in parse_site
    newsite = self._source.interwiki(prefix)
  File "I:\py\rewrite\pywikibot\site.py", line 933, in interwiki
    return self._interwikimap[prefix].site
  File "I:\py\rewrite\pywikibot\site.py", line 689, in __getitem__
    raise self._iw_sites[prefix].site
TypeError: coercing to Unicode: need string or buffer, NoneType found
<type 'exceptions.TypeError'>
CRITICAL: Closing network session.

Event Timeline

JAnD created this task. Sep 13 2015, 10:12 AM
JAnD raised the priority of this task to Medium.
JAnD updated the task description.
JAnD added a subscriber: JAnD.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Sep 13 2015, 10:12 AM
JAnD renamed this task from "coercing to Unicode: need string or buffer" to "Cosmetic_changes: coercing to Unicode: need string or buffer". Sep 13 2015, 10:13 AM
JAnD set Security to None.
JAnD added a subscriber: Danny_B.
XZise added a subscriber: XZise. Sep 13 2015, 10:14 AM

As far as I know, that code was added recently, and if I understand it correctly, the TypeError happened when it tried to create an APISite instance using Site(url=). Unfortunately, I couldn't reproduce it when I ran it:

$ python pwb.py cosmetic_changes.py -family:wiktionary -lang:cs -page:Adelaide
ATTENTION: You can run this script as a stand-alone for testing purposes.
However, the changes that are made are only minor, and other users
might get angry if you fill the version histories and watchlists with such
irrelevant changes. Some wikis prohibit stand-alone running.
Do you really want to continue? ([y]es, [N]o): y
Password for user XZise on wiktionary:cs (no characters will be shown): 
Logging in to wiktionary:cs as XZise
Retrieving 1 pages from wiktionary:cs.


>>> Adelaide <<<
WARNING: Error in Family(ffwiki).from_url: Family(ffwiki): matched regex has not matched a domain in langs
@@ -59 +59 @@
- [[zh:Adelaide]]
+ [[zh:Adelaide]]

Edit summary: Robot: kosmetické úpravy
Do you want to accept these changes? ([y]es, [N]o, [a]ll, [q]uit): q

Both on Python 2.7.10 and 3.4.3. The warning regarding ffwiki is unrelated. Does this command reproduce the error for you?
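
The mechanism guessed at above (a failure while building the site object that only surfaces later) can be sketched in a few lines. This is a hypothetical illustration, not pywikibot's actual code: `LazySite`, `build_site`, and `lookup` are invented names, and the sketch assumes that the failure is stored on first access and re-raised on lookup. Under that assumption, the traceback would end at a `raise` line in the lookup rather than at the place where the `None` value was actually used.

```python
def build_site(url):
    """Hypothetical factory: fails with TypeError when url is None."""
    # Concatenating a string with None raises TypeError. On Python 2 with a
    # unicode string the message reads "coercing to Unicode: need string or
    # buffer, NoneType found".
    return u'site for ' + url


class LazySite(object):
    """Hypothetical stand-in for an interwiki map entry (an assumption)."""

    def __init__(self, url):
        self._url = url
        self._site = None  # set on first access: a value or an Exception

    @property
    def site(self):
        if self._site is None:
            try:
                self._site = build_site(self._url)
            except Exception as error:
                # Store the failure instead of raising it right away.
                self._site = error
        return self._site


def lookup(entries, prefix):
    """Return the site for prefix, re-raising any stored failure."""
    site = entries[prefix].site
    if isinstance(site, Exception):
        # The exception surfaces here, far from where it was created.
        raise site
    return site
```

With this shape, an entry whose URL is `None` only fails when something looks up its prefix, which would also fit the report that the crash depends on which interwiki links a page contains.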

JAnD added a comment. Sep 13 2015, 11:21 AM

I am not 100% sure the problem is in this page, because the bot loads 50 pages and then processes them by ID rather than in alphabetical order.

Bot crashed around pages
[[-ův]]
[[Adelaide]]
[[Alláh]]

Retrieving 1 pages from wiktionary:cs.


>>> Adelaide <<<
WARNING: I:\py\rewrite\pywikibot\family.py:926: FamilyMaintenanceWarning: Family
 name wikimediachapter does not match family module name wikimedia

0 pages read
0 pages written
Execution time: 0 seconds
Script terminated by exception:
JAnD added a comment. Sep 13 2015, 6:44 PM

Next issue, this time when running interwiki.py. The problem seems to be in page.py or site.py.

Retrieving 7 pages from wiktionary:he.
======Post-processing [[ro:Categorie:Filozofie]]======
WARNING: I:\py\rewrite\pywikibot\family.py:926: FamilyMaintenanceWarning: Family
 name wikimediachapter does not match family module name wikimedia
Dump ro (wiktionary) written.
Traceback (most recent call last):
  File "I:\py\rewrite\pwb.py", line 239, in <module>
    if not main():
  File "I:\py\rewrite\pwb.py", line 233, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "I:\py\rewrite\pwb.py", line 111, in run_python_file
    main_mod.__dict__)
  File ".\scripts\interwiki.py", line 2633, in <module>
    main()
  File ".\scripts\interwiki.py", line 2608, in main
    bot.run()
  File ".\scripts\interwiki.py", line 2352, in run
    self.queryStep()
  File ".\scripts\interwiki.py", line 2330, in queryStep
    subj.finish()
  File ".\scripts\interwiki.py", line 1677, in finish
    if self.replaceLinks(new[site], new):
  File ".\scripts\interwiki.py", line 1844, in replaceLinks
    if (new[ignorepage.site] == ignorepage) and \
  File "I:\py\rewrite\pywikibot\page.py", line 147, in site
    return self._link.site
  File "I:\py\rewrite\pywikibot\page.py", line 4903, in site
    self.parse()
  File "I:\py\rewrite\pywikibot\page.py", line 4810, in parse
    newsite = self._site.interwiki(prefix)
  File "I:\py\rewrite\pywikibot\site.py", line 933, in interwiki
    return self._interwikimap[prefix].site
  File "I:\py\rewrite\pywikibot\site.py", line 689, in __getitem__
    raise self._iw_sites[prefix].site
TypeError: coercing to Unicode: need string or buffer, NoneType found
<type 'exceptions.TypeError'>
CRITICAL: Closing network session.

I:\py\rewrite>
XZise added a comment. Sep 14 2015, 1:07 PM

I'm unfortunately not able to reproduce that issue. Are you able with the following command:

python pwb.py cosmetic_changes.py -family:wiktionary -lang:cs -page:Adelaide -page:-ův -page:Alláh -simulate

Actually, why is it detecting "cosmetic changes" when there are none? This seems to happen when there are interwiki links in the page. Is it trying to remove the final CR/LF?

Mpaa added a subscriber: Mpaa. Sep 14 2015, 8:16 PM

Actually, why is it detecting "cosmetic changes" when there are none? This seems to happen when there are interwiki links in the page. Is it trying to remove the final CR/LF?

Actually it is trying to add one.

>>> Adelaide <<<
> /home/user/python/core/pywikibot/bot.py(1524)userPut()
-> pywikibot.showDiff(oldtext, newtext)
(Pdb) oldtext
u"== \u010de\u0161tina == ... [[uk:Adelaide]]\n[[zh:Adelaide]]"
(Pdb) newtext
u"== \u010de\u0161tina == ... [[uk:Adelaide]]\n[[zh:Adelaide]]\n"
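
The debugger output above shows `oldtext` and `newtext` differing only by a final `\n`. A minimal sketch of a check for that situation follows. The helper name `differs_only_in_trailing_whitespace` is hypothetical and is not part of pywikibot, and the shortened sample strings are stand-ins for the page text.

```python
def differs_only_in_trailing_whitespace(old_text, new_text):
    """Return True if the texts differ, but only in trailing whitespace."""
    return old_text != new_text and old_text.rstrip() == new_text.rstrip()


# Shortened stand-ins for the values shown in the pdb session.
old_text = u"[[uk:Adelaide]]\n[[zh:Adelaide]]"
new_text = u"[[uk:Adelaide]]\n[[zh:Adelaide]]\n"

if differs_only_in_trailing_whitespace(old_text, new_text):
    print("Only a trailing newline changed; nothing worth saving.")
```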
XZise added a comment. Sep 14 2015, 8:19 PM

I guess https://gerrit.wikimedia.org/r/#/c/238149/2 is related? In that case add “Bug: T112449” above the last line.

Change 238149 had a related patch set uploaded (by Mpaa):
Changes are wrongly detected in the last langlink

https://gerrit.wikimedia.org/r/238149

Malafaya added a comment. Edited Sep 14 2015, 9:41 PM

Related to my comment above? Yes. Related to the original problem and will it solve the problem? No ;).

The change detection patch was merged today. Nevertheless, the original problem reported should still exist.

JAnD added a comment. Sep 16 2015, 5:13 AM

I'm unfortunately not able to reproduce that issue. Are you able with the following command:
python pwb.py cosmetic_changes.py -family:wiktionary -lang:cs -page:Adelaide -page:-ův -page:Alláh -simulate

The same crash result. I'll try it on the second computer.
I tried deleting all the mentioned scripts and restoring them via svn, but the result is still the same :-(

JAnD added a comment. Sep 16 2015, 10:00 AM

The same crash result. I'll try it on the second computer.

On the second computer it works, except for T109190.

But T109190 is a completely different error? So it crashes with the error you describe here when you don't ignore ISBN numbers but otherwise it works?

Jar added a subscriber: Jar. Nov 26 2016, 4:03 AM

I get this warning too:

WARNING: core/pywikibot/family.py:930: FamilyMaintenanceWarning: Family name wikimediachapter does not match family module name wikimedia

The text is also saved without Unicode, like this: "تعرض هذه الصفحة قاØ"

Is there a solution?
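
The stray "Ø" in the sample looks like what appears when UTF-8 bytes are decoded as Latin-1. Whether that is the actual cause here is an assumption, since the thread does not confirm it. A minimal sketch of how such garbling arises and how it can be reversed when no bytes have been lost:

```python
# The first Arabic word from the sample (تعرض), written with escapes.
original = u"\u062a\u0639\u0631\u0636"

# Encode correctly as UTF-8, then decode with the wrong codec (Latin-1).
garbled = original.encode("utf-8").decode("latin-1")

# Reversing the mistake restores the text, provided nothing was truncated.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(repaired == original)
```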

Magul added a subscriber: Magul. Nov 26 2016, 11:03 AM

@Jar I don't know right now. The most efficient way to get help here is to open a new issue with the most detailed description you can provide, so please do so.

Restricted Application added a subscriber: alanajjar. May 19 2018, 5:51 PM