
cosmetic_changes crashes with wrong ISBN
Closed, Duplicate (Public)

Description

When processing https://cs.wiktionary.org/w/index.php?title=autrement&oldid=706077

$ python pwb.py cosmetic_changes.py -family:wiktionary -page:autrement -lang:cs

...
>>> autrement <<<

306 pages read
293 pages written
Execution time: 222 seconds
Read operation time: 0 seconds
Write operation time: 0 seconds
Script terminated by exception:

ERROR: InvalidIsbnException: ISBN-13: The ISBN 2884451313X contains invalid characters. / ISBN-10: The ISBN 2884451313X is not 10 digits long.
Traceback (most recent call last):
  File "I:\py\rewrite\pwb.py", line 239, in <module>
    if not main():
  File "I:\py\rewrite\pwb.py", line 233, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "I:\py\rewrite\pwb.py", line 111, in run_python_file
    main_mod.__dict__)
  File ".\scripts\cosmetic_changes.py", line 143, in <module>
    main()
  File ".\scripts\cosmetic_changes.py", line 136, in main
    bot.run()
  File "I:\py\rewrite\pywikibot\bot.py", line 1805, in run
    super(MultipleSitesBot, self).run()
  File "I:\py\rewrite\pywikibot\bot.py", line 1619, in run
    self.treat(page)
  File "I:\py\rewrite\pywikibot\bot.py", line 1906, in treat
    super(ExistingPageBot, self).treat(page)
  File "I:\py\rewrite\pywikibot\bot.py", line 1970, in treat
    super(NoRedirectPageBot, self).treat(page)
  File "I:\py\rewrite\pywikibot\bot.py", line 1833, in treat
    self.treat_page()
  File ".\scripts\cosmetic_changes.py", line 73, in treat_page
    changedText = ccToolkit.change(self.current_page.get())
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 279, in change
    new_text = self._change(text)
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 273, in _change
    text = self.safe_execute(method, text)
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 260, in safe_execute
    result = method(text)
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 957, in fix_ISBN
    text, strict=False if self.ignore == CANCEL_MATCH else True)
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 206, in _reformat_ISBNs
    text, lambda match: _format_isbn_match(match, strict=strict))
  File "I:\py\rewrite\pywikibot\textlib.py", line 1593, in reformat_ISBNs
    text = isbnR.sub(match_func, text)
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 206, in <lambda>
    text, lambda match: _format_isbn_match(match, strict=strict))
  File "I:\py\rewrite\pywikibot\cosmetic_changes.py", line 175, in _format_isbn_match
    scripts_isbn.is_valid(isbn)
  File "I:\py\rewrite\scripts\isbn.py", line 1376, in is_valid
    getIsbn(isbn)
  File "I:\py\rewrite\scripts\isbn.py", line 1344, in getIsbn
    % (e13, e10))
scripts.isbn.InvalidIsbnException: ISBN-13: The ISBN 2884451313X contains invalid characters. / ISBN-10: The ISBN 2884451313X is not 10 digits long.
<class 'scripts.isbn.InvalidIsbnException'>
CRITICAL: Closing network session.
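The traceback shows getIsbn in scripts/isbn.py building its message from two errors, `(e13, e10)`. The sketch below shows that try-both pattern. It assumes simplified format checks with no check digits, and all names in it are hypothetical, so it is not the actual scripts/isbn.py code. For 2884451313X it produces the same message shown above:

```python
class InvalidIsbnException(Exception):
    """Raised when a string is not a valid ISBN (hypothetical sketch)."""


def _strip(isbn):
    # Ignore the usual separators between ISBN groups.
    return isbn.replace('-', '').replace(' ', '')


def check_isbn13(isbn):
    # Simplified: an ISBN-13 consists of 13 digits only (no 'X').
    code = _strip(isbn)
    if not code.isdigit():
        raise InvalidIsbnException(
            'The ISBN %s contains invalid characters.' % isbn)
    if len(code) != 13:
        raise InvalidIsbnException(
            'The ISBN %s is not 13 digits long.' % isbn)


def check_isbn10(isbn):
    # Simplified: an ISBN-10 has 10 characters; only the last may be 'X'.
    code = _strip(isbn)
    if len(code) != 10:
        raise InvalidIsbnException(
            'The ISBN %s is not 10 digits long.' % isbn)
    if not code[:9].isdigit() or code[9] not in '0123456789Xx':
        raise InvalidIsbnException(
            'The ISBN %s contains invalid characters.' % isbn)


def get_isbn(isbn):
    """Accept isbn as ISBN-13 or ISBN-10; otherwise report both failures."""
    try:
        check_isbn13(isbn)
        return isbn
    except InvalidIsbnException as e13:
        try:
            check_isbn10(isbn)
            return isbn
        except InvalidIsbnException as e10:
            raise InvalidIsbnException(
                'ISBN-13: %s / ISBN-10: %s' % (e13, e10))
```

Because 2884451313X contains an X, the ISBN-13 check rejects it for invalid characters. Because it has 11 characters, the ISBN-10 check rejects it for length. Neither message says where the error is.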

Event Timeline

JAnD created this task. Aug 15 2015, 6:37 PM
JAnD raised the priority of this task to Needs Triage.
JAnD updated the task description.
JAnD added a subscriber: JAnD.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Aug 15 2015, 6:37 PM
XZise added a subscriber: XZise. Aug 15 2015, 6:54 PM

Might cosmetic_changes or the isbn script no longer be able to handle the new exception classes introduced by T85240? On the other hand, _format_isbn_match has a parameter strict, which by default re-raises the exception.
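In the traceback, strict is False only when `self.ignore == CANCEL_MATCH`, so a plain run re-raises. The sketch below shows how a strict flag on a per-match callback decides between aborting and leaving the text alone. The pattern, the length-only validator and every name in it are hypothetical stand-ins, not pywikibot's code:

```python
import re


class InvalidIsbnException(Exception):
    """Raised by the (hypothetical) validator for a malformed ISBN."""


# Hypothetical pattern: "ISBN " followed by a run of digits and X.
ISBN_RE = re.compile(r'ISBN (?P<code>[0-9X]+)')


def is_valid(code):
    # Hypothetical stand-in for a real validator: length check only.
    if len(code) not in (10, 13):
        raise InvalidIsbnException('Invalid ISBN: %s' % code)
    return True


def format_isbn_match(match, strict=True):
    """Return the replacement text for one ISBN match.

    strict=True lets the exception propagate out of re.sub, aborting the
    whole run; strict=False leaves the matched text unchanged.
    """
    code = match.group('code')
    try:
        is_valid(code)
    except InvalidIsbnException:
        if strict:
            raise
        return match.group(0)
    # A real formatter would hyphenate the valid ISBN here.
    return 'ISBN ' + code


text = 'See ISBN 2884451313X.'
# Non-strict: the invalid ISBN is skipped and the text is returned as-is.
unchanged = ISBN_RE.sub(lambda m: format_isbn_match(m, strict=False), text)
```

Calling the same `ISBN_RE.sub` with `strict=True` raises `InvalidIsbnException` out of `sub`. The crash in the description has the same shape: an exception raised in the callback propagates through `reformat_ISBNs` to the bot.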

XZise added a comment. Aug 15 2015, 6:56 PM

Add -ignore:match. Unfortunately it isn't documented, but with it ISBNs won't be checked strictly. I'm not closing this yet, though, because the option needs better documentation or the regex may need to be updated.
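The regex concern can be illustrated with two hypothetical patterns, neither of which is pywikibot's actual `isbnR`. A loose pattern captures the 11-character candidate and hands it to the validator, which then fails. A pattern restricted to 10 or 13 ISBN characters never matches it, so the text is skipped instead of rejected:

```python
import re

# Hypothetical loose pattern: any run of 10+ digits, X or hyphens.
LOOSE_RE = re.compile(r'ISBN[ :]?(?P<code>[0-9Xx-]{10,})')

# Hypothetical tighter pattern: 9 digits plus a digit or X (10 chars),
# optionally preceded by 978/979 (13 chars), ending at a word boundary.
# Simplified: it does not reject a trailing X on 13-character codes.
TIGHT_RE = re.compile(r'ISBN[ :]?(?P<code>(?:97[89])?\d{9}[\dXx])\b')

typo = 'ISBN 2884451313X'
loose_code = LOOSE_RE.search(typo).group('code')  # the whole 11-char run
tight_match = TIGHT_RE.search(typo)               # no match at all
```

The tighter pattern avoids the crash, but it also hides the typo. Reporting the invalid ISBN without aborting, as non-strict mode does, may be the more useful behaviour.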

jayvdb updated the task description. Jan 25 2016, 1:51 PM
jayvdb set Security to None.
Restricted Application added a subscriber: StudiesWorld. Jan 25 2016, 1:51 PM
jayvdb updated the task description. Jan 26 2016, 12:25 AM
jayvdb reopened this task as Open. Jan 26 2016, 12:55 AM
jayvdb added a subscriber: jayvdb.

The -ignore option was added to cosmetic_changes on 13 Oct 2014 with f0543d67 (merged a day later).

Just noting that I tried to reproduce this without -ignore, both on master (with and without stdnum) and after 'git checkout 75e2578493~1', but did not get the expected error.

This is a concern, as ISBN 2884451313X is not valid.

http://www.isbn-check.de/checkisbn.pl?isbn=2884451313X&submit=test&lang=en says:

ISBN 2884451313X was input with 11 digits (one digit too much)

It recommends ISBN 2-88445-313-X (removing the 1), which is indeed the book referred to on the Wiktionary page.
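The suggested correction can be checked with the standard ISBN-10 check-digit rule. The weights run from 10 down to 1, 'X' stands for 10, and the weighted sum must be divisible by 11. For 2-88445-313-X the sum is 2·10 + 8·9 + 8·8 + 4·7 + 4·6 + 5·5 + 3·4 + 1·3 + 3·2 + 10·1 = 264 = 24·11. The helper `isbn10_is_valid` below is a hypothetical standalone sketch, not part of scripts/isbn.py:

```python
def isbn10_is_valid(isbn):
    """Return True if isbn (hyphens/spaces allowed) is a valid ISBN-10."""
    chars = isbn.replace('-', '').replace(' ', '').upper()
    if len(chars) != 10:
        return False  # e.g. 2884451313X has 11 characters
    if not chars[:9].isdigit() or not (chars[9].isdigit() or chars[9] == 'X'):
        return False
    # 'X' is only allowed as the check digit and stands for 10.
    values = [int(c) for c in chars[:9]]
    values.append(10 if chars[9] == 'X' else int(chars[9]))
    # Weights 10, 9, ..., 1; the weighted sum must be divisible by 11.
    total = sum(w * v for w, v in zip(range(10, 0, -1), values))
    return total % 11 == 0


print(isbn10_is_valid('2884451313X'))    # False
print(isbn10_is_valid('2-88445-313-X'))  # True
```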