Page MenuHomePhabricator

use external package for isbn.py
Closed, ResolvedPublic

Description

The ISBN validation data in isbn.py have not been updated since September 2007. The ISBN ranges are updated regularly, meaning the isbn.py script is currently wrong. Rather than keeping this data up to date, the algorithm in isbn.py should be replaced with an external package which provides the same functionality and is py3 compatible, packaged on pypi, and is regularly updated as the ISBN ranges are issued and modified.

There are several such packages on pypi. See
https://pypi.python.org/pypi?%3Aaction=search&term=isbn&submit=search

The most useful is
https://pypi.python.org/pypi/python-stdnum

Also up to date:
https://github.com/nekobcn/isbnid

https://pypi.python.org/pypi/isbnlib / https://pypi.python.org/pypi/isbntools

https://pypi.python.org/pypi/isbn_hyphenate

Ideally isbn is able to use whichever library is installed.

Event Timeline

jayvdb raised the priority of this task from to Needs Triage.
jayvdb updated the task description. (Show Details)
jayvdb added a project: Pywikibot-Scripts.
jayvdb subscribed.

As the ISBN data in this script is very outdated, I dont see the benefit in keeping it. However, removing the old code/data could become problematic , especially if you dont provide perfect backwards compatibility. So, dont remove it in your changeset, otherwise your changeset might be hit with a -2. If you leave the old code in the file, unmodified, it can still be accessed by other scripts which might depend on it via an import .

If the old code can be easily used as a fallback if no external libraries exist, that would be nice. (but optional if it isnt easy to do)

After your changeset adding support for external libraries, another changeset can then remove all the old code/data, and then it doesnt matter if someone hits it with a -2.

Change 184225 had a related patch set uploaded (by Murfel):
External isbn packages support

https://gerrit.wikimedia.org/r/184225

Patch-For-Review

Given that the script uses a library, it can say for sure whether an isbn is correct and add Template:Please_check_ISBN or Template:Listed Invalid ISBN if it's not, so it's going to be more efficient.

IMO {{Listed Invalid ISBN}} should only ever be added by a human, and hopefully a bibliophile, as "listed" means it appeared in print on the physical item if it had a physical item, or nearest equivalent for born-digital items. fwiw, I created that template 8 years ago; how time flies. Interesting to see it now has 650+ uses on en.wp, and has been translated into 14 other languages. ;-)

Change 184225 merged by jenkins-bot:
External isbn packages support

https://gerrit.wikimedia.org/r/184225

The next step for this bug is to:

  1. add python-stdnum as a dependency in setup.py
  2. add tests for cosmetic_changes use of isbn
  3. determine what to do with the ISBN rule in fixes.py - i.e. is it useful

The next step for this bug is to:

  1. add python-stdnum as a dependency in setup.py

Done, as an optional dependency

  1. add tests for cosmetic_changes use of isbn

Done.

  1. determine what to do with the ISBN rule in fixes.py - i.e. is it useful

Outstanding.

Review of ISBN cosmetic_changes vs fixes is good first task.

fixes ISBN normalises the ISBN syntax, and cosmetic change ISBN validates the ISBN and reformats it some more. definitely some overlap, but not as bad as first thought. That overlap can be addressed in T76323.