
Create new {{#isbn}} parser function
Open, Needs Triage, Public

Description

As described in the parent task, this avoids the need for every wiki to create a Template:ISBN containing [[Special:Booksources/{{{1}}}|ISBN {{{1}}}]]

Event Timeline

Change #1220386 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Add {{#isbn}} parser function

https://gerrit.wikimedia.org/r/1220386

@PerfektesChaos in the parent task you mentioned wanting an optional second argument to indicate "invalid ISBN", and I'd like to hear more about how you think this should work. Can we still enforce that the "invalid ISBN" contains only digits? Is it the number of digits that is bogus? I.e., I'd really like to avoid {{#isbn:<script>alert("boo")</script>}} being a valid invocation. Does linking to Special:Booksources still make sense for invalid ISBNs? Should there be a class or other indication that the ISBN is invalid?

I'm inclined to ship the 1-argument {{#isbn}} first, and then add the two-argument form as a patch on top iff we can decide what that additional functionality should be.

@cscott: It seems to me it would make much more sense to put this in a MediaWiki extension rather than foist it upon every MediaWiki installation by adding it to core. And while you are at it, perhaps extract the Special:Booksources code and put it in the same extension. Sure, most WMF sites might end up using such an extension, but not every MediaWiki wiki is about, or even concerned with, published books.

I may be confused here, but I believe the intention of invalid ISBNs is to support those that are misquoted on published books. Typically this means incorrectly quoted digits. Valid ISBNs must have a valid check digit. To this end, supported invalid ISBNs should probably still be either an ISBN-10 or an ISBN-13. That is, after removing dashes (or other supported punctuation), they should be either a 9-decimal-digit serial followed by a check digit that is a decimal digit, x, or X, or a 12-decimal-digit serial followed by a decimal check digit. I believe supported invalid ISBNs should still be valid in format even if the actual check digit algorithm fails.
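For illustration only, the "valid in format, regardless of check digit" rule described above could be sketched as follows. This is not code from the patch. The function name `has_isbn_shape` is hypothetical, and the set of tolerated separators (spaces, dots, hyphens) is an assumption.

```python
import re

# Assumption: spaces, dots and hyphens are the tolerated separators.
_SEPARATORS = re.compile(r"[ .\-]")
# ISBN-10 shape: 9 decimal digits plus a check character (digit, x or X).
_ISBN10_SHAPE = re.compile(r"[0-9]{9}[0-9Xx]")
# ISBN-13 shape: 12 decimal digits plus a decimal check digit.
_ISBN13_SHAPE = re.compile(r"[0-9]{13}")


def has_isbn_shape(raw: str) -> bool:
    """Return True if raw looks like an ISBN-10 or ISBN-13.

    Only the format is tested; the check digit is NOT verified, so a
    misprinted ISBN with a wrong check digit still passes.
    """
    compact = _SEPARATORS.sub("", raw)
    return bool(
        _ISBN10_SHAPE.fullmatch(compact) or _ISBN13_SHAPE.fullmatch(compact)
    )
```

With this rule, `0-306-40615-3` (right shape, wrong check digit) is accepted, while `<script>alert("boo")</script>` is rejected.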

On invalid ISBN:

  • The problem stems from the era of manually computed checksums, from 1970 until 2006, with ISBN-10.
  • Publishers tried to calculate the checksum from the publisher ID and internal serial number but failed to arrive at the correct figure. Even electronic calculators were rare.
  • The bad ID was printed in the book.
  • The (national) libraries registered the data by typing a paper card, and copied the bad number.
  • In library catalogs, you could find the book by its bad ISBN, but with no entry for a “corrected” ISBN. Even worse, nobody but the publisher knows which digit is the bad one; usually it is a miscomputed checksum, but the printer might have shifted the digit sequence.
  • Therefore it is necessary to be able to reach Special:Booksources with a formally invalid ISBN, without an error message being issued when {{#isbn}} is transcluded in wikitext.
  • If error suppression is not requested in the parser function, a message and tracking category shall be issued.
  • With ISBN-13 since 2007 and electronic accounting systems, almost no errors are known.
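As a reference for the checksum mentioned in the list above, here is a minimal sketch of the standard ISBN-10 (weighted sum modulo 11) and ISBN-13 (alternating weights 1 and 3, modulo 10) check digit calculations. The function names are hypothetical, and the inputs are assumed to be already stripped of separators.

```python
def isbn10_check_digit(first9: str) -> str:
    """Compute the ISBN-10 check character for the first 9 digits.

    The digits are weighted 10, 9, ..., 2; the check value makes the
    full weighted sum divisible by 11. A value of 10 is written as 'X'.
    """
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn13_check_digit(first12: str) -> str:
    """Compute the ISBN-13 check digit for the first 12 digits.

    The digits are weighted 1, 3, 1, 3, ...; the check digit makes the
    full weighted sum divisible by 10.
    """
    total = sum(
        int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12)
    )
    return str((10 - total % 10) % 10)
```

For example, `isbn10_check_digit("030640615")` gives `"2"` and `isbn13_check_digit("978030640615")` gives `"7"`. A single mistake in the hand-computed ISBN-10 sum produces exactly the kind of printed-but-invalid number described above.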

@Uzume I don't disagree, but given that Special:Booksources is in core, and magic links are in core, I don't have a problem with adding {{#isbn}} to core at this time as well. I agree that it would be better for this functionality to move into an extension at a future date, but we need to make more progress on T145604: RFC: Future of magic links first. At the moment, almost all production wikis have the ISBN magic link turned on, which then requires Special:Booksources to be enabled as well.

@PerfektesChaos The core ISBN magic link functionality does not do any validity checking on the check digit, so I don't see why the 1-parameter {{#isbn}} should do so either. I suppose that folks have a Template:ISBN of some sort which is doing validity checking, perhaps? It doesn't appear to be in mediawiki-core, unless I'm missing something.

I do expect that an {{#isbn:}} transcluded in wikitext checks the validity of its mandatory first parameter.

  • If the optional second parameter is not 1 and the value is invalid, I want a maintenance category added and a small hint visible following every complaint:
<span class="error mw-isbn-invalid">[?]</span>

Parameter 1 is permitted to contain any number of spaces, dots, and hyphens. Those will be stripped; the remaining 10 or 13 characters, matching ^[0-9]+[xX]?$, will have their check digit tested, and, if valid, the link text will be hyphenated with nowrap.
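The behaviour proposed above could be sketched roughly as follows. This is only an illustration of the proposal, not the merged parser function: the names `classify_isbn` and `render_isbn` and the tracking category title are hypothetical, returning only the hint for input that is not ISBN-shaped is an assumption, and the hyphenation step is omitted because it needs the ISBN range tables.

```python
import re

HINT = '<span class="error mw-isbn-invalid">[?]</span>'
# Hypothetical category title, used here for illustration only.
TRACKING_CATEGORY = "[[Category:Pages with invalid ISBNs]]"


def classify_isbn(raw: str) -> str:
    """Return 'valid', 'bad-check-digit' or 'malformed' for raw."""
    # Strip the permitted separators: spaces, dots, hyphens.
    compact = re.sub(r"[ .\-]", "", raw).upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", compact):
        # ISBN-10: weights 10..1, 'X' counts as 10, sum divisible by 11.
        total = sum(
            (10 if c == "X" else int(c)) * w
            for c, w in zip(compact, range(10, 0, -1))
        )
        return "valid" if total % 11 == 0 else "bad-check-digit"
    if re.fullmatch(r"[0-9]{13}", compact):
        # ISBN-13: alternating weights 1 and 3, sum divisible by 10.
        total = sum(
            int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(compact)
        )
        return "valid" if total % 10 == 0 else "bad-check-digit"
    return "malformed"


def render_isbn(raw: str, known_invalid: str = "") -> str:
    """Sketch of the wikitext a two-argument {{#isbn}} might emit."""
    status = classify_isbn(raw)
    if status == "malformed":
        # Assumption: never build a link from non-ISBN-shaped input.
        return HINT + TRACKING_CATEGORY
    text = raw.strip()
    link = f"[[Special:Booksources/{text}|ISBN {text}]]"
    if status == "valid" or known_invalid == "1":
        # Second parameter "1" suppresses the complaint for a known
        # misprinted ISBN, while still linking to Special:Booksources.
        return link
    return link + HINT + TRACKING_CATEGORY
```

Under these assumptions, `render_isbn("0-306-40615-3")` yields the link followed by the hint and category, whereas `render_isbn("0-306-40615-3", "1")` yields the plain link.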

@PerfektesChaos I'm going to focus on reproducing the core magic link functionality first, which does not have these validity-checking features. I think the abilities you describe are useful, but are also reasonable things to put in a Scribunto module, especially once you consider that various wikis might want to localize or style the "invalidity hint" differently. Right now the ISBN magic word respects the author's styling and any non-digit formatting is left alone; you're also proposing to reformat the ISBN -- which again has some merit, but can be built on top of the basic #isbn parser function.

As I understand it, ISBNs in citations are also involved in some of our efforts to combat misinformation and identify AI-generated edits, since LLMs tend to hallucinate non-existing sources, with non-existing ISBNs. As I understand it, the Internet Archive has a bot which doesn't rely on verifying the check digit (since the check digit can be wrong in legitimate sources, for reasons described above) but actually consults its database of known published books to try to establish validity of a source. I'd rather not try to redo their work poorly, but instead focus on providing the core support and letting others build good tools (check digit verification, reformatting, etc) on top of it.

> to put in a Scribunto module

That would be required to be distributed to >1000 wikis. And maintained everywhere forever.

When updating methodology from 2005 magic to 2026/2027 this should be pushed inside WMF and outside as well. Lua is not appropriate for a generic task.

I do volunteer as consultant for the algorithm of that new parser function. I wrote a Module with a package of URI identifiers and ISBN data a dozen years ago.

Checksum test etc. was used to detect human errors long before LLM arrived.

On AI hallucination, I expect this to be a temporary observation from 2024/2025, based on the first attempts at generating texts with citations by throwing dice for a random ISBN. I am quite sure that software will be improved by looking into quality journals and Google Books for works whose title or abstract contains the keywords of the statement. Then it will add a correct reference to a valid publication that somehow deals with the topic, without any real understanding of what is meant there. To be honest, I know students and Wikipedia authors who have been doing exactly the same for several decades, just from a Google Search result list.

BTW, there is already code in current MW which can be shared:

  • The magic link parser already renders hyphenation correctly, but expects hyphens or no separator.
  • The Booksources special page already does a validity check.

When a string is received as a parser function parameter, that string might contain any space, dot or hyphen, since the value is not delimited by the magic link parsing rules but extends as far as the string parameter does.

> On AI hallucination I do expect this as a temporary observation from 2024/2025

See also https://media.ccc.de/v/39c3-ai-generated-content-in-wikipedia-a-tale-of-caution

Change #1220386 merged by jenkins-bot:

[mediawiki/core@master] Add {{#isbn}} parser function

https://gerrit.wikimedia.org/r/1220386

Adding user notice; this should go out on the next MW train (week of Jan 26) and then should go in Tech News for the week after that (Feb 2) so that folks can try it out when they read about it.

@PerfektesChaos I'd be interested in supporting you on a Scribunto module / MediaWiki extension to more thoroughly deal with ISBNs. As folks mentioned above, Special:Booksources could probably be moved out of core as well, so pulling this functionality plus a Scribunto module for ISBN handling (which can be distributed via MediaWiki extension) might be a good idea. Alternatively, we are doing work on making distribution of Scribunto modules easier (T411834: Scribunto external dependencies - roadmap and requirements) and this could be a good initial use for that.

T411834 seems to be describing some pretty outlandish concepts with an unclear roadmap to ever being completed. The reality on the ground right now is that shared Lua modules get unsynchronised fairly quickly, because of conflicting changes, the lack of a shared repository, and changes not being propagated between them. There are some tools to help with that, but I see no issue with @PerfektesChaos’s opinion that this new magic word, if it exists, should validate ISBNs in some way (I don’t have a problem with it being an extension). The problem with a shared Lua module approach is that eventually you’ll run into the same problem as T343131: Commons database is growing way too fast, in the case of a module, both for the module itself and for the wrapping template (since most wikis are not bad enough to include modules without a template). ISBNs are heavily used on multiple wikis, so that’s a real possibility.

How should we word this for Tech News? @cscott

> Adding user notice; this should go out on the next MW train (week of Jan 26) and then should go in Tech News for the week after that (Feb 2) so that folks can try it out when they read about it.

I guess T148274: Implement a convenient way to link to ISBNs without magic links should be duped into this issue now?

How will/does VisualEditor handle the new parser function?

> How will/does VisualEditor handle the new parser function?

The same way it handles all parser functions: it does as if it was a template named #isbn:963-12-3456-7 with zero parameters, making the ISBN itself uneditable. ☹ Do we have a generic task for implementing support for parser functions in VisualEditor? It has so many sub-projects that I have no idea where to look for it.

> How will/does VisualEditor handle the new parser function?

> The same way it handles all parser functions: it does as if it was a template named #isbn:963-12-3456-7 with zero parameters, making the ISBN itself uneditable. ☹ Do we have a generic task for implementing support for parser functions in VisualEditor? It has so many sub-projects that I have no idea where to look for it.

You could just edit properly and not use VisualEditor.

> You could just edit properly and not use VisualEditor.

Go troll somewhere else.

> How will/does VisualEditor handle the new parser function?

> The same way it handles all parser functions: it does as if it was a template named #isbn:963-12-3456-7 with zero parameters, making the ISBN itself uneditable. ☹

Yeah, that's why I was asking. Code-wise it would be rather straightforward to have a bot swap magic links to the parser function, but if there's no VE support that feels like a big UX regression.

> Do we have a generic task for implementing support for parser functions in VisualEditor? It has so many sub-projects that I have no idea where to look for it.

A combo of T55414: TemplateData: Ship documentation for core magic words and parser functions + T52855: VisualEditor: Transclusions editor should support parser functions and variables I think. But given that VE already has specialized handling for ISBNs through magic links, it doesn't seem that unreasonable to me to have dedicated handling for #isbn.

> You could just edit properly and not use VisualEditor.

> Go troll somewhere else.

+1. VE is also a proper editor.

> Do we have a generic task for implementing support for parser functions in VisualEditor? It has so many sub-projects that I have no idea where to look for it.

> A combo of T55414: TemplateData: Ship documentation for core magic words and parser functions + T52855: VisualEditor: Transclusions editor should support parser functions and variables I think.

Thanks for looking up these tasks!

> But given that VE already has specialized handling for ISBNs through magic links, it doesn't seem that unreasonable to me to have dedicated handling for #isbn.

One major driver for magic links deprecation is to avoid specialized handling, so I wouldn’t handle #isbn more specially than other parser functions. If parser functions get support for specialized editors (similarly to how extension tags can have their own specialized popups), #isbn could also get its own, but if/as long as there’s no such generic support, I wouldn’t make an exception for #isbn.