
Citoid: Generate ISBNs with hyphens
Open, Normal, Public

Description

According to the en-wiki Manual of Style (and probably others), ISBNs should be hyphenated, but Citoid outputs only a single unhyphenated number. Can the engine be modified to hyphenate the ISBNs that are added to the cite templates?

Event Timeline

SoWhy created this task. Aug 7 2019, 6:41 PM
Restricted Application added a subscriber: Aklapper. Aug 7 2019, 6:41 PM
SoWhy renamed this task from Citoid: Generated ISBNs with hyphens to Citoid: Generate ISBNs with hyphens. Aug 7 2019, 6:41 PM
Mvolz triaged this task as Normal priority. Aug 8 2019, 7:59 AM
Mvolz added a subscriber: Mvolz. Edited Aug 8 2019, 12:00 PM

So the short answer is: yes, that'd be ideal. But the ISBN situation is kind of a disaster. The hyphen positions depend on knowing the lengths of the other elements, like the registration group, and these assignments change, so the range data needs to be *constantly updated* in order to hyphenate correctly.
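To illustrate why hyphenation depends on range data, here is a minimal Python sketch (not Citoid's actual implementation). The `REGISTRANT_RANGES` table and the `hyphenate_isbn13` function are hypothetical names, and the table is a simplified, hand-copied subset covering only the 978-0 group. The authoritative ranges are published by the International ISBN Agency, are more finely subdivided, and change over time, which is exactly the maintenance problem described above.

```python
from typing import Optional

# ASSUMPTION: simplified, illustrative ranges for prefix 978, group 0 only.
# Real range data is larger, more finely subdivided, and updated regularly.
REGISTRANT_RANGES = {
    ("978", "0"): [
        ("00", "19"),
        ("200", "699"),
        ("7000", "8499"),
        ("85000", "89999"),
        ("900000", "949999"),
        ("9500000", "9999999"),
    ],
}


def hyphenate_isbn13(isbn: str) -> Optional[str]:
    """Return the hyphenated ISBN-13, or None if the table has no matching range."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return None
    prefix, rest, check = digits[:3], digits[3:12], digits[12]
    # Registration groups are 1 to 5 digits long; try each candidate length.
    for group_len in range(1, 6):
        group = rest[:group_len]
        ranges = REGISTRANT_RANGES.get((prefix, group))
        if ranges is None:
            continue
        body = rest[group_len:]
        for low, high in ranges:
            registrant = body[: len(low)]
            # Equal-length digit strings compare correctly as plain strings.
            if low <= registrant <= high:
                publication = body[len(low):]
                return "-".join([prefix, group, registrant, publication, check])
    return None  # a "false negative": valid ISBN, but no range data for it


print(hyphenate_isbn13("9780306406157"))
print(hyphenate_isbn13("9783161484100"))
```

With this table the first call finds the registrant `306` in the `200`-`699` range, while the second returns `None` because group 3 is missing from the table. An outdated table fails in the same way for newly assigned ranges.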

I've checked out a few libraries that can hyphenate ISBNs, but invariably there are false negatives, and more of them as the library gets older. Some databases don't bother with the hyphens or only partially hyphenate (i.e. just after the three-digit prefix), like Amazon. This basically fractures the ability of ISBNs to be used as identifiers, because if you want to positively identify a record you have to try every variation: with one space, one hyphen, all hyphens, no hyphens, etc.
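One way to sidestep the "try every variation" problem when matching records is to compare a normalized form instead of the displayed string. The sketch below is an assumption about how such matching could work, not something Citoid or any of the databases mentioned is known to do; `normalize_isbn` is a hypothetical helper that only handles ASCII spaces and hyphens.

```python
def normalize_isbn(raw: str) -> str:
    """Drop spaces and hyphens and upper-case a trailing ISBN-10 'x'."""
    return "".join(ch for ch in raw if ch not in " -").upper()


# The same ISBN written the ways different databases might display it.
variants = [
    "978-0-306-40615-7",  # fully hyphenated
    "978-0306406157",     # one hyphen after the prefix
    "978 0306406157",     # one space after the prefix
    "9780306406157",      # no separators
]

# All variants collapse to a single key, so one lookup is enough.
keys = {normalize_isbn(v) for v in variants}
print(keys)
```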

Most major databases have solved this problem by skipping the hyphens. WorldCat ignores them (example): https://www.worldcat.org/title/eyewitness-dvd-seashore/oclc/823473818 Amazon just puts one hyphen in for ISBN-13s, which is worse! Open Library as well: https://openlibrary.org/books/OL24033933M/MediaWiki

So the question is: do we do this inconsistently (hyphenate where we can, include only ISBNs we can hyphenate, or return a mix of hyphenated ISBNs and ones with a valid checksum that we can't hyphenate), or do we very consistently return just the number with no hyphens?
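For reference, the "valid checksum" test mentioned above needs no range data, which is why an ISBN can be verified even when it cannot be hyphenated. A minimal sketch of the standard ISBN-10 and ISBN-13 check-digit rules follows; the function names are hypothetical and this is not Citoid's code.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: digits weighted 10 down to 1 must sum to a multiple of 11."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not chars[:9].isdigit():
        return False
    if not (chars[9].isdigit() or chars[9] == "X"):
        return False
    # 'X' is only allowed as the check character and stands for 10.
    values = [int(c) for c in chars[:9]] + [10 if chars[9] == "X" else int(chars[9])]
    total = sum(v * w for v, w in zip(values, range(10, 0, -1)))
    return total % 11 == 0


print(is_valid_isbn13("9780306406157"))
print(is_valid_isbn10("0-306-40615-2"))
```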

@Lucas_Werkmeister_WMDE , @Samwalton9

This is a big issue for Wikibase integration too; at present Wikidata has a constraint that the ISBN be hyphenated, but in reality some records lack the hyphenation despite the constraint. Thoughts? We need to pick one and stick with it.

Hm, that’s tricky… KrBot automatically fixes the format, at least sometimes (example edit), but I don’t know if it also does that in reference snaks (or would Citoid create new items for each ISBN in a reference, so that it would always be a main statement?), and I’m not sure if it’s acceptable for Citoid to make edits that need to be cleaned up by a bot later.

It’s not like this problem is exclusive to Citoid either – anyone else who wants to add ISBNs presumably has the same issue, and the inconsistent hyphenation (example query) also makes it more difficult to search/query for an item by ISBN.

Perhaps it would make sense to propose to the community (WikiProject Books?) that ISBNs should always be saved unhyphenated, and that display needs, e.g. on Wikipedias, can be served by storing the hyphenated version in a qualifier? (That doesn’t work for ISBNs used in qualifiers and references, but those aren’t common – see ISBN-13 query, ISBN-10 query.)

Change 529096 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/citoid@master] Dash placement for isbns

https://gerrit.wikimedia.org/r/529096

Mvolz moved this task from Backlog to Service on the Citoid board. Aug 8 2019, 1:56 PM

Change 529096 merged by jenkins-bot:
[mediawiki/services/citoid@master] Dash placement for isbns

https://gerrit.wikimedia.org/r/529096