Enable and use or merge results from zotero ISBN search to improve ISBN results
Closed, Declined (Public)

Description

Steps to reproduce

  1. Open WorldCat or Zotero
  2. Search for ISBN 9780198029359
  3. Open https://cs.wikipedia.org/api/rest_v1/#!/Citation/getCitation (or use Citoid in VE)
  4. Search for ISBN 9780198029359
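
The lookup in steps 3 and 4 can also be scripted. The following is a minimal sketch, assuming the REST route has the form `/api/rest_v1/data/citation/{format}/{query}` and that `mediawiki` is an accepted format; the helper names `citation_url` and `fetch_citation` are hypothetical and not part of Citoid.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def citation_url(query, wiki_host="cs.wikipedia.org", fmt="mediawiki"):
    """Build the citation lookup URL (route layout is an assumption)."""
    # Percent-encode the query so URLs or DOIs with slashes stay in one segment.
    return f"https://{wiki_host}/api/rest_v1/data/citation/{fmt}/{quote(query, safe='')}"


def fetch_citation(query, wiki_host="cs.wikipedia.org"):
    """Fetch and decode the JSON response for a query such as an ISBN."""
    with urlopen(citation_url(query, wiki_host)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds the URL; call fetch_citation() to hit the network.
    print(citation_url("9780198029359"))
```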

Expected behavior
The records from WorldCat or Zotero should match what REST or Citoid outputs, right? The records in Zotero seem to be correct; the result from WorldCat has some issues.

Current behavior
The REST output (Citoid output) is a mess: there is a year in the author last name field, a WorldCat URL in the e-book URL field, and some English text in the numPages field (bad on a non-English Wikipedia). What is wrong with this?
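
The three symptoms can be checked for mechanically. This is a rough sketch, assuming each citation is a dict whose `author` entries are `[first, last]` pairs and whose `url` and `numPages` values are strings. The function name `find_field_problems` and the sample record are made up for illustration; the record is not actual Citoid output.

```python
import re

# Matches a plausible four-digit year (1500-2099) as a standalone token.
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def find_field_problems(citation):
    """Return a list of human-readable problems found in one citation dict."""
    problems = []
    for name in citation.get("author", []):
        last = name[-1] if name else ""
        if YEAR_RE.search(last):
            problems.append(f"year in author last name: {last!r}")
    url = citation.get("url", "")
    if "worldcat.org" in url:
        problems.append(f"WorldCat URL in url field: {url!r}")
    num_pages = str(citation.get("numPages", "")).strip()
    if num_pages and not num_pages.isdigit():
        problems.append(f"non-numeric numPages: {num_pages!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical record that mimics the reported symptoms.
    sample = {
        "author": [["John", "Smith 1950-"]],
        "url": "https://www.worldcat.org/oclc/0000",
        "numPages": "xii, 300 pages",
    }
    for problem in find_field_problems(sample):
        print(problem)
```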

Configuration
cswiki

Issue

We currently use WorldCat's open search API for this metadata. The results are only available in MARCXML, which is probably where that junk comes from, and in Dublin Core, which is also pretty messy and not very structured. The results on their website come from a database we don't have access to. I could probably improve it a bit on our end, but it might be worth working towards fixing Zotero instead, because they query more databases, or merging the results from both somehow.

Event Timeline

I'm not sure what version of Zotero you're using, but unfortunately this feature seems to be broken in translation-server, which is what we use. Filed here: https://github.com/zotero/translation-server/issues/79

Mvolz renamed this task from "Both Zotero and WorldCat records seems correct, but Citoid generates mess from ISBN" to "Enable and use or merge results from zotero ISBN search to improve ISBN results". Jan 28 2019, 1:44 PM
Mvolz changed the task status from Open to Stalled.
Mvolz triaged this task as Medium priority.

Maybe a dupe of T160845

Change 486851 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/citoid@master] [WIP] Remove xISBN and replace with zotero

https://gerrit.wikimedia.org/r/486851

...or merging the results from both somehow.

Combining multiple sources would be the best option, but it might require too many network requests. It might also be a good idea to try some local authorities first (per T212585 and its subtasks).

Change 486851 merged by jenkins-bot:
[mediawiki/services/citoid@master] Remove xISBN and replace with zotero

https://gerrit.wikimedia.org/r/486851

@Dvorapa - Merging the two results sounds like a good idea in theory, but I suspect it would be a bad idea in the long run. It would make the look-up take longer and could lead to more bugs that are harder to track down. For example, if one source lists a person as an author and another source lists them as an editor (as there sometimes isn't a clear distinction between author and editor), you would end up with the same person being listed as both. I think we should decide on a single provider for ISBN data and make sure our code accommodates it well.
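
To illustrate the author/editor problem, here is a small sketch of a naive merge. It assumes each source returns a dict with optional `author` and `editor` lists of `[first, last]` pairs. The functions `naive_merge` and `conflicting_roles` and the sample names are hypothetical and are not Citoid code.

```python
def naive_merge(source_a, source_b, roles=("author", "editor")):
    """Union the creator lists of two records, role by role."""
    merged = dict(source_a)
    for role in roles:
        names = list(source_a.get(role, []))
        for name in source_b.get(role, []):
            if name not in names:  # de-duplicates only within the same role
                names.append(name)
        if names:
            merged[role] = names
    return merged


def conflicting_roles(citation):
    """Return the people who appear as both author and editor."""
    authors = {tuple(name) for name in citation.get("author", [])}
    editors = {tuple(name) for name in citation.get("editor", [])}
    return sorted(authors & editors)


if __name__ == "__main__":
    # Made-up records: the two sources disagree on Jane Doe's role.
    from_source_a = {"title": "Example Book", "author": [["Jane", "Doe"]]}
    from_source_b = {"title": "Example Book", "editor": [["Jane", "Doe"]]}
    merged = naive_merge(from_source_a, from_source_b)
    print(merged)
    print(conflicting_roles(merged))
```

Resolving such conflicts would need a rule for which source wins per field, which is the kind of extra logic and bug surface the comment warns about.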