
wrong author listed and wrong first/last name for the one author listed
Open, Normal, Public

Description

Citoid lists the wrong author, and the one author it does list has the wrong first/last name split.

http://web.archive.org/web/20170318230352/https://www.worldcat.org/title/black-artists-of-the-new-generation/oclc/886799569

vs.

https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/0396074340

WorldCat's first listed author doesn't appear in the Citoid output at all. The second listed author in this case happens to be the foreword author, not an author of the work proper.

(via https://en.wikipedia.org/wiki/User:Versary19/sandbox )

"It is pulling dates into last name, putting the publisher as "other" and I don't know what else. This was reported by the Amon Carter Museum -- at first I thought it was just a weird record because you can get those but I had the same results with others. Also missing publication date."

Error occurs with:
9780995555563
0511040938
9780838916322

Event Timeline

jeremyb created this task. Mar 19 2017, 12:07 AM
Restricted Application added a subscriber: Aklapper.
Mvolz moved this task from Backlog to Zotero on the Citoid board. Mar 20 2017, 10:53 AM

An update about this issue.
Using the newly released ISBN citation feature, we discovered that the first name/last name problem is still around.
How to reproduce:

  • insert the ISBN "9780123850591" (Computer networks by Larry L Peterson and Bruce S Davie)
  • the template gets "S." as the last name and "Davie, Bruce" as the first name

https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/9780123850591

Exporting the citation from WorldCat at https://www.worldcat.org/title/computer-networks-a-systems-approach/oclc/781227361 generates correct data; the authors in the downloaded file are:

  • Peterson, Larry L.
  • Davie, Bruce S.
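For illustration, the output above is consistent with a splitter that treats the last whitespace-separated token as the surname. The minimal sketch below assumes each author arrives as a single inverted string such as "Davie, Bruce S." (as in the WorldCat export); `naive_split` and `inverted_split` are hypothetical helpers, not Citoid's actual code.

```python
from __future__ import annotations


def naive_split(name: str) -> tuple[str, str]:
    """Hypothetical naive splitter: treat the last whitespace token as the surname."""
    first, _, last = name.rpartition(" ")
    return first, last


def inverted_split(name: str) -> tuple[str, str]:
    """Hypothetical splitter for inverted "Last, First" names.

    Splits on the first comma; falls back to naive_split when there is none.
    """
    if "," in name:
        last, _, first = name.partition(",")
        return first.strip(), last.strip()
    return naive_split(name)


if __name__ == "__main__":
    for author in ["Peterson, Larry L.", "Davie, Bruce S."]:
        print("naive:", naive_split(author), "inverted:", inverted_split(author))
```

With the naive rule, "Davie, Bruce S." becomes first "Davie, Bruce" and last "S.", matching the report; the comma-aware rule yields first "Bruce S." and last "Davie". It only helps when names are reliably inverted, which is not guaranteed for WorldCat data.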

@Mvolz this isn't even triaged, and yet it seems to affect every wiki, making ISBN ref generation a bit lame?

Mvolz added a comment. Nov 28 2017, 3:51 PM

> @Mvolz this isn't even triaged, and yet it seems to affect every wiki, making ISBN ref generation a bit lame?

I looked into this extensively when I did T155161, hoping that adding another data format would fix the problem.

Unfortunately the data is very, very inconsistent, and in some ways actually worse in the MarcXML format than in the DublinCore format. They simply don't give us the data in a consistent, structured way from the API. From record to record, it's often not even in the same field.
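To make "not even in the same field" concrete: in MARC 21, personal-name authors normally sit in field 100 (main entry) or 700 (added entry), with the name in subfield $a, dates in $d and a relator term in $e. The sketch below pulls those subfields from a MarcXML record using Python's standard `xml.etree.ElementTree`. The sample record is hypothetical and trimmed, and real WorldCat records may carry names elsewhere, for example only in the free-text 245 $c statement of responsibility.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# MARC 21 slim namespace used by MarcXML records
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical, trimmed record for illustration only
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Peterson, Larry L.</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Davie, Bruce S.</subfield>
  </datafield>
</record>"""


def personal_names(record_xml: str) -> list[dict[str, str]]:
    """Collect $a (name), $d (dates) and $e (relator) from 100 and 700 fields."""
    root = ET.fromstring(record_xml)
    names = []
    for tag in ("100", "700"):
        path = f".//marc:datafield[@tag='{tag}']"
        for field in root.iterfind(path, MARC_NS):
            # If a subfield code repeats, the last occurrence wins here
            subfields = {
                sf.get("code"): (sf.text or "").strip()
                for sf in field.findall("marc:subfield", MARC_NS)
            }
            names.append({
                "tag": tag,
                "name": subfields.get("a", ""),
                "dates": subfields.get("d", ""),
                "relator": subfields.get("e", ""),
            })
    return names


if __name__ == "__main__":
    for entry in personal_names(SAMPLE):
        print(entry)
```

A record that omits 100/700 entirely would return an empty list here, which is one way a listed author can silently disappear from the output.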

I agree the data looks okay from the worldcat website. I wonder if they have an internal data format that we don't have access to. Unfortunately we're only able to access their data in MarcXML and DublinCore and both of those formats have some significant flaws.

Maybe with some more sophisticated natural language processing we could do a little better, but probably not 100% (particularly concerning the foreword author issue) :/
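As a rough illustration of why the foreword author is hard to exclude: the role is often stated only in free text, such as a 245 $c statement of responsibility like "by Jane Doe ; foreword by John Roe." The sketch below is a simple keyword heuristic, not the natural language processing mentioned above; the role list, the helper name and the example names are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical role phrases marking a contributor who is not a main author
SECONDARY_ROLE = re.compile(
    r"^(?:with\s+(?:an?\s+)?)?(?:foreword|preface|introduction|afterword)\b.*?\bby\s+",
    re.IGNORECASE,
)
LEADING_BY = re.compile(r"^by\s+", re.IGNORECASE)


def split_responsibility(statement: str) -> tuple[list[str], list[str]]:
    """Split a statement of responsibility into (main authors, secondary contributors).

    Segments are separated by ";" as in MARC 245 $c. Trailing periods are
    stripped, which would also drop the period after a final initial.
    """
    primary: list[str] = []
    secondary: list[str] = []
    for part in statement.split(";"):
        part = part.strip().rstrip(".").strip()
        if not part:
            continue
        role = SECONDARY_ROLE.match(part)
        if role:
            # Keep only the name(s) after "... by"
            secondary.append(part[role.end():])
        else:
            primary.append(LEADING_BY.sub("", part))
    return primary, secondary


if __name__ == "__main__":
    print(split_responsibility("by Jane Doe ; foreword by John Roe."))
```

Any phrasing outside the keyword list (or a statement without ";" separators) falls through to the "main author" bucket, which is why a heuristic like this cannot be expected to be right 100% of the time.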

I can look into it again at some point.

Mvolz triaged this task as Normal priority. Nov 28 2017, 3:52 PM
Mvolz removed a project: Internet-Archive.

> I agree the data looks okay from the worldcat website. I wonder if they have an internal data format that we don't have access to. Unfortunately we're only able to access their data in MarcXML and DublinCore and both of those formats have some significant flaws.

Thanks for the detailed reply. We do have contacts there, though? Can we maybe ask them what's going on? :)

Dalba awarded a token. Apr 14 2018, 8:12 AM
Dalba added a subscriber: Dalba.

Hi, this seems to be working differently than when launched. For example the ISBN 9788611177434 which is used in the WMF blog post https://blog.wikimedia.org/2017/05/11/wikimedia-oclc-partnership/ no longer works and in fact throws up an error. Have you worked with Karen Combs at OCLC? If not I can put you in touch.

Elitre added a comment. May 4 2018, 1:50 PM

Hi, that ISBN works for me at en.wp. Error messages can happen - you can fix them before or after saving. https://en.wikipedia.org/wiki/Help:CS1_errors#bad_date is the guide to fix the one that this specific source is giving me.

When I go back to edit the resulting citation, it seems to have saved it as a text block and not as a book citation with the various fields?

Elitre added a comment. Edited May 4 2018, 5:07 PM

~~Possibly an issue with that specific ISBN? A different one generated https://en.wikipedia.org/w/index.php?title=User%3AElitre_%28WMF%29%2Fsandbox&type=revision&diff=839628277&oldid=839601708 for me.~~

Thankfully I have smarter colleagues who figure stuff out for me.

@Merrilee, that ISBN is for an audio recording, which (according to https://en.wikipedia.org/wiki/MediaWiki:Citoid-template-type-map.json ) is supposed to use the {{citation}} template rather than the {{cite book}} template. (It's strange that the blog post mentions this ISBN in text, but shows a different one in the image.)

The date error is generated by the local CS1 template, because "cop. 2007" is not a date format that it recognizes.
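A minimal sketch of normalizing such a value before it reaches the template, assuming the date arrives as a single string with a copyright or phonogram marker ("cop.", "c", "©" or "℗") in front of a four-digit year. `normalize_year` is a hypothetical helper for illustration, not existing Citoid or CS1 behaviour.

```python
import re

# Hypothetical pattern: copyright/phonogram marker followed by a four-digit year
COPYRIGHT_YEAR = re.compile(r"^(?:cop\.|c|©|℗)\s*(\d{4})\.?$", re.IGNORECASE)


def normalize_year(raw: str) -> str:
    """Return the bare year for values like "cop. 2007" or "c2007."; otherwise return the input unchanged."""
    text = raw.strip()
    match = COPYRIGHT_YEAR.match(text)
    return match.group(1) if match else text


if __name__ == "__main__":
    print(normalize_year("cop. 2007"))
```

Values that don't fit the pattern are passed through untouched, so the template would still flag anything genuinely unrecognized.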

Mvolz claimed this task. Jul 1 2018, 10:26 AM
Mvolz updated the task description.
Mvolz moved this task from Zotero to Service on the Citoid board. Jan 4 2019, 11:03 AM
Izno added a subscriber: Izno. Mar 15 2019, 5:20 PM

Change 497315 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/citoid@master] Don't split authors by default

https://gerrit.wikimedia.org/r/497315

Change 497315 merged by jenkins-bot:
[mediawiki/services/citoid@master] Don't split authors by default

https://gerrit.wikimedia.org/r/497315

Mvolz added a comment. Apr 2 2019, 9:13 AM

I've now deployed a fix that no longer splits author names and instead puts the whole name in the "last" field. In terms of how it renders with most citation templates, it looks a lot better. However, this is a bit hacky, so I'm leaving the task open for now. It will also continue to leave in parts of the author name that we don't necessarily want, like the birth date in parentheses.
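The behaviour described above can be sketched as follows, assuming Citoid's `mediawiki` format represents each author as a `[first, last]` pair. `unsplit_author` and the optional date-stripping pattern are hypothetical; the pattern only covers trailing life dates such as "(1950-)" or ", 1950-2010" and would need testing against real records.

```python
from __future__ import annotations

import re

# Hypothetical pattern for trailing life dates, e.g. " (1950-)" or ", 1950-2010"
LIFE_DATES = re.compile(r"[\s,]*\(?\d{4}\??-(?:\d{4})?\??\)?\.?$")


def unsplit_author(raw_name: str, strip_dates: bool = False) -> list[str]:
    """Return a [first, last] pair with the whole name in "last", as in the deployed fix.

    With strip_dates=True, a trailing life-dates segment is removed first.
    """
    name = raw_name.strip()
    if strip_dates:
        name = LIFE_DATES.sub("", name)
    return ["", name]


if __name__ == "__main__":
    print(unsplit_author("Doe, Jane (1950-)"))
    print(unsplit_author("Doe, Jane (1950-)", strip_dates=True))
```

Keeping the name in one field avoids wrong first/last splits entirely; stripping dates is a separate, riskier step because any regex will eventually match something that isn't a life date.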