
Don't split names in citoid by default
Closed, Resolved · Public

Description

While testing my own blog site (a blog page), I noticed an assumption about name formatting that isn't universally correct:

[{
  "key":"SZGR9VUU",
  "version":0,
  "itemType":"webpage",
  "tags":[],
  "title":"Rate Encoded Location",
  "websiteTitle":"AI stuff",
  "url":"https://jeblad.github.io/neural%20nets/2018/03/10/rate-encoding",
  "abstractNote":"Outline of rate encoded locations, how the brain might do it, and a possible approximation in an artificial neural net.",
  "language":"en",
  "accessDate":"2019-03-12",
  "author":[["John Erling","Blad"]],
  "source":["Zotero"]
}]

Notice the author field. On the site the name is encoded as "John Erling Blad", but in the JSON structure "Blad" is split into a separate field; that is, "Blad" is treated as the family name. This does not hold in general in other languages. For example, in Eggert Ólafsson the part "Ólafsson" is not a family name but a patronym. In such cases the name should probably not be split, because it is not a simple (first name, family name) pair.

I have no simple quick fix for this. One option: if the extracted names contain no commas, that is, they are not inverted, they should probably not be split into (first name, family name) pairs.
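The comma heuristic could be sketched roughly as follows. This is only an illustration in Python, not Citoid's or Zotero's code: the function name `author_fields` and the choice to return a one-element list for an unsplit name are assumptions made for the example.

```python
from typing import List


def author_fields(raw_name: str) -> List[str]:
    """Split a name only when it is explicitly inverted ("Family, Given").

    Hypothetical helper: the one-element list used for unsplit names is an
    assumption of this sketch, not a documented Citoid output shape.
    """
    name = " ".join(raw_name.split())  # collapse runs of whitespace
    if "," in name:
        # Inverted form: the comma marks the boundary chosen by the source.
        last, first = (part.strip() for part in name.split(",", 1))
        return [first, last]
    # No comma: do not guess which part, if any, is a family name.
    return [name]
```

With this sketch, "Blad, John Erling" would become a (first name, family name) pair, while "John Erling Blad" and "Eggert Ólafsson" would be left whole.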

Event Timeline

jeblad created this task. Mar 12 2019, 4:30 PM
Restricted Application added subscribers: Danmichaelo, Aklapper. Mar 12 2019, 4:30 PM
jeblad updated the task description. Mar 12 2019, 4:33 PM
Mvolz renamed this task from Citoid: Patronym in author field mistaken as family name to Don't split names by default. Mar 13 2019, 11:01 AM
Mvolz triaged this task as Normal priority.
Mvolz added a subscriber: Mvolz. Mar 13 2019, 11:11 AM

Thanks; I've been thinking about doing this for a while, actually. The Zotero format, which is the native internal format, has separate fields for first name and last name, but that is indeed a bad assumption. Unfortunately, since this is coming from Zotero, I'm not sure we can fix it on our end in this particular case, except by trying to reconstruct the split, or maybe overwriting it?

Might be something to bring up in this thread? https://github.com/zotero/translators/issues/1092

Or bring up once that is merged?
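One possible reading of "overwriting" the split on the receiving end is to rejoin each (first name, last name) pair into a single string. The sketch below assumes the response shape shown in the task description (`author` as a list of two-element lists); the function name `rejoin_authors` and the one-element output form are hypothetical, and the approach cannot recover whether the original name was inverted.

```python
from typing import Any, Dict, List


def rejoin_authors(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of item whose author entries are single unsplit names.

    Hypothetical helper: assumes item["author"] is a list of
    [first, last] pairs, as in the JSON quoted in the description.
    """
    result = dict(item)  # shallow copy; the input item is not modified
    authors: List[List[str]] = item.get("author", [])
    # Join the non-empty parts of each pair with a single space.
    result["author"] = [[" ".join(p for p in pair if p)] for pair in authors]
    return result
```

Applied to the example above, `[["John Erling", "Blad"]]` would become `[["John Erling Blad"]]`.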

Mvolz renamed this task from Don't split names by default to Don't split names in citoid by default. Mar 13 2019, 11:11 AM
Mvolz moved this task from Backlog to Service on the Citoid board. Mar 18 2019, 12:51 PM

Change 497315 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/citoid@master] Don't split authors by default

https://gerrit.wikimedia.org/r/497315

Change 497315 merged by jenkins-bot:
[mediawiki/services/citoid@master] Don't split authors by default

https://gerrit.wikimedia.org/r/497315

Mvolz moved this task from Service to Waiting on Deploy on the Citoid board. Mar 22 2019, 10:10 AM
Mvolz closed this task as Resolved. Edited Mon, Apr 1, 1:50 PM

Citoid no longer splits names by default. But in the particular case you mention, you'll still see the same result, because Zotero does the split, and I don't think it's possible to change their minds about this as default behaviour. So I guess in that sense it's also partially declined...