
Zotero is duplicating name of author in some requests
Open, Medium, Public

Description

Today, German Wikipedia removed >700 duplicated author names.

  • Bot run for further investigation and backtracing.

The result is a comma-separated list that contains the same value twice, like here.

Same issue for printed material, but not yet cleaned up.

  • Testing material for detection by template and throwing maintenance category.
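The detection idea above can be sketched roughly as follows. This is only an illustration in Python, not the actual template logic on German Wikipedia; the function name `has_repeated_entry` and the sample name pair "Jane Doe, John Roe" are hypothetical. It assumes entries are separated by commas, so a field written as "Last, First" would need different handling.

```python
def has_repeated_entry(author_field):
    """Return True if a comma-separated field contains the same entry more than once."""
    # Normalise each entry so that differences in spacing or case do not hide a repeat.
    entries = [part.strip().casefold() for part in author_field.split(",")]
    # Ignore empty entries produced by stray or trailing commas.
    entries = [entry for entry in entries if entry]
    return len(entries) != len(set(entries))


print(has_repeated_entry("SWRWissen, SWRWissen"))  # True
print(has_repeated_entry("Jane Doe, John Roe"))    # False
```

A page flagged this way could then be put into a maintenance category for manual review rather than being fixed automatically.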

Example:

curl -d 'https://economics.nd.edu/faculty/william-evans/' -H 'Content-Type: text/plain' http://127.0.0.1:1969/web

returns

`
[{"key":"Y972DMCW","version":0,"itemType":"webpage","creators":[{"firstName":"Marketing Communications: Web University of Notre","lastName":"Dame","creatorType":"author"},{"firstName":"Marketing Communications: Web University of Notre","lastName":"Dame","creatorType":"author"}],"tags":[],"title":"William - Evans Department of Economics University of Notre Dame","websiteTitle":"Department of Economics","url":"https://economics.nd.edu/faculty/william-evans/","abstractNote":"Notre Dame's Department of Economics offers graduate and undergraduate degrees, and its faculty specializes in microeconomics and macroeconomics theory; econometrics; and labor, monetary, international, development, and environmental economics.","accessDate":"2021-03-15T14:13:46Z"}]
`

Upstream report: https://github.com/zotero/translators/issues/2354

Event Timeline

Ah, and here are two similar resolved issues concerning authors that might help:

  • T160845 wrong author listed and wrong first/last name for the one author listed using ISBN lookup
  • T203361 Publisher is incorrectly parsed as an "author full name" and split into first and last name when generating from DOI
Mvolz renamed this task from "Citoid/VE is duplicating name of author" to "Zotero is duplicating name of author in for requests". Mar 15 2021, 2:15 PM
Mvolz renamed this task from "Zotero is duplicating name of author in for requests" to "Zotero is duplicating name of author in some requests".
Mvolz triaged this task as Medium priority.
Mvolz updated the task description.
Mvolz moved this task from Backlog to Zotero on the Citoid board.

From these examples, it looks like the cause is bad metadata in the source pages.

I had a look at the one here: view-source:https://economics.nd.edu/faculty/william-evans/

It looks like the metadata is actually repeated in that page: there is a partial head at the bottom with no closing tag. The HTML is badly formed.

For view-source:https://www.swr.de/wissen/1000-antworten/kultur/woher-kommt-redensart-ueber-die-wupper-gehen-100.html
The metadata is like this:

<meta name="author" content="SWRWissen">
<meta property="author" content="SWRWissen">

Both of these are valid syntax, so they are read as two different authors.
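To illustrate how this produces a duplicate, here is a minimal sketch in Python. It is not Zotero's translator code; the class `AuthorMetaCollector` is hypothetical and simply accepts an author from either the `name` or the `property` attribute, which is the behaviour assumed here.

```python
from html.parser import HTMLParser


class AuthorMetaCollector(HTMLParser):
    """Collect the content of every <meta> tag whose name or property is "author"."""

    def __init__(self):
        super().__init__()
        self.authors = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Accept either attribute as the key, as both forms appear in the page.
        key = attributes.get("name") or attributes.get("property")
        if key == "author" and attributes.get("content"):
            self.authors.append(attributes["content"])


html = """<meta name="author" content="SWRWissen">
<meta property="author" content="SWRWissen">"""

collector = AuthorMetaCollector()
collector.feed(html)
print(collector.authors)  # ['SWRWissen', 'SWRWissen']
```

Each tag is handled on its own, so nothing notices that the second value repeats the first.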

We could simply check for duplicates in Citoid before sending the result off. I've also filed a bug with Zotero to ask whether this should be handled by the translator: https://github.com/zotero/translators/issues/2354

Yeah, I guessed something like that from a few examples, but they had various causes.

This is not an issue with one particular translator for one specific website. In general, Zotero and/or Citoid should check the list of propagated authors, editors, publishers or whatever else for text that is already present, and remove any duplicates found. Finally, the remaining values may be concatenated and resolved into a list of locations, authors or anything else.
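A generic check of this kind might look like the sketch below. It is written in Python purely for illustration and is not Citoid or Zotero code; the function `dedupe_creators` is hypothetical. It assumes creators arrive as dictionaries shaped like the ones in the response in the task description, and that some creators may carry a single `name` field instead of `firstName` and `lastName`.

```python
def dedupe_creators(creators):
    """Return creators with exact repeats removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for creator in creators:
        # Two creators count as the same if type and name fields match
        # after trimming whitespace and ignoring case.
        identity = tuple(
            (creator.get(field) or "").strip().casefold()
            for field in ("creatorType", "firstName", "lastName", "name")
        )
        if identity not in seen:
            seen.add(identity)
            unique.append(creator)
    return unique


creators = [
    {"firstName": "Marketing Communications: Web University of Notre",
     "lastName": "Dame", "creatorType": "author"},
    {"firstName": "Marketing Communications: Web University of Notre",
     "lastName": "Dame", "creatorType": "author"},
]

print(len(dedupe_creators(creators)))  # 1
```

Because the creator type is part of the comparison, the same person listed once as author and once as editor would be kept in both roles.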