
DOIs with unscrapeable pages are not merged with crossref / repository metadata
Closed, Resolved, Public

Description

This issue occurs on enwiki when you use Citoid in VisualEditor on Taylor & Francis DOIs, such as doi:10.1080/00288306.1980.10424125.

The expected result would be a journal citation similar to {{cite journal |last1=Brothers |first1=R. N. |last2=Heming |first2=R. F. |last3=Hawke |first3=M. M. |last4=Davey |first4=F. J. |title=Tholeiitic basalt from the Monowai seamount, Tonga-Kermadec ridge (Note) |journal=New Zealand Journal of Geology and Geophysics |date=July 1980 |volume=23 |issue=4 |pages=537–539 |doi=10.1080/00288306.1980.10424125 |ref=harv}}

What it actually produces is {{Cite web|url=https://www.tandfonline.com/action/captchaChallenge?redirectUrl=https%3A%2F%2Fwww.tandfonline.com%2Fdoi%2Fabs%2F10.1080%2F00288306.1980.10424125&|website=www.tandfonline.com|doi=10.1080/00288306.1980.10424125|access-date=2019-01-26}} as if it can't recognize the DOI correctly.
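The broken output still carries the DOI. It is percent-encoded inside the `redirectUrl` parameter of the captcha page, which shows that the DOI was known but the scraped (captcha) page won. Below is a minimal sketch of recovering it. It assumes the captcha URL shape shown above and uses a deliberately loose, hypothetical DOI pattern (`DOI_RE`). It is an illustration only, not how Citoid handles the case.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical, deliberately loose DOI pattern (not Citoid's own regex).
DOI_RE = re.compile(r"10\.\d{4,9}/\S+")

def doi_from_captcha_url(url: str) -> Optional[str]:
    """Return the DOI hidden in a captchaChallenge redirect URL, if any."""
    parts = urlsplit(url)
    if not parts.path.endswith("/captchaChallenge"):
        return None  # not a captcha page, nothing to recover
    # parse_qs percent-decodes the value and skips the empty trailing "&".
    redirect = parse_qs(parts.query).get("redirectUrl", [""])[0]
    match = DOI_RE.search(urlsplit(redirect).path)
    return match.group(0) if match else None
```

For the URL in the report, this returns `"10.1080/00288306.1980.10424125"`. For an ordinary article URL it returns `None`.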

Event Timeline

That's definitely bad. When we had a DOI, we used to merge the DOI metadata with Citoid's own scraped results, but we don't do that with Zotero's results. We should either merge with the Zotero results, just return the Crossref result, or use Zotero's search input, which I assume handles this better.
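The first option, merging, could look roughly like the sketch below. It assumes Zotero-style field names (`itemType`, `publicationTitle`, `DOI`, `accessDate`) and a hypothetical `merge_citations` helper. The precedence rule is an illustrative choice, not Citoid's documented behaviour: DOI metadata wins, and scraped values only fill the gaps.

```python
from typing import Any, Dict

Citation = Dict[str, Any]

def merge_citations(doi_meta: Citation, scraped: Citation) -> Citation:
    """Merge DOI-derived metadata (e.g. from Crossref) into scraped metadata.

    Hypothetical policy: every non-empty DOI field overrides the scraped
    value, so a generic "webpage" type becomes "journalArticle", while
    scraped-only fields such as accessDate are kept.
    """
    merged = dict(scraped)
    for key, value in doi_meta.items():
        if value not in (None, "", []):
            merged[key] = value
    return merged
```

Returning the Crossref result alone is the degenerate case of this merge with an empty `scraped` dict. A naive merge like this would still keep the captcha `url` from the scraped side, so a real implementation would need to drop or replace it.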

Mvolz triaged this task as High priority.

Change 486861 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/citoid@master] [WIP] User zotero search for dois

https://gerrit.wikimedia.org/r/486861

Change 486861 merged by jenkins-bot:
[mediawiki/services/citoid@master] Use zotero search endpoint for dois

https://gerrit.wikimedia.org/r/486861

Mvolz renamed this task from "Taylor & Francis DOI is not correctly recognized by Citoid" to "DOIs with unscrapeable pages are not merged with other metadata". (Feb 11 2019, 10:08 AM)
Mvolz edited projects, added Regression; removed Patch-For-Review.
Mvolz moved this task from Backlog to Waiting on Deploy on the Citoid board.

FYI, this is a regression caused by T197242. A lot of the scraping that used to be done by Citoid is now done by Zotero, but we were handing Zotero only the URL and not the DOI, so the metadata from the DOI wasn't getting merged with what was scraped from the page.
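The difference between handing Zotero a URL and handing it the DOI can be sketched with the request shapes of Zotero's translation-server. It accepts a plain-text URL on `/web` and a plain-text identifier such as a DOI on `/search`. Several things here are assumptions: the base URL (translation-server's default local port), the helper name `zotero_request`, and the reading of the deploy message's "A-L header" as `Accept-Language`. Citoid's actual request code and configuration may differ. The sketch only builds the requests and does not send them.

```python
import urllib.request
from typing import Optional

# Assumption: a local translation-server on its default port.
ZOTERO_BASE = "http://127.0.0.1:1969"

def zotero_request(url: str, doi: Optional[str] = None,
                   accept_language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a translation-server request.

    With a DOI, query /search so Zotero looks up the identifier's
    metadata; without one, fall back to scraping the page via /web,
    which is the path that hit the T&F captcha page.
    """
    if doi:
        endpoint, body = "/search", doi
    else:
        endpoint, body = "/web", url
    return urllib.request.Request(
        ZOTERO_BASE + endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/plain",
                 "Accept-Language": accept_language},
        method="POST",
    )
```

The key design point is that the DOI, when present, decides the endpoint. The page URL is only used when there is no identifier to resolve.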

Mvolz renamed this task from "DOIs with unscrapeable pages are not merged with other metadata" to "DOIs with unscrapeable pages are not merged with crossref / repository metadata". (Feb 11 2019, 10:36 AM)

Mentioned in SAL (#wikimedia-operations) [2019-02-11T21:09:15Z] <mobrovac@deploy1001> Started deploy [citoid/deploy@0b91bea]: Use Zotero for DOIs and pass it the A-L header - T214766 T210806 T215755

Mentioned in SAL (#wikimedia-operations) [2019-02-11T21:13:03Z] <mobrovac@deploy1001> Finished deploy [citoid/deploy@0b91bea]: Use Zotero for DOIs and pass it the A-L header - T214766 T210806 T215755 (duration: 03m 47s)

Both DOIs seem to be working in production, but it appears I forgot to write a test for this, so I'll reopen the task until that's merged. :)
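A regression check in that spirit could assert two things: a DOI lookup comes back as a journal article, and it is not a web citation pointing at a captcha page. The sketch below is hypothetical. The helper name and field names are assumptions, the DOI is assumed to be known to identify a journal article, and this is not the test merged in change 492979.

```python
from typing import Any, Dict, List

def regression_problems(citation: Dict[str, Any], doi: str) -> List[str]:
    """List the ways a DOI citation shows this task's regression."""
    problems = []
    if citation.get("DOI") != doi:
        problems.append("DOI missing or changed")
    if citation.get("itemType") != "journalArticle":
        problems.append("not a journal article: %r" % citation.get("itemType"))
    if "captchaChallenge" in (citation.get("url") or ""):
        problems.append("URL points at a captcha page")
    return problems
```

An empty list means the citation passes. The broken output from the report would yield two problems: the wrong item type and the captcha URL.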

Change 492979 had a related patch set uploaded (by Mvolz; owner: Mvolz):
[mediawiki/services/citoid@master] Add test

https://gerrit.wikimedia.org/r/492979

Change 492979 merged by jenkins-bot:
[mediawiki/services/citoid@master] Add test

https://gerrit.wikimedia.org/r/492979