
Citoid identifies non-DOI URLs as DOIs
Closed, Resolved, Public

Description

Feed Citoid the URL http://g2014results.thecgf.com/athlete/weightlifting/1024088/dika_toua.html. This is the profile page of a Commonwealth Games athlete, but Citoid recognizes it as a journal article with the DOI 1024088/dika_toua.html.

This happens because the URL matches the regex at CitoidService.js, line 67. However, I'm not sure why the match passes the resolver check further down, because dx.doi.org rejects it with the error "DOI Prefix [1024088] Not Found".
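
To illustrate the kind of false positive described above, here is a minimal sketch. The actual regex at line 67 is not quoted in this task, so `LOOSE_DOI` is a hypothetical pattern chosen only because it reproduces the reported match. `STRICT_DOI` and `extractDoi` are likewise illustrative names, not Citoid's code, and the 4 to 9 digit prefix length is an assumption based on commonly seen DOI prefixes.

```javascript
// Hypothetical loose pattern (NOT the actual regex from CitoidService.js):
// the dot after "10" is optional, so any path segment that starts with
// "10" followed by digits and a slash looks like a DOI.
const LOOSE_DOI = /\b10\.?\d{3,}\/\S+/;

// Stricter sketch: require the literal "10." directory indicator,
// then a 4-9 digit registrant code (assumed range), then a slash and a suffix.
const STRICT_DOI = /\b10\.\d{4,9}\/\S+/;

// Return the first substring of `text` that matches `pattern`, or null.
function extractDoi(text, pattern) {
  const match = text.match(pattern);
  return match ? match[0] : null;
}

const url = 'http://g2014results.thecgf.com/athlete/weightlifting/1024088/dika_toua.html';

console.log(extractDoi(url, LOOSE_DOI));  // '1024088/dika_toua.html'
console.log(extractDoi(url, STRICT_DOI)); // null
console.log(extractDoi('http://dx.doi.org/10.1000/182', STRICT_DOI)); // '10.1000/182'
```

Requiring the dot is enough to reject this particular URL. It would not catch every non-DOI path, so a resolver check would presumably still be needed as a second line of defence.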

Event Timeline

nshahquinn-wmf raised the priority of this task from to Needs Triage.
nshahquinn-wmf updated the task description.
nshahquinn-wmf added a project: Citoid.
nshahquinn-wmf subscribed.
Mvolz triaged this task as High priority.

Change 232259 had a related patch set uploaded (by Mvolz):
[WIP] Improve DOI detection

https://gerrit.wikimedia.org/r/232259

Change 232259 merged by Mobrovac:
Improve DOI detection

https://gerrit.wikimedia.org/r/232259

I see that this patch has been committed. I'm looking forward to this fix being put in production, as I am fixing a few of these errors daily at en.wp.

See also https://en.wikipedia.org/wiki/Help:CS1_errors#Check_.7Cdoi.3D_value

In the Citation Style 1 module on en.wp, we check for spaces and en dashes in DOIs, both of which are invalid characters. Citoid could optionally add this check.
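
A check along those lines could look roughly like the sketch below. This is not the CS1 module's code (which is written in Lua) and not anything in Citoid; `findInvalidDoiChars` is a hypothetical helper, and it treats any whitespace character, not just a plain space, as invalid.

```javascript
// Hypothetical helper: list the distinct whitespace or en dash (U+2013)
// characters found in a DOI string. An empty array means neither was found.
function findInvalidDoiChars(doi) {
  const matches = doi.match(/[\s\u2013]/g);
  return matches ? Array.from(new Set(matches)) : [];
}

console.log(findInvalidDoiChars('10.1000/182').length);        // 0
console.log(findInvalidDoiChars('10.1000/18 2').length);       // 1 (space)
console.log(findInvalidDoiChars('10.1000/18\u20132').length);  // 1 (en dash)
```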

mobrovac subscribed.

Deployed, resolving.

@Jonesey95 please open a separate task for spaces and dashes in DOIs.