
Expand pubmed regex to include digits starting from 1 in citoid
Closed, Resolved · Public · 0 Estimated Story Points

Description

PubMed IDs apparently start at 1:
https://www.ncbi.nlm.nih.gov/pubmed/1

However, we currently only recognise a PubMed ID (PMID) if it is 6 or 7 digits long. We should expand the definition in CitoidService.prototype.addResponseFunction (https://github.com/wikimedia/citoid/blob/master/lib/CitoidService.js) to include these.
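As a rough illustration, assuming the check is a single anchored regex over the whole query, the old and expanded patterns might look like this. `OLD_PMID_REGEX`, `NEW_PMID_REGEX`, and `looksLikePmid` are hypothetical names, not the actual code in CitoidService.js; the upper bound of 8 digits follows the patch that was eventually merged for this task.

```javascript
// Hypothetical sketch: these are not the actual regexes in CitoidService.js.

// Previous behaviour: only 6- or 7-digit strings were treated as PMIDs.
const OLD_PMID_REGEX = /^\d{6,7}$/;

// Expanded behaviour: any string of 1 to 8 digits is treated as a PMID,
// so very early IDs such as "1" are recognised.
const NEW_PMID_REGEX = /^\d{1,8}$/;

// Trim surrounding whitespace before testing, since pasted input often has it.
function looksLikePmid(input, regex = NEW_PMID_REGEX) {
  return regex.test(input.trim());
}

console.log(looksLikePmid('1', OLD_PMID_REGEX)); // false
console.log(looksLikePmid('1'));                 // true
console.log(looksLikePmid('123456789'));         // false (9 digits)
```

Anchoring with `^` and `$` matters here: without it, any longer number containing a run of digits would also match.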

Getting Started

citoid is a Node.js application (written in JavaScript) that retrieves information about a webpage, book, journal article, etc., given a URL or some other identifier, such as a DOI (digital object identifier). It uses another open-source project, Zotero's translation-server, also written in JavaScript, to do much of the work.
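To make the "URL or some other identifier" idea concrete, here is a hypothetical sketch of routing a query by type before any lookup happens. `classifyQuery` and `DOI_REGEX` are invented for illustration; the DOI pattern is a simplification of the `10.<registrant>/<suffix>` shape, and the real service recognises more identifier types (such as PMIDs) than shown here.

```javascript
// Hypothetical sketch of routing a query to an identifier type.
// Simplified DOI shape: "10." + 4 to 9 digit registrant + "/" + non-space suffix.
const DOI_REGEX = /^10\.\d{4,9}\/\S+$/;

function classifyQuery(query) {
  const q = query.trim();
  if (DOI_REGEX.test(q)) {
    return 'doi';
  }
  try {
    // The WHATWG URL constructor (global in Node.js 10+) throws on
    // strings that are not absolute URLs.
    const url = new URL(q);
    if (url.protocol === 'http:' || url.protocol === 'https:') {
      return 'url';
    }
  } catch (err) {
    // Not a URL; fall through.
  }
  return null; // unrecognised here; a real service would try other types
}

console.log(classifyQuery('10.1000/xyz123'));                        // 'doi'
console.log(classifyQuery('https://www.ncbi.nlm.nih.gov/pubmed/1')); // 'url'
console.log(classifyQuery('not an identifier'));                     // null
```

Checking the DOI pattern first is a design choice: a bare DOI is not an absolute URL, while `https://doi.org/...` links still fall through to the URL branch.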

To get citoid working on your computer, you'll need to download Node version 10.0 (for citoid) and xpcshell version 29.0 (for Zotero). Installation instructions and more information are available at https://www.mediawiki.org/wiki/Citoid

(For this task, you don't strictly need to install Zotero, but the tests will fail without it, so you will need it installed to create and run the tests.)

Event Timeline

Restricted Application added a subscriber: Aklapper.
Mvolz renamed this task from "Expand pubmed regex" to "Expand pubmed regex in regex to include digits starting from 1" · Jan 9 2017, 2:57 PM
Mvolz updated the task description.
Mvolz renamed this task from "Expand pubmed regex in regex to include digits starting from 1" to "Expand pubmed regex to include digits starting from 1" · Jan 9 2017, 2:58 PM
Mvolz renamed this task from "Expand pubmed regex to include digits starting from 1" to "Expand pubmed regex to include digits starting from 1 in citoid" · Jan 9 2017, 3:04 PM
Mvolz updated the task description.

Change 331443 had a related patch set uploaded (by Sn1per):
Interpret strings of digits between 1 and 8 digits long as PMIDs

https://gerrit.wikimedia.org/r/331443

From the commit message,

"Side effect is that PMCIDs without prefixes are indistinguishable from
and considered PMIDs. Unit tests have been updated accordingly."

Hmm, I hadn't thought about that! This is definitely preferable to the previous behaviour. However, we might consider trying both interpretations and returning both citations if both are found. That would make this change much more architecturally complex: instead of handing the ID off to one function, we would need to hand it to several functions and concatenate the results, which I think is beyond the scope of this task. Maybe that should be done as part of T115248 as well.
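The alternative raised in this comment, trying every interpretation of an ambiguous number and concatenating the results, might look roughly like the sketch below. It is a hypothetical illustration, not citoid code and not what the merged change does: `candidateTypes`, `resolveAll`, and the stub `lookups` object are invented names, and the case-insensitive `PMC` prefix rule is an assumption.

```javascript
// Hypothetical sketch: a bare number may be a PMID or an unprefixed PMCID,
// so look it up as both and concatenate whatever comes back.

const PMCID_REGEX = /^PMC\d+$/i;  // assumed: prefixed PMCIDs, case-insensitive
const DIGITS_REGEX = /^\d{1,8}$/; // 1 to 8 digits, as in the merged change

// Return every identifier type the input could plausibly be.
function candidateTypes(input) {
  const id = input.trim();
  if (PMCID_REGEX.test(id)) return ['pmcid'];
  if (DIGITS_REGEX.test(id)) return ['pmid', 'pmcid']; // ambiguous
  return [];
}

// Run one lookup per candidate type and concatenate the citations found.
// `lookups` maps a type to a function returning (a promise of) an array.
async function resolveAll(input, lookups) {
  const id = input.trim();
  const results = await Promise.all(
    candidateTypes(id)
      .filter((type) => typeof lookups[type] === 'function')
      .map((type) =>
        // Wrapping in a promise also catches synchronous throws; a failed
        // lookup contributes no citations instead of failing the request.
        Promise.resolve()
          .then(() => lookups[type](id))
          .catch(() => [])
      )
  );
  return [].concat(...results); // Promise.all keeps candidate order
}

// Stub lookups standing in for citoid's real handlers (hypothetical).
const lookups = {
  pmid: async (id) => [{ source: 'pmid', id }],
  pmcid: async () => { throw new Error('not found'); },
};

resolveAll('1234567', lookups).then((citations) => {
  console.log(citations.length); // 1: only the PMID lookup succeeded
});
```

Because each lookup's failure is swallowed, a missing PMCID does not hide a PMID hit; how to report the case where every lookup fails is left open here.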

@mobrovac

This comment was removed by Mvolz.

Agreed, @Mvolz. If we can, we should return both, but it's a bit out of scope here. Perhaps we could create a new ticket that takes this into account for PM(C)IDs as well as OCLC numbers?

Change 331443 merged by jenkins-bot:
Interpret strings of digits between 1 and 8 digits long as PMIDs

https://gerrit.wikimedia.org/r/331443

Deployed, resolving.