
Session: ScienceSource scripting for focus list additions
Closed, ResolvedPublic

Description

For the purposes of the ScienceSource project (https://meta.wikimedia.org/wiki/Grants:Project/ScienceSource/Profile) and its Wikidata focus list of biomedical articles (shortcut WD:SSFL on Wikidata), the development of some scripts would be very helpful.

A natural starting point would be a list of DOIs. One obvious question is how, starting from a Wikipedia page, to extract the DOIs of the papers cited in its references. Assuming they appear in the standard citation-template parameter doi =, this is an easy problem to solve. A list of DOIs might also come from the results of a SPARQL query on Wikidata, which is much less trouble.
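A minimal sketch of the extraction step, assuming the page's wikitext has already been fetched and that DOIs appear only as `doi =` parameters of citation templates. It is a regular-expression approach, not a full template parser. DOIs are upper-cased to match the convention for DOI (P356) values on Wikidata.

```python
import re

# Matches a "doi =" parameter in a citation template, e.g.
# {{cite journal |doi=10.1126/science.1143609 |...}}.
# The value runs until the next pipe, closing brace, or newline.
# "|doi-access=" and similar parameters are not matched, because "=" must follow "doi".
DOI_PARAM = re.compile(r"\|\s*doi\s*=\s*([^|}\n]+)", re.IGNORECASE)


def extract_dois(wikitext: str) -> list[str]:
    """Return DOIs from doi= template parameters, de-duplicated, in first-seen order."""
    found: list[str] = []
    for match in DOI_PARAM.finditer(wikitext):
        # Strip surrounding whitespace and upper-case, as Wikidata stores P356 values.
        doi = match.group(1).strip().upper()
        # Keep only values that look like DOIs (all DOIs begin with "10.").
        if doi.startswith("10.") and doi not in found:
            found.append(doi)
    return found
```

Comments, nowiki tags, or DOIs supplied through other templates would need extra handling. A real parser such as mwparserfromhell is the sturdier option if that matters.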

The next question is how to translate DOIs into Wikidata items. At present the DOIs arising from Wikipedia should almost always have a matching Wikidata item, but of course that cannot be guaranteed. The item can be “looked up” in more than one way, although basically SPARQL is used behind the scenes in each case.
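A batch lookup can go straight to the Wikidata Query Service. The sketch below assumes the public endpoint `https://query.wikidata.org/sparql` and upper-case DOIs. It sends one query with a `VALUES` clause, so many DOIs are resolved in a single request. The User-Agent string is a placeholder.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_doi_query(dois: list[str]) -> str:
    """Build a SPARQL query mapping each DOI to the item(s) carrying it via P356."""
    # Upper-case to match Wikidata's convention, and escape characters that
    # would break a SPARQL string literal.
    literals = " ".join(
        '"{}"'.format(d.upper().replace("\\", "\\\\").replace('"', '\\"'))
        for d in dois
    )
    return (
        "SELECT ?doi ?item WHERE {\n"
        f"  VALUES ?doi {{ {literals} }}\n"
        "  ?item wdt:P356 ?doi .\n"
        "}"
    )


def lookup_dois(dois: list[str], user_agent: str = "ScienceSourceSketch/0.1") -> dict[str, list[str]]:
    """Run the query and return {DOI: [Q-ids]}; DOIs without an item are absent."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": build_doi_query(dois), "format": "json"}
    )
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    found: dict[str, list[str]] = {}
    for row in data["results"]["bindings"]:
        # The item comes back as a full entity URI; keep only the Q-number.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        found.setdefault(row["doi"]["value"], []).append(qid)
    return found
```

Returning a list per DOI means that duplicate items sharing one DOI show up rather than being silently collapsed. DOIs missing from the result are the ones that have no item yet.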

The Hub tool can be applied to convert DOIs systematically: from a DOI such as

10.1126/SCIENCE.1143609

create the prefixed version

https://tools.wmflabs.org/hub/P356:10.1126/SCIENCE.1143609

by applying P356, the Wikidata property for DOI. This URL then resolves to

https://www.wikidata.org/wiki/Q22065869

and provides the item number Q22065869.
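The Hub conversion described above can be scripted directly. This is a sketch that assumes the base URL shown in this task still works (the tool may since have moved) and that Hub answers with an HTTP redirect to the item page. `urllib` follows redirects, so the Q-number is read from the final URL.

```python
import re
import urllib.parse
import urllib.request
from typing import Optional

# Base URL copied from the task description; check it is still current.
HUB_BASE = "https://tools.wmflabs.org/hub/"


def hub_url(doi: str) -> str:
    """Prefix a DOI with the DOI property (P356) to form a Hub lookup URL."""
    # Percent-encode unusual characters but keep "/" so the DOI stays readable.
    return f"{HUB_BASE}P356:{urllib.parse.quote(doi, safe='/')}"


def qid_from_item_url(url: str) -> Optional[str]:
    """Extract the Q-number from a URL such as https://www.wikidata.org/wiki/Q22065869."""
    match = re.search(r"/wiki/(Q\d+)$", url)
    return match.group(1) if match else None


def resolve_via_hub(doi: str) -> Optional[str]:
    """Follow the Hub redirect and return the item number, or None if none is found."""
    with urllib.request.urlopen(hub_url(doi)) as response:
        # geturl() reports the final URL after any redirects were followed.
        return qid_from_item_url(response.geturl())
```

For example, `hub_url("10.1126/SCIENCE.1143609")` reproduces the prefixed URL given above. For long DOI lists, a single SPARQL query is gentler on the servers than one Hub request per DOI.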

What would be really helpful for the aims of ScienceSource would be to also factor in the PublicationTypeList information available through the PubMed ID (P698) statement that items for biomedical articles commonly carry. If this lists more than “Journal Article”, the additional publication type should be added to the item, since it is directly relevant to the usefulness of the article for medical referencing.
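One way to read the PublicationTypeList is through NCBI's E-utilities `efetch` endpoint, which returns PubMed records as XML. The sketch below assumes that XML format and one PMID per request. It keeps every `PublicationType` except the generic “Journal Article”. NCBI asks clients to keep request rates modest.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_pubmed_xml(pmid: str) -> bytes:
    """Download the PubMed XML record for one PMID (the P698 value)."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    with urllib.request.urlopen(f"{EFETCH}?{query}") as response:
        # Return raw bytes so the parser honours the document's own encoding.
        return response.read()


def extra_publication_types(pubmed_xml) -> list[str]:
    """Return PublicationType entries other than the generic 'Journal Article'."""
    root = ET.fromstring(pubmed_xml)
    # iter() walks the whole tree, so this finds every PublicationType element
    # inside PublicationTypeList, wherever it is nested.
    types = [el.text.strip() for el in root.iter("PublicationType") if el.text]
    return [t for t in types if t != "Journal Article"]
```

An empty result means the record says nothing beyond “Journal Article”, so there is nothing to add to the item.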

Statements can be added by means of the QuickStatements bot. A script-led workflow should therefore end with output that can be passed to QuickStatements.
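The last step can be sketched as emitting QuickStatements commands in the tab-separated form `item<TAB>property<TAB>value`. Two assumptions here are not from the task: that publication types are recorded with P31 (instance of), and that the caller supplies a mapping from PubMed type names to Wikidata Q-ids. No real mapping is claimed, and the mapping used in any run should be checked by hand.

```python
def quickstatements_lines(
    item_to_types: dict[str, list[str]],
    type_to_qid: dict[str, str],
    prop: str = "P31",  # assumed property for publication type; adjust as needed
) -> list[str]:
    """Emit one tab-separated QuickStatements command per (item, known type) pair."""
    lines = []
    for item, pub_types in item_to_types.items():
        for pub_type in pub_types:
            value = type_to_qid.get(pub_type)
            if value is None:
                # No agreed Wikidata item for this PubMed type: skip rather than guess.
                continue
            lines.append("\t".join((item, prop, value)))
    return lines
```

Joining the result with newlines gives text that can be pasted into QuickStatements. Skipping unmapped types keeps the batch conservative, so unknown types surface for manual review instead of producing wrong statements.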

Event Timeline

Charles_Matthews renamed this task from ScienceSource scripting for focus list additions to Session: ScienceSource scripting for focus list additions.Jul 11 2018, 3:01 PM

@Charles_Matthews: Anything left to do in this Wikimania-Hackathon-2018 task / any follow-up tasks / documentation still needed to create or link to? If there is nothing left to do here, please feel free to resolve this task via the Add Action...Change Status dropdown. Thanks!

In the end, ScienceSource concentrated on PubMed IDs, and has not been involved with DOIs. Resolved.

The publication type "review", as drawn in from PubMed searches, is what the NCBI2wikidata tool (https://github.com/ContentMine/NCBI2wikidata) uses, together with retraction notices. The focus list is currently being built up in segments, using P1995 (health specialty) on Wikidata. Details are on the pages https://www.wikidata.org/wiki/User:Charles_Matthews/NCBI2wikidata and https://www.wikidata.org/wiki/User:Charles_Matthews/ScienceSourceIngest.
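For illustration only, and not the NCBI2wikidata tool's own code: a PubMed search restricted to reviews can be expressed with the `[pt]` (publication type) field tag and sent to the E-utilities `esearch` endpoint, which returns matching PMIDs. The `topic` argument and default `retmax` are hypothetical parameters of this sketch.

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def review_search_url(topic: str, retmax: int = 100) -> str:
    """Build an esearch URL for PubMed records on a topic with publication type 'review'."""
    term = f"({topic}) AND review[pt]"
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urllib.parse.urlencode(params)}"


def search_review_pmids(topic: str, retmax: int = 100) -> list[str]:
    """Return up to retmax PMIDs from the JSON esearch response."""
    with urllib.request.urlopen(review_search_url(topic, retmax)) as response:
        data = json.load(response)
    return data["esearchresult"]["idlist"]
```

The returned PMIDs can then be matched to Wikidata items through P698 in the same way DOIs are matched through P356.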