add DOI URI support
Closed, Declined (Public)

Description

Author: mark_sweep

Description:
Consider adding support for Digital Object Identifiers in wikitext. This would
be done by supporting the "doi:" URI scheme in addition to "http:", which is
already treated specially.

For example, a string "doi:10.1000/186" in wikitext would be transformed into
"<a href="http://dx.doi.org/10.1000/186">doi:10.1000/186</a>" in HTML.

A few WP articles use DOIs in references -- e.g.
[http://en.wikipedia.org/wiki/Kurtosis] -- by using standard external links.

Here is an overview of the DOI URI representation:
[http://www.doi.org/factsheets/DOIIdentifierSpecs.html]

The specification of DOI names is here:
[http://www.doi.org/handbook_2000/appendix_1.html#A1-4]. As far as specs go,
this one is a bit vague. I'm not sure I understand how to recognize the end of
a DOI. It says a DOI consists of "legal *graphic* characters of Unicode 2.0 or
greater" (emphasis added). Not sure if the graphic characters include
whitespace, or even if "graphic character" is a meaningful term defined in
Unicode. I've never seen DOIs with embedded whitespace. Also, a DOI might have
to be URL-encoded when it's converted to an http: URI. I don't have any test
cases for that, sorry.
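
A minimal sketch of the URL-encoding step mentioned above, in Python. It assumes the resolver base from the example in this report (http://dx.doi.org/) and assumes that "/" should stay literal while every other reserved character is percent-encoded as UTF-8. The function name doi_to_url and the second sample DOI are hypothetical, made up only to exercise the encoding.

```python
from urllib.parse import quote

# Resolver base taken from the example in this report.
RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi_name: str) -> str:
    """Build a resolver URL for a DOI name such as '10.1000/186'.

    Assumption: '/' is kept literal so the prefix/suffix separator
    survives; all other characters outside the unreserved set are
    percent-encoded.
    """
    return RESOLVER + quote(doi_name, safe="/")


if __name__ == "__main__":
    print(doi_to_url("10.1000/186"))
    # Hypothetical DOI with characters that are not safe in a URL.
    print(doi_to_url("10.1000/a<b>#c"))
```

This does not settle where a DOI ends in running text; it only covers the conversion once the DOI name has been isolated.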


Version: unspecified
Severity: enhancement

Details

Reference
bz1378

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 8:08 PM)
bzimport set Reference to bz1378.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

I like your idea. It combines ideas from Interwiki, TinyURL and the magic ISBN
numbers, and it could be implemented. I have taken this on because I am
interested in it.

Just add it to ./maintenance/interwiki.sql. This way
someone can just write [[doi:10.1000/186]].
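
To illustrate the interwiki approach suggested here, a rough Python sketch of how a prefix entry could expand a link. It assumes an interwiki-style entry that maps the prefix "doi" to a URL template with a "$1" placeholder. The table contents, the regular expression and the function name resolve_interwiki are illustrative assumptions, not MediaWiki code.

```python
import re
from urllib.parse import quote

# Hypothetical interwiki-style table: prefix -> URL template with "$1".
INTERWIKI = {"doi": "http://dx.doi.org/$1"}

# Matches [[prefix:target]] without a pipe-separated label.
LINK = re.compile(r"\[\[(?P<prefix>[A-Za-z]+):(?P<target>[^\]|]+)\]\]")


def resolve_interwiki(link: str):
    """Return the expanded URL for an interwiki link, or None if unknown."""
    m = LINK.fullmatch(link)
    if m is None:
        return None
    template = INTERWIKI.get(m.group("prefix").lower())
    if template is None:
        return None
    # Substitute the percent-encoded target for the "$1" placeholder.
    return template.replace("$1", quote(m.group("target"), safe="/"))


if __name__ == "__main__":
    print(resolve_interwiki("[[doi:10.1000/186]]"))
    print(resolve_interwiki("[[Kurtosis]]"))
```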

mark_sweep wrote:

No, not "[[doi:10.1000/186]]" -- it's not an internal link. Simply
"doi:10.1000/186" and also perhaps "[doi:10.1000/186 title]", just like "http:"
is treated specially. In fact, you can match on "doi:10." to detect the
beginning of a DOI.
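
A sketch of that matching rule in Python, covering both the bare form and the bracketed form with a title. The end-of-DOI rule is an assumption, since the spec is vague on it: here a DOI stops at whitespace, brackets, angle brackets or a double quote, and trailing sentence punctuation is trimmed from bare DOIs. The names link_dois, LINK and DOI_BODY are hypothetical.

```python
import html
import re
from urllib.parse import quote

# Assumption: a DOI ends at whitespace, '[', ']', '<', '>' or '"'.
DOI_BODY = r'10\.[^\s\[\]<>"]+'

# Either "[doi:10.x/y title]" or a bare "doi:10.x/y".
LINK = re.compile(
    r"\[doi:(?P<bracketed>" + DOI_BODY + r")\s+(?P<title>[^\]]+)\]"
    r"|\bdoi:(?P<bare>" + DOI_BODY + r")"
)


def _anchor(doi: str, label: str) -> str:
    href = "http://dx.doi.org/" + quote(doi, safe="/")
    return '<a href="{}">{}</a>'.format(html.escape(href), html.escape(label))


def _replace(m: re.Match) -> str:
    if m.group("bracketed"):
        return _anchor(m.group("bracketed"), m.group("title"))
    doi = m.group("bare")
    # Assumption: trailing sentence punctuation is not part of the DOI.
    trimmed = doi.rstrip(".,;:")
    return _anchor(trimmed, "doi:" + trimmed) + doi[len(trimmed):]


def link_dois(text: str) -> str:
    """Turn doi: references in plain text into HTML links."""
    return LINK.sub(_replace, text)


if __name__ == "__main__":
    print(link_dois("See doi:10.1000/186."))
    print(link_dois("[doi:10.1000/186 the DOI handbook]"))
```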

zigger wrote:

en.wikipedia already has http://en.wikipedia.org/wiki/Template:Doi , but there
are few uses of it.

See
http://en.wikipedia.org/wiki/Wikipedia:Template_messages/Links#External_links
for other linkages using templates.

(In reply to comment #4)

I want to
(In reply to comment #4)

I was unclear.

I wanted to propose a "magic" DOI <number>, like the one we already have for
ISBN <number>, so that a magic link is created if someone writes
DOI 10.1000/186 in the text. Not everyone knows about these identifiers yet;
they are used mainly by scientists, engineers and people working in research
fields.

rowan.collins wrote:

If there is an existing URI scheme for these, no horrible magic like ISBN and
RFC (which break all sorts) is needed, nor even any hacking of the code - the
URI prefix can just be added to $wgUrlProtocols, as bug 431 has been fixed.
However, see bug 3133, comment 5 for a set of criteria I think need considering
before adding a new link-prefix to either the default or the Wikimedia
configuration.

[Re-assigning to Product="Wikimedia web sites" as this is now a configuration,
rather than coding, issue]

A typical DOI has this form:

doi:10.1371/journal.pone.0006872

Using a colon as a separator is the canonical way of representing DOIs. A space (or a colon followed by a space) is rarer and, AFAIK, not supported by any citation style; see for instance Chicago: http://bit.ly/K4INu6 or APA: http://www.loyola.edu/library/ref/apastyle.htm#journal

Also, DOI prefixes always start with the string "10.", and to minimize performance issues they could be parsed only within <ref> tags.
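
A sketch of the "only within <ref> tags" idea in Python. It assumes that references are written as paired <ref>...</ref> tags and that a DOI ends at whitespace or at common wikitext delimiters. The function name dois_in_refs and the sample DOI "10.9999/ignored" are hypothetical.

```python
import re

# Paired <ref ...>...</ref> tags; self-closing <ref ... /> is skipped.
REF = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.DOTALL | re.IGNORECASE)

# Assumption: a DOI ends at whitespace or at one of  [ ] < > " | }
DOI = re.compile(r'\bdoi:(10\.[^\s\[\]<>"|}]+)')


def dois_in_refs(wikitext: str) -> list:
    """Collect DOI names that appear inside <ref> tags only."""
    found = []
    for ref in REF.finditer(wikitext):
        found.extend(DOI.findall(ref.group(1)))
    return found


if __name__ == "__main__":
    sample = (
        "Body text doi:10.9999/ignored "
        "<ref>Smith 2009, doi:10.1371/journal.pone.0006872</ref> "
        '<ref name="a">doi:10.1000/186</ref>'
    )
    print(dois_in_refs(sample))
```

Scanning only the reference bodies keeps the DOI pattern away from the bulk of the article text, which is the performance point made above.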

(In reply to comment #2)

Just add it to the ./maintenance/interwiki.sql . This way
someone can just [[doi:10.1000/186]].

This has been done in the meantime, FYI (on Wikimedia projects).

Going through old bugs...

I'm going to go ahead and close this out -- interwikis and templates do a better job at this than URL links, as inserting direct <a href>s pointing at 'doi:something' would be useless for 99% of users.