Tweak the citation to omit URLs to PubMed in favor of the PMID magic word
Closed, Declined | Public

Description

At en.wp (at least), citations to PubMed don't normally include the URL. Citoid is currently giving us this:

{{Cite journal
|url = http://www.ncbi.nlm.nih.gov/pubmed/23175147
|date = Jan 15, 2013
|access-date = 2015-04-02
|pmid = 23175147 
}}

(etc etc through the whole list of parameters). IMO it would be preferable to get only this instead:

{{Cite journal
|date = Jan 15, 2013
|pmid = 23175147 
}}

(etc etc through the whole list of parameters).

This means omitting the access date and the URL. (The |pmid parameter produces the same URL, so it's redundant.)
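The redundancy claim can be checked mechanically. The following is a minimal sketch, assuming the only PubMed URL shape of interest is the `/pubmed/<id>` form quoted above; the function name and the host list are illustrative and not part of Citoid.

```python
from urllib.parse import urlparse

# Assumption: only the host/path form quoted in this task is considered.
# Other PubMed URL shapes are not handled here.
PUBMED_HOSTS = {"www.ncbi.nlm.nih.gov", "ncbi.nlm.nih.gov"}


def is_redundant_pubmed_url(url: str, pmid: str) -> bool:
    """Return True if `url` is just the PubMed landing page for `pmid`."""
    parsed = urlparse(url.strip())
    if parsed.hostname not in PUBMED_HOSTS:
        return False
    # Split the path into non-empty segments, e.g. ["pubmed", "23175147"].
    parts = [p for p in parsed.path.split("/") if p]
    return parts == ["pubmed", pmid.strip()]


if __name__ == "__main__":
    print(is_redundant_pubmed_url(
        "http://www.ncbi.nlm.nih.gov/pubmed/23175147", "23175147"))
```

A URL pointing anywhere else (for example, an author's own copy of the paper) is not flagged, so it would be kept.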

Event Timeline

Whatamidoing-WMF assigned this task to Mooeypoo.
Whatamidoing-WMF raised the priority of this task from to Low.
Whatamidoing-WMF updated the task description.
Whatamidoing-WMF added a project: Citoid.

As per the quick discussion on IRC, I think we might be able to do this by redefining the map in the Cite journal properties at enwiki, but I am not at all sure about this. @Mvolz, ideas? Is this possible, considering it will have to happen only in the English Wikipedia? Can we define PubMed results as a different type for the map, perhaps, and have it treated differently?

Mvolz set Security to None.

There is no generalised way to do this on the front end (extension/templatedata). If users don't like having the link in the presented citation, they could modify the cite journal template to not hot-link the title when the pmid is present.

The only way to do this on the backend is simply not to include the url in the metadata. Previously, for PMIDs, we put the DOI link rather than the PubMed link in the url field; in either case, I think it's desirable to know the link that the metadata came directly from, even if the same link is also present in the pmid field or doi field.

Disabling the use of a URL when a PMID is present won't be acceptable, because the URL field is sometimes needed to link to the original or a free, full-text version (e.g., on an author's website). From the perspective of the English Wikipedia, the preferred approach is that a URL to PubMed simply be omitted from the citation.

But if you think this isn't a good idea, then they can send a bot around to strip it out later.

The same behavior seems to be happening with WorldCat links and OCLC numbers.

There are some open tasks that might replace the url with one pointing to an open access version, if there's one available, i.e. T174540, which might provide a partial solution.

I still think the general situation described in T190850: Do not add redundant URL to citation template when an identifier like OCLC, PMID or DOI is given requires a solution. One such solution would probably be T52407: TemplateData: Add a way to express dependencies between parameters, which is unlikely to ever be developed.

@Mvolz I'm afraid I still don't understand why it is not possible to specify a rule which prevents the issues we discuss here. As far as I can see, there is a handful of identifiers, to each of which a single URL format is attached. So a filter either on the backend or within a template should suffice. Why is this difficult to implement? (I'm not saying it's easy, I would just like to understand why it is not.)

In the backend, we would have to remove the url entirely. This would be quite bad for non-en wiki consumers of the data, particularly in other language wikis with less complex templates that may not have a field for pmid, for instance. In the worst case scenario with leaving the url in, you have redundant data, and in the worst case scenario with taking the url out, you have missing data, so we're not doing it in the backend.

In the front end, the data is translated into a template via the TemplateData extension. It is not possible to do any kind of logic inside TemplateData - it is a JSON block, not code. I'm not sure there's a non-awful way to do this. It's designed this way so that editors don't need special permissions to change things.
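To illustrate why a pure mapping cannot express "omit the URL when a PMID is present", here is a rough sketch. The map below is a hypothetical, simplified excerpt (the field names on the left are assumptions about what Citoid returns, and real TemplateData maps can be more elaborate), and `apply_map` is an illustrative stand-in for the front-end translation, not the actual implementation.

```python
# Hypothetical, simplified map: Citoid field name -> template parameter name.
CITOID_MAP = {
    "url": "url",
    "date": "date",
    "accessDate": "access-date",
    "PMID": "pmid",
}


def apply_map(citation: dict, field_map: dict) -> dict:
    """Translate Citoid fields to template parameters by plain key lookup.

    Each field is handled independently of the others, so there is nowhere
    to say "skip url if PMID is set".
    """
    return {field_map[k]: v for k, v in citation.items() if k in field_map}


if __name__ == "__main__":
    citation = {
        "url": "http://www.ncbi.nlm.nih.gov/pubmed/23175147",
        "date": "Jan 15, 2013",
        "accessDate": "2015-04-02",
        "PMID": "23175147",
    }
    for name, value in apply_map(citation, CITOID_MAP).items():
        print(f"|{name} = {value}")
```

Because the translation is a one-to-one lookup, the `url` parameter is always emitted whenever the backend supplies it.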

The best solution to redundant data being added that en wiki does not want for style reasons is to have it cleaned out with a bot - there are many such bots for formatting citations.
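For a sense of what such a cleanup pass might look like, here is a minimal sketch. It is not taken from any existing bot: it assumes one parameter per line, as in the example in the task description, handles only the `/pubmed/<id>` URL form, and uses plain regular expressions rather than a real wikitext parser.

```python
import re

# Assumptions: lowercase parameter names, one parameter per line.
PMID_RE = re.compile(r"\|\s*pmid\s*=\s*(\d+)")
URL_RE = re.compile(r"\|\s*url\s*=\s*([^\s|}]+)[ \t]*\n?")
ACCESS_RE = re.compile(r"\|\s*access-date\s*=[^|}\n]*\n?")


def strip_redundant_pubmed_url(template: str) -> str:
    """Drop |url= and |access-date= when the URL only repeats the PMID."""
    pmid = PMID_RE.search(template)
    url = URL_RE.search(template)
    if not pmid or not url:
        return template
    expected = re.compile(
        r"https?://(www\.)?ncbi\.nlm\.nih\.gov/pubmed/" + pmid.group(1) + r"/?$"
    )
    if not expected.match(url.group(1)):
        # The URL points somewhere else (e.g. a free full text): keep it.
        return template
    cleaned = URL_RE.sub("", template, count=1)
    return ACCESS_RE.sub("", cleaned, count=1)


if __name__ == "__main__":
    before = "\n".join([
        "{{Cite journal",
        "|url = http://www.ncbi.nlm.nih.gov/pubmed/23175147",
        "|date = Jan 15, 2013",
        "|access-date = 2015-04-02",
        "|pmid = 23175147",
        "}}",
    ])
    print(strip_redundant_pubmed_url(before))
```

A production bot would need a proper template parser and per-wiki parameter names; the sketch only shows that the rule itself is small.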

Thanks a lot for this explanation!

In the backend, we would have to remove the url entirely. This would be quite bad for non-en wiki consumers of the data, particularly in other language wikis with less complex templates that may not have a field for pmid, for instance. In the worst case scenario with leaving the url in, you have redundant data, and in the worst case scenario with taking the url out, you have missing data, so we're not doing it in the backend.

(I don't think it's particularly relevant, but it's an issue for dewiki as well.)

What about offering two different url fields? The current url would stay the same, and there would be a second fulltext-url (name to be discussed, of course) which would only be set if there is a dedicated fulltext link, but would not be filled with the OCLC/PMID/DOI referrer URLs the complaints are about.

This way, all templates which require just a URL would work as before, but the "smart" templates could be switched to using fulltext-url.

I understand that it is not particularly elegant to maintain a URL filter within the backend and it is clear that this solution only works for at most a handful of identifiers, but from my current understanding it's still better than using a bot to clean up the mess afterwards.
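The two-field proposal could be sketched roughly as follows. Everything here is hypothetical: `fulltext-url` is only the placeholder name suggested above, the function does not exist in Citoid, and the referrer patterns are assumed URL shapes for PMID, DOI and OCLC links rather than an authoritative list.

```python
import re

# Assumed referrer URL shapes for the identifiers discussed in this task.
REFERRER_PATTERNS = [
    re.compile(r"^https?://(www\.)?ncbi\.nlm\.nih\.gov/pubmed/\d+/?$"),
    re.compile(r"^https?://(dx\.)?doi\.org/10\.\S+$"),
    re.compile(r"^https?://(www\.)?worldcat\.org/oclc/\d+/?$"),
]


def add_fulltext_url(metadata: dict) -> dict:
    """Return a copy of `metadata` with an extra 'fulltext-url' field.

    'url' is left untouched so simple templates keep working; 'fulltext-url'
    is set only when the URL is not a plain identifier referrer.
    """
    result = dict(metadata)
    url = metadata.get("url", "")
    if url and not any(p.match(url) for p in REFERRER_PATTERNS):
        result["fulltext-url"] = url
    return result


if __name__ == "__main__":
    print(add_fulltext_url(
        {"url": "http://www.ncbi.nlm.nih.gov/pubmed/23175147", "PMID": "23175147"}))
    print(add_fulltext_url(
        {"url": "http://example.org/paper.pdf", "PMID": "23175147"}))
```

Templates that only know `url` would see no change, while a template mapped to the second field would receive a link only when it adds something beyond the identifier.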

I wonder whether the CS1 modules could be changed to ignore PubMed (and only PubMed) URLs when the PMID field is non-zero.

@Mvolz Did you have a chance to think about my proposal to introduce a second URL field?

For DOIs, which are arguably a more clear-cut case, I've filed T232771 and I have a patch which tries to find out whether the URL was specifically requested by the user or added unilaterally by Citoid. It's not impossible to do so in a way that works for all wikis, just a bit tricky.

Subscribing in case someone makes a comment about a bot being made available.