Visual Editor adds erroneous text "PMC" to citation template pmc= parameter
Open, Medium, Public

Description

Has something changed in a VE citation tool? Invalid PMC values are showing up.

As a WP gnome, one of the categories I take care of is Category:CS1 errors: PMC, where errors typically show up at a rate of two or three a week. All of a sudden, I have seen about fifteen errors in the last couple of days, and seven just in the last five hours. I noticed that all seven of those edits were made with Visual Editor, and by six different editors. This leads me to guess that a citation generation tool within Visual Editor was modified or deployed recently, and that the citation tool in question generates invalid values for pmc= within citation templates. The |pmc= parameter should contain only the PMC identifying number, not the letters "PMC".

Does someone here know how to try to track this down and, if it is a bug, fix it? Thanks.

Event Timeline

I suspect that your analysis is right: it's unlikely that six editors would make the same formatting mistake on the same day. @Mvolz will probably be able to figure out what's going on.

Also, thank you for watching this maintenance backlog.

I asked @Trappist_the_monk about this; he says that the citation templates have not changed their PMC behavior. I therefore assume that citoid is getting different information. An unexpected upstream change to the Zotero translator, maybe?

If the upstream data has changed, maybe Citoid needs to be modified to strip the "PMC" characters from the PMC value, leaving only the numbers. This is just a guess.

Note that a problem very similar to this is possibly being caused by the en.WP ReFill tool, which is another indication that upstream data may be the root cause.

Or perhaps we should change the templates to accept both forms of the data? (The downside to this solution is that these improvements to the templates would have to be copied across hundreds of wikis to solve the problem globally.)

It doesn't make sense to change any templates until the root cause of this problem is identified. What has changed recently to cause this problem?

Thanks for pointing me here @JJMC89

Note that this happens in Visual editor but not in source editor because the source editor makes no attempt to link to the PMC version.

I find it odd that the two editing systems call on different citation generation processes.

News: The problem is in citoid, and it's on the list for the devs to fix.

On @Bluerasberry's point, I assume that he's talking about RefToolbar in some of the older wikitext editors. RefToolbar is a user script written by a volunteer for the English Wikipedia in ~2010. It is enabled by default there. It is installed as a gadget at a few Wikipedias (e.g., Spanish) but not at most of them (e.g., not at German, French, or Italian [the last time I checked]). Since it's a script, any editor could install it in their own account if wanted. There have been several similar tools over the years, and AFAICT all of them use different processes for getting the information.

Mvolz triaged this task as Medium priority.

Change 338767 had a related patch set uploaded (by Mvolz):
Fix PMC prefix and failing tests

https://gerrit.wikimedia.org/r/338767

FWIW on de.wiki I can add the PMC number in VE and it seems formatted correctly?

I've done some research on this and it looks like PMC is genuinely part of the identifier.

For instance:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3605911/

PMC is in the url, and on the right it says:

"PMCID: PMC3605911"

It looks like PMCID is the name of the identifier, and PMC is part of the identifier.

In cs1|2 the name of the identifier has always been PMC; the template parameter name has always been |pmc=; the value to be assigned to |pmc= has always been numeric only. https://en.wikipedia.org/wiki/Module:Citation/CS1/Identifiers creates the PMC url from the |pmc= parameter value and from the static partial URL //www.ncbi.nlm.nih.gov/pmc/articles/PMC found in https://en.wikipedia.org/wiki/Module:Citation/CS1/Configuration.

There are three parts to externally linked cs1|2 identifiers: 1) the identifier name (PMC, DOI, etc) which links to an associated Wikipedia article; 2) the static portion of the identifier link (generally the scheme, domain name, plus a bit of the path); 3) the variable portion of the URL (generally the last bit of the URL's path). cs1|2 assembles these three items into the complete identifier which it then renders in the final citation. This is true of all cs1|2 identifiers, not just |pmc=.
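The three-part assembly described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual Lua code in Module:Citation/CS1. The function name `render_pmc` and the explicit `https:` scheme are assumptions made for the example. The static partial URL is the one quoted above.

```python
from typing import Tuple

# Hypothetical sketch of how a cs1|2-style identifier is assembled.
# The real implementation is Lua in Module:Citation/CS1/Identifiers.

PMC_LABEL = "PMC"  # 1) identifier name shown to the reader
PMC_PREFIX = "//www.ncbi.nlm.nih.gov/pmc/articles/PMC"  # 2) static part of the link


def render_pmc(value: str) -> Tuple[str, str]:
    """Return (display text, URL) for a numeric-only |pmc= value."""
    # 3) the variable part is appended to the static prefix
    url = "https:" + PMC_PREFIX + value
    return f"{PMC_LABEL} {value}", url


print(render_pmc("3605911"))
# A value that already carries the prefix ends up with "PMCPMC" in the URL.
print(render_pmc("PMC3605911")[1])
```

Because the static prefix already ends in "PMC", a value such as "PMC3605911" would produce a doubled prefix in the link, which is presumably why the template expects digits only.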

Change 338767 abandoned by Mvolz:
[WIP] Fix PMC prefix and failing tests

Reason:
Abandoning because declining.

https://gerrit.wikimedia.org/r/338767

Yes, I understand that's how CS1 works, but unfortunately it's wrong
according to the identifier publisher, which is the NIH.

This is from the NIH website itself:
https://publicaccess.nih.gov/include-pmcid-citations.htm

They are very clear that the name of the identifier is PMCID and that the
PMC prefix should be included, and that they should be separated by a
colon. I have googled for other sources and all the reference librarians
seem to just be saying what the PubMed website says, e.g.:

https://hsl.lib.umn.edu/biomed/help/citing-manuscript
https://www.hsl.virginia.edu/portal/researcher/pmcid.cfm

They are quite adamant about this, by the way, going so far as to say that
you will be denied NIH funding if it is not cited in this way.

We have to be careful not to overfit for the use case of CS1/ en wiki.
Since PMC is a valid part of the identifier it should be left in. If the
bots want to continue to change this it is of course up to the community to
continue to cite this identifier incorrectly :).

Boghog subscribed.

The NIH recommendations only concern how a citation is rendered, not how it is entered in a template. {{cite journal}} properly renders the citation with PMC in front of the numerical value. Requiring it to also be part of the parameter value is redundant. Furthermore none of the other citation generation tools do this.

{{cite journal | pmc = xxxxx}} renders as "PMC xxxxx"

{{cite journal | pmc = PMCxxxxx}} throws an error.
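The behaviour in the two examples above can be approximated with a simple check. This is a Python sketch under the assumption that a valid |pmc= value consists only of digits. The real check lives in the Lua CS1 module and may apply further rules, and `check_pmc` is a hypothetical name.

```python
import re


def check_pmc(value: str) -> bool:
    """Return True if value looks like a bare PMC number (digits only)."""
    # Assumption: surrounding whitespace is ignored, anything else must be a digit.
    return re.fullmatch(r"\d+", value.strip()) is not None


print(check_pmc("3605911"))     # bare number
print(check_pmc("PMC3605911"))  # prefixed value, flagged as invalid
```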

This is declined because we're not going to remove the PMC in the backend,
because this affects every language wiki and their respective templates as
well as scripts etc.

(It would be a bit like if a particular wiki left off the 10. part of DOIs
in their templates and then asked if we also leave it off in the back end
for everyone because it was "redundant". It's just bad practice to return
identifiers in an off-spec manner.)

The appropriate place to fix this is in the template, by either accepting the
correct identifier (which includes PMC) or letting the bot clean it out. It's
not ideal, but it's important that we not overfit to specific templates.

Mvolz removed a project: Patch-For-Review.

Which Wikis have templates that require PMC in the identifier? Most Wikis currently do not have templates that support PMC, and the ones that do generally take their lead from the English language CS1 templates. So it appears that this is a completely hypothetical argument.

There should be at least a localization setting that allows local Wikis to suppress PMC in the identifier.

> There should be at least a localization setting that allows local Wikis to suppress PMC in the identifier.

Other wikis accept the correct identifier with no errors, e.g. uk.

TemplateData is how citoid is localised across wikis. So another option is simply not to add the pmc at all. You can do this in the template documentation (https://en.wikipedia.org/w/index.php?title=Template:Cite_journal/doc&action=edit) by deleting the line

			"PMCID": "pmc",

Other Wikis handle the identifier without the redundant prefix without error. Deleting the pmc parameter from the documentation is a non sequitur. It does not make the error message go away. This would require editing the CS1 template or allowing localization of TemplateData. No one is suggesting that we remove the pmc parameter from the CS1 templates.

The line I was referring to was in the citoid 'maps' value in the td, not
to delete the PMC parameter. Sorry, that was a little ambiguous.

If you delete that line, then VE will not add the pmcid at all, which will
indeed stop the error from being displayed.
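To illustrate why deleting that line suppresses the parameter, here is a simplified Python model of how a TemplateData 'maps' entry could be applied to citoid output. The function `apply_citoid_map` and the sample field values are hypothetical, and this is not VisualEditor's actual code. Only the `"PMCID": "pmc"` entry comes from this discussion.

```python
def apply_citoid_map(citoid_result: dict, citoid_map: dict) -> dict:
    """Copy each mapped citoid field into the matching template parameter."""
    return {
        param: citoid_result[field]
        for field, param in citoid_map.items()
        if field in citoid_result
    }


# Sample citoid-style output (values are illustrative only).
result = {"title": "Example article", "PMCID": "PMC3605911"}

with_pmc = {"title": "title", "PMCID": "pmc"}
without_pmc = {"title": "title"}  # the "PMCID" line deleted

print(apply_citoid_map(result, with_pmc))
print(apply_citoid_map(result, without_pmc))
```

In this model an unmapped field is simply dropped, so no |pmc= value is written and no error category is triggered, at the cost of losing the identifier altogether.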

This has been "undeclined" and is resolved in T224004. Before deploying, I was wondering whether anyone knows of any adjustments (e.g. in bots) made to strip out the leading PMC that would break if the value were returned without it? Otherwise I'll deploy it (relatively) shortly. I imagine such bots know what to do with the "correct" format anyway for user-entered data.

To the best of my knowledge, there are no bots that would break if the leading PMC were removed. Citation bot (https://tools.wmflabs.org/citations/) is stripping both "pmc = PMCPMCxxxxx" and "pmc = PMCxxxxx" and returning "pmc = xxxxx". The English Wikipedia CS1 template is accepting "pmc = PMCxxxxx" but marking it as a maintenance category (Do not include "PMC" in the value https://en.wikipedia.org/wiki/Category:CS1_errors:_PMC).
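The clean-up described for Citation bot can be sketched as below. This is a hypothetical Python illustration, not the bot's own code. The function name `strip_pmc_prefix` is made up, and treating the prefix as case-insensitive is an assumption.

```python
import re


def strip_pmc_prefix(value: str) -> str:
    """Remove any number of leading "PMC" prefixes, leaving the rest unchanged."""
    # Assumption: the prefix may be repeated and may appear in any letter case.
    return re.sub(r"^(?:PMC)+", "", value.strip(), flags=re.IGNORECASE)


for raw in ("PMCPMC3605911", "PMC3605911", "3605911"):
    print(raw, "->", strip_pmc_prefix(raw))
```

A value that is already numeric passes through untouched, so applying the function to data that has been fixed upstream should be harmless.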

Re-opening as some of them still have the PMC and this is inconsistent, e.g. 10.1186/1471-2148-7-97

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

This error still seems to be occurring in February 2024. The error can be replicated using the DOI 10.1089/aut.2020.29014.njw. This error was created in this edit.

The issue, in which the pmc parameter is created with a value that carries the PMC prefix, occurs frequently; see https://en.wikipedia.org/wiki/User_talk:Significa_liberdade#c-Significa_liberdade-20240207195600-Significa_liberdade-20240207194500 for more details. Can you please fix that? Thank you very much!