
Users entering small numbers into automatic citations, expecting it to re-use an existing citation
Closed, Resolved, Public

Description

VisualEditor has a very useful automatic citation feature: you provide a URL, ISBN, or PMID and the citation is generated automatically.

It turns out to be quite common for new editors to enter "1" or "2" in the automatic citation field (their real intention is probably to re-use "ref 1" or "ref 2"), but the input is interpreted as PMID 1 or PMID 2.

How to reproduce:

  1. Open VE (for example https://en.wikipedia.org/wiki/2018_FIFA_World_Cup?veaction=edit ) and click "Cite".
  2. Enter "1" in the automatic citation field.

Suggested solution:
const n = /^\d+$/.test(ref.trim()) ? parseInt(ref, 10) : NaN;
if (n < 10 || n < numRefs) { disambigRef(); }
else { citoid(ref); }
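A minimal JavaScript sketch of the suggested rule, assuming the number of references on the page is available as `numRefs`. The names `routeAutoInput`, `'disambiguate'` and `'citoid'` are hypothetical, not Citoid's API. It accepts only all-digit input, because a plain `parseInt` check is lenient: `parseInt("10.1038/nature", 10)` returns 10, so a DOI could be mistaken for a reference number on a page with more than 10 references.

```javascript
// Hypothetical helper; the name and return values are illustrative only.
// numRefs: assumed count of references already on the page.
function routeAutoInput(input, numRefs) {
  const text = input.trim();
  // Treat the input as a reference number only if it is all digits;
  // URLs, DOIs and hyphenated ISBNs fall through to Citoid.
  if (!/^\d+$/.test(text)) {
    return 'citoid';
  }
  const n = parseInt(text, 10);
  // Rule from the task description: small numbers, or numbers that could
  // point at an existing reference, are treated as ambiguous.
  return (n < 10 || n < numRefs) ? 'disambiguate' : 'citoid';
}

console.log(routeAutoInput('1', 3));               // → disambiguate
console.log(routeAutoInput('25', 40));             // → disambiguate
console.log(routeAutoInput('25', 5));              // → citoid
console.log(routeAutoInput('10.1038/nature', 50)); // → citoid
```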

Examples (hewiki, arwiki, frwiki):

  1. https://he.wikipedia.org/w/index.php?title=%D7%A4%D7%95%D7%9C_%D7%A8%D7%A1%D7%99%D7%A0%D7%99%D7%99%D7%94&diff=next&oldid=23055893 (PMID 1 used in Paul Rassinier)
  2. https://he.wikipedia.org/w/index.php?title=בתי_המיליונרים&diff=22973555&oldid=22973415 (PMID 10)
  3. https://ar.wikipedia.org/w/index.php?title=%D8%A5%D9%85_30&oldid=28295299 (PMID 4)
  4. https://ar.wikipedia.org/wiki/%D8%B5%D9%81%D8%A7%D8%A1_%D8%A7%D9%84%D8%AC%D9%8A%D9%88%D8%B3%D9%8A
  5. https://fr.wikipedia.org/wiki/Jean-Claude_Largeau (PMID 1)

Verification Notes

  • @EAkinloose / @Ryasmeen: when verifying on production, could you please check whether typing "PMID 1" into Citoid's Automatic tab generates an automatic citation like the one shown here:

(Screenshot: Screen Shot 2022-12-08 at 10.23.53 AM.png)

Event Timeline

Esanders renamed this task from "Disambigius automatic citations" to "Users entering small numbers into automatic citations, expecting it to re-use an existing citation". (Jun 30 2018, 10:51 AM)

Seems sensible. If the user enters a 1-3 digit number, we should show a warning under the input telling them to use the "Re-use" reference tab instead.

More examples: enwiki has 9 articles with PMID 1:
https://en.wikipedia.org/w/index.php?search=insource%3A%2Fpmid%3D1%7D%7D%2F&title=Special:Search&profile=advanced&fulltext=1&ns0=1&searchToken=5su55mekptrwtp0jyxwrostbx
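URL-decoded, the search above is `insource:/pmid=1}}/`. The sketch below approximates that pattern with a JavaScript regular expression (the search engine's regex dialect may differ) and runs it over made-up sample wikitext, not over the articles the search found. It shows that the pattern only catches `pmid=1` when it is the last template parameter, so the search can miss some citations.

```javascript
// JavaScript approximation of the search query insource:/pmid=1}}/.
// Illustrative only; the search engine uses its own regex dialect.
const PMID1_AT_END = /pmid=1\}\}/g;

// Count how many times "pmid=1}}" occurs in a piece of wikitext.
function countPmid1(wikitext) {
  return (wikitext.match(PMID1_AT_END) || []).length;
}

// Made-up sample citations for illustration.
const samples = [
  '{{cite journal |title=A |pmid=1}}',  // matched
  '{{cite journal |title=B |pmid=12}}', // not matched: a different PMID
  '{{cite journal |pmid=1 |title=C}}',  // not matched: pmid is not the last parameter
];

for (const s of samples) {
  console.log(countPmid1(s), s);
}
```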

Deskana subscribed.

A good suggestion. That said, it's low priority: with only 9 instances of PMID 1 on a wiki as large as the English Wikipedia, it is not truly "quite common".


I arrived at this Phabricator task by a circuitous route, starting from an effort to understand how several Biochemical journal articles have come to serve as references on a wide range of en.wiki articles. (See https://en.wikipedia.org/wiki/User:AllyD/BiochemReferences for my snapshot of 26 November 2022.) Since raising this at the Village Pump, I have discovered that an edit filter has existed since 2019 to identify such cases, but as my summary tables show, the problem continues to produce misleading references in articles. As I said at the Village Pump (https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(miscellaneous)&diff=prev&oldid=1124311512 ), prevention at the point of input is better than using audit lists to tidy up afterwards. I think the priority of this task should be revisited, and I also think the solution proposed in the Description is suitable.

Pcoombe subscribed.

Agreed that this would be great to fix. There are currently 30 uses of PMID 1 on English Wikipedia alone, even after some efforts to clean up. And the numbers soon mount up when considering other low PMIDs.

The edit filter could help, but we know how unfriendly they can be. It would be much better to catch and correct this at the point of adding the reference.

Change 864858 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/Citoid@master] Switch to reuse panel on low numeric inputs

https://gerrit.wikimedia.org/r/864858

As a starting point, that patch makes it so that pressing “generate” after you’ve entered any number less than 1000 in the auto input will switch to the reuse panel and fill in its search field with that number. The user would then be responsible for clicking the item in the search results that corresponds to the one they meant.
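The behaviour described above can be sketched as follows. This is not the code from change 864858; `onGenerate`, the panel names and the return shape are assumptions, and only the "less than 1000" threshold comes from the comment. It makes the cut-off concrete: 999 goes to the re-use search, while 1000 still reaches Citoid.

```javascript
// Sketch of the described behaviour, not the patch's actual code.
// Assumption: the threshold is "any number less than 1000".
const REUSE_THRESHOLD = 1000;

function onGenerate(input) {
  const text = input.trim();
  if (/^\d+$/.test(text) && Number(text) < REUSE_THRESHOLD) {
    // Switch to the re-use panel and prefill its search field with the number.
    return { panel: 'reuse', search: text };
  }
  // Everything else goes to the normal automatic (Citoid) lookup.
  return { panel: 'auto', query: text };
}

console.log(onGenerate('1'));    // → { panel: 'reuse', search: '1' }
console.log(onGenerate('999'));  // → { panel: 'reuse', search: '999' }
console.log(onGenerate('1000')); // → { panel: 'auto', query: '1000' }
```

A fixed threshold is simple, but it means bare PMIDs 1 to 999 can no longer reach Citoid through this path.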

Enhancements to this that'd take slightly more work to implement would be:

  1. having "generate" insert the citation instantly, without requiring confirmation
  2. making this behavior trigger when you enter any valid citation reference in the auto input (i.e. named refs would be caught as well, and numbers that aren't references to existing citations would get passed through to the regular citoid behavior)
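Enhancement 2 could look roughly like the following hypothetical sketch, which diverts to re-use only when the input matches an existing reference by its displayed number or by its ref name, and otherwise passes the input through to Citoid. The `refs` array, its `{ name }` shape and `resolveAutoInput` are assumed stand-ins for however the editor exposes the page's references.

```javascript
// Hypothetical sketch: re-use only when the input names an existing reference.
// refs: assumed list of the page's references in display order, e.g. [{ name: 'bbc' }].
function resolveAutoInput(input, refs) {
  const text = input.trim();
  if (/^\d+$/.test(text)) {
    const index = Number(text) - 1; // displayed reference numbers start at 1
    if (index >= 0 && index < refs.length) {
      return { action: 'reuse', ref: refs[index] };
    }
  }
  // Named refs are caught as well.
  const named = refs.find((ref) => ref.name === text);
  if (named) {
    return { action: 'reuse', ref: named };
  }
  // Not an existing reference: fall through to the normal Citoid lookup.
  return { action: 'citoid', query: text };
}

const refs = [{ name: 'fifa2018' }, {}, { name: 'bbc' }];
console.log(resolveAutoInput('2', refs).action);   // → reuse
console.log(resolveAutoInput('7', refs).action);   // → citoid
console.log(resolveAutoInput('bbc', refs).action); // → reuse
```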

Change 864858 merged by jenkins-bot:

[mediawiki/extensions/Citoid@master] Switch to reuse panel on low numeric inputs

https://gerrit.wikimedia.org/r/864858

For QA it'd be good to double-check that this doesn't interfere with explicitly triggering a PMID citation via entering "PMID 1" into the citoid field.

> As a starting point, that patch makes it so that pressing “generate” after you’ve entered any number less than 1000 in the auto input will switch to the reuse panel and fill in its search field with that number. The user would then be responsible for clicking the item in the search results that corresponds to the one they meant.
>
> Enhancements to this that'd take slightly more work to implement would be:
>
>   1. pressing generate instantly inserting the new citation without requiring confirmation
>   2. making this behavior trigger when you enter any valid citation reference in the auto input (i.e. named refs would be caught as well, and numbers that aren't references to existing citations would get passed through to the regular citoid behavior)

Yeah, that's T126488.

I've checked the patch and it definitely works. A single digit isn't a great search term in general, but at least it may make users stop and think "huh, what did I do?"

There's also the issue that we now have to say "PMIDs of 1000 and above" rather than "all PMIDs", though that's partly down to PMIDs being numbered from 1.

It also switches to re-use even if the page doesn't have that many citations. If there were only 3, we could actually just insert the third one. But then it's a slippery slope: we'd have to generate a new citation, place it among the search results of existing citations, and rank them somehow.

I guess the next lowest-hanging fruit would be to limit it to the number of citations actually on the page rather than a fixed number, or possibly to present both options, but that becomes more complicated.
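The "limit it to the number of citations on the page, or present both options" idea can be sketched like this. `optionsFor` and `numRefs` are hypothetical names, and how the options would be shown to the user is left open. A number counts as ambiguous only when it could point at an existing citation, in which case both choices are offered.

```javascript
// Hypothetical sketch: offer re-use only when the number could be an existing citation.
// numRefs: assumed count of citations actually on the page.
function optionsFor(input, numRefs) {
  const text = input.trim();
  if (!/^\d+$/.test(text)) {
    return ['citoid'];
  }
  const n = Number(text);
  if (n >= 1 && n <= numRefs) {
    return ['reuse', 'citoid']; // ambiguous: let the user pick
  }
  return ['citoid']; // cannot be an existing citation, so treat it as a PMID
}

console.log(optionsFor('3', 5));   // → [ 'reuse', 'citoid' ]
console.log(optionsFor('500', 5)); // → [ 'citoid' ]
```

Unlike a fixed threshold, this leaves any PMID larger than the page's citation count unaffected.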

> For QA it'd be good to double-check that this doesn't interfere with explicitly triggering a PMID citation via entering "PMID 1" into the citoid field.

I mean, it does interfere: you can no longer enter PMIDs under 1000 (generally papers from before 1975). Consequently, I'm not a huge fan of this approach.


Sorry, I was unclear -- I meant literally entering "PMID 1" into the field, which should result in it still generating the citation for PMID 1. Just entering "1" would no longer do it, but it should still be possible to make the citation if you know that trick. (But not "pmid 1". It's very specific.)

(Screen recording: Screen Recording 2022-12-11 at 1.35.42 PM.gif)

(I've never tried sticking a gif into here before, so I'm curious how well that'll work.)

> Sorry, I was unclear -- I meant literally entering "PMID 1" into the field, which should result in it still generating the citation for PMID 1. Just entering "1" would no longer do it, but it should still be possible to make the citation if you know that trick. (But not "pmid 1". It's very specific.)

Yeah, that doesn't work.

Entering "PMID 1" searches WorldCat and Crossref for that string, and it doesn't find the right result. Only a bare integer is actually interpreted as a PMID.

It's doable with a bit of extra regex in the citoid service.
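The extra regex might look something like this hypothetical sketch, which extracts the number from an explicit "PMID 1" so it could then be handled like a bare PMID. The accepted forms (case-insensitive, optional colon and spaces) and the name `parseExplicitPmid` are assumptions, not the Citoid service's actual behaviour.

```javascript
// Hypothetical sketch: recognise an explicit "PMID <number>" input.
// Accepted forms are assumptions: case-insensitive, optional colon and spaces.
const PMID_PREFIX = /^pmid\s*:?\s*(\d+)$/i;

// Returns the PMID digits as a string, or null if the input is not of that form.
function parseExplicitPmid(input) {
  const match = PMID_PREFIX.exec(input.trim());
  return match ? match[1] : null;
}

console.log(parseExplicitPmid('PMID 1'));  // → 1
console.log(parseExplicitPmid('pmid:42')); // → 42
console.log(parseExplicitPmid('1'));       // → null
```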

Ah, interesting -- I'd tried it for a few and it seemed to work consistently, but I guess that was more happenstance than anything?

Entering numbers switches to Re-use as expected. Tested on https://en.wikipedia.org/wiki/User:EAkinloose_(WMF)/sandbox

(Screenshot: Screenshot 2023-01-06 at 17.54.55.png)

(Screenshot: Screenshot 2023-01-06 at 17.55.23.png)

PMID 1:

(Screenshot: Screenshot 2023-01-06 at 17.53.56.png)

PMID 2:

(Screenshot: Screenshot 2023-01-06 at 17.55.54.png)

PMID 1000:

(Screenshot: Screenshot 2023-01-06 at 18.47.10.png)