
Remove doi-access=free when Unpaywall no longer confirms it
Open, Medium, Public

Authored By: Nemo_bis, Aug 13 2023, 7:41 AM

Description

We're currently adding doi-access=free even when only bronze OA (gratis but not libre) copies are available from the publisher. Such copies tend to disappear over time for various reasons, and when they do, the template parameter becomes misleading. Ideally we'd use a BOAI definition of "OA", which would minimise this problem, but that's not an option with the current citation templates.

We need to be able to remove the doi-access=free when Unpaywall thinks a work is no longer OA (or when it gives no publisher-provided OA URL).
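The rule described above could be sketched roughly as follows. This is a minimal illustration, not OAbot's actual code: the function name `should_keep_doi_access_free` is hypothetical, and the field names (`is_oa`, `oa_locations`, `host_type`, `url`) are assumed to follow the Unpaywall v2 response format. A failed lookup should be handled by the caller (probably by doing nothing) rather than passed in here.

```python
def should_keep_doi_access_free(record):
    """Return True if an Unpaywall record still supports doi-access=free.

    `record` is assumed to be the parsed JSON (a dict) of an Unpaywall v2
    response for the DOI in question.
    """
    if not record.get("is_oa"):
        # Unpaywall considers the work closed.
        return False
    for location in record.get("oa_locations") or []:
        # Only a publisher-hosted copy justifies a flag on the DOI link itself;
        # repository copies belong in other template parameters.
        if location.get("host_type") == "publisher" and location.get("url"):
            return True
    return False
```

With this shape, a work that is only green OA (repository copy, no publisher copy) would also lose the parameter, which matches the "no publisher-provided OA URL" condition.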

Event Timeline

https://en.wikipedia.org/w/index.php?title=Agriculture_in_Taiwan&diff=1170568092&oldid=1170224661 is a Wiley article which was marked OA back in 2020 and is now considered closed by Unpaywall, though it seems to be bronze OA once you get past the nasty ereader.

We're now using Unpaywall data correctly, but the detection of bronze papers still seems to have deteriorated somewhat. Problematic DOIs include the bogus epaper ones, but also news, obituaries, editorials and other content types which have no paywalled content but may not offer a gratis full-text PDF either.

Some problematic DOIs:
https://doi.org/10.1074/jbc.274.9.5339
https://doi.org/10.1242/jcs.02646
https://doi.org/10.1093/pasj/63.5.1117
https://doi.org/10.1002/ajpa.1330360603
https://doi.org/10.1074/jbc.M305191200
https://doi.org/10.1056/NEJMoa061355
https://doi.org/10.1098/rsbm.1980.0002
https://doi.org/10.1029/2010JE003647

10.1046/j.1471-4159.1999.0730012.x is an example of a paper with a deceitful "free access" label which can hardly qualify even as gratis/bronze OA, given that download and printing aren't permitted. Hence the removal was appropriate.

There are a few false negatives for AAS DOIs available from ADS or elsewhere, like 10.1086/116239, 10.1086/301260, 10.1086/342942, 10.1088/0067-0049/180/1/117.

Also a few for RAS and other DOIs at OUP, like 10.1046/j.1365-8711.1999.02605.x, 10.1143/PTP.10.581, 10.1093/mnras/168.1.235, 10.1111/j.1365-2966.2010.16827.x.

The IOP PDFs are walled behind ShieldSquare captcha while OUP PDFs are walled behind Silverchair.

More examples:

10.1016/j.febslet.2006.10.030 (FEBS letters)
10.1056/NEJMoa1306227 (NEJM)
10.1076/phbi.36.4.237.4583 (Wiley)
10.1111/nph.12902 (Wiley)
10.11646/phytotaxa.369.2.6
10.1484/J.RM.5.101158
10.2164/jandrol.108.007377 (Wiley)

Dubious cases:
10.1016/j.ygeno.2008.07.004 (Elsevier hcaptchawalled)

Correctly removed:
10.1038/sj.onc.1201604 (Nature)
10.1111/j.1464-410X.2011.10487.x (Wiley)
10.7326/0003-4819-139-3-200308050-00017 (ACP)

Scitrus/Atypon "ereader" is often involved.

34 more examples which seem to be bronze OA based on my manual check, out of 71 DOIs that OAbot found and that Unpaywall says are closed (the rest I mostly couldn't verify).

10.1002/2014JB011430
10.1016/j.virol.2003.08.001
10.1016/j.visres.2007.01.019
10.1021/sb4001382
10.1038/mi.2010.47
10.1046/j.1365-2958.1998.00774.x
10.1046/j.1420-9101.2002.00419.x
10.1046/j.1469-8137.2001.00034.x
10.1074/jbc.M100354200
10.1074/jbc.R400029200
10.1080/073911011010524992
10.1093/brain/124.5.893
10.1093/dnares/8.6.319
10.1093/pasj/63.5.1117
10.1094/MPMI.2004.17.3.292
10.1095/biolreprod.106.057810
10.1098/rsbm.1980.0005
10.1098/rstl.1766.0019
10.1111/1744-7917.12059
10.1111/j.1365-2656.2006.01172.x
10.1111/j.1475-4983.2009.00898.x
10.1111/j.1574-6976.2010.00250.x
10.1111/j.1600-0536.1992.tb05211.x
10.1111/maps.12395
10.1113/expphysiol.2007.038695
10.1126/science.318.5858.1842
10.1143/PTP.10.581
10.1146/knowable-072221-1
10.1167/iovs.08-2483
10.1242/dev.00853
10.1242/jcs.00229
10.1242/jeb.01220
10.1242/jeb.02211
10.1287/opre.18.6.1225

.List_of_topics_characterized_as_pseudoscience
.................Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/project/oabot/www/python/src/app.py", line 218, in get_proposed_edits
    filtered = list([e for e in all_templates if e.proposed_change])
  File "/data/project/oabot/www/python/src/app.py", line 218, in <listcomp>
    filtered = list([e for e in all_templates if e.proposed_change])
  File "/data/project/oabot/www/python/src/oabot/main.py", line 387, in add_oa_links_in_references
    edit.propose_change(only_doi)
  File "/data/project/oabot/www/python/src/oabot/main.py", line 118, in propose_change
    link, oa_status = get_oa_link(paper=dissemin_paper_object, doi=doi, only_unpaywall=only_doi)
  File "/data/project/oabot/www/python/src/oabot/main.py", line 368, in get_oa_link
    return url, resp['oa_status']
UnboundLocalError: local variable 'resp' referenced before assignment
........Equity_premium_puzzle
......Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/project/oabot/www/python/src/app.py", line 218, in get_proposed_edits
    filtered = list([e for e in all_templates if e.proposed_change])
  File "/data/project/oabot/www/python/src/app.py", line 218, in <listcomp>
    filtered = list([e for e in all_templates if e.proposed_change])
  File "/data/project/oabot/www/python/src/oabot/main.py", line 387, in add_oa_links_in_references
    edit.propose_change(only_doi)
  File "/data/project/oabot/www/python/src/oabot/main.py", line 118, in propose_change
    link, oa_status = get_oa_link(paper=dissemin_paper_object, doi=doi, only_unpaywall=only_doi)
  File "/data/project/oabot/www/python/src/oabot/main.py", line 337, in get_oa_link
    if 'citeseerx.ist.psu.edu' in resp['best_oa_location']['url_for_landing_page']:
TypeError: argument of type 'NoneType' is not iterable
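Both tracebacks come down to unguarded access to the Unpaywall response: in the first, `resp` was never assigned before being read, and in the second, `url_for_landing_page` was `None` when the `in` test ran. A defensive version of that extraction might look like the sketch below. The helper name `extract_oa_link` is hypothetical and this is not the actual code in `oabot/main.py`; what OAbot does with CiteSeerX links is not shown in the traceback, so skipping them here is an assumption.

```python
def extract_oa_link(resp):
    """Return (url, oa_status) from a parsed Unpaywall response.

    Returns (None, None) when there is no usable response at all.
    """
    if not isinstance(resp, dict):
        # Covers the case where the request failed and nothing was parsed,
        # which would otherwise surface as an unassigned `resp`.
        return None, None
    # `best_oa_location` is null for closed works; fall back to an empty dict.
    location = resp.get("best_oa_location") or {}
    # The landing page can be null too, so normalise it before using `in`.
    landing_page = location.get("url_for_landing_page") or ""
    if "citeseerx.ist.psu.edu" in landing_page:
        # Assumption: such links are skipped, mirroring the check in the log.
        return None, resp.get("oa_status")
    url = location.get("url_for_pdf") or location.get("url")
    return url, resp.get("oa_status")
```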

There are currently about 20k edits in the queue which would remove a doi-access=free parameter (but which currently do nothing).

In a sample of 20 DOIs from those suggestions, about half seemed to be more or less bronze/gratis OA, although seemingly all Atypon-walled. These were:

10.1074/jbc.M705794200
10.1257/aer.101.2.1012
10.1089/154732804323099163
10.1046/j.1365-2141.2001.03187.x
10.1074/jbc.271.6.3255
10.1002/ijc.11459
10.1029/2000GL012679
10.1029/1999JE001177
10.1098/rsta.1909.0016
10.1016/j.ydbio.2007.03.521

The sample was selected with a highly scientific:

tools.oabot@tools-sgebastion-10:~/www/python/src/cache$ grep -hr '"proposed_change": "doi-access=|"' ../bot_cache | jq '.proposed_edits' | grep conflicting_value | cut -d\" -f4 | sort -u | shuf -n 20

and then each DOI was visited manually. Unpaywall data for most of these has been updated in the last 12 months.
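For comparison, the same kind of sampling could be done in Python on already-parsed cache files. This is only a sketch: the function name `sample_removal_dois` is hypothetical, and the field names (`proposed_edits`, `proposed_change`, `conflicting_value`) are taken from the shell command above, assuming `proposed_edits` is a list of objects. Unlike the grep pipeline, it only picks up the DOI of the edit that actually proposes the removal.

```python
import random


def sample_removal_dois(cache_entries, k=20, seed=None):
    """Sample up to k distinct DOIs whose proposed change removes doi-access.

    `cache_entries` is an iterable of dicts, one per parsed cache JSON file.
    """
    dois = set()
    for entry in cache_entries:
        for edit in entry.get("proposed_edits") or []:
            # "doi-access=|" is how an emptied parameter appears in the cache.
            if edit.get("proposed_change") == "doi-access=|" and edit.get("conflicting_value"):
                dois.add(edit["conflicting_value"])
    pool = sorted(dois)  # sort first so a fixed seed gives a reproducible sample
    return random.Random(seed).sample(pool, min(k, len(pool)))
```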

Most common prefixes of DOIs which would be removed:

3877     10.1016
2698     10.1038
2370     10.1111
2200     10.1093
1673     10.1074
1419     10.1001
1239     10.1098
1132     10.1086
1112     10.1002
 609     10.1080
 513     10.1242
 488     10.1126
 471     10.1158
 470     10.1007
 454     10.1046
 375     10.1029
 341     10.1146
 298     10.1161
 257     10.1094
 244     10.11646
 235     10.1177
 221     10.1542
 188     10.1525
 175     10.7326
 172     10.2307
 172     10.1021
 170     10.1167
 165     10.1107
 155     10.1017

I made reports upstream for Journal of Biological Chemistry (already fixed), Journal of Asian Studies/Duke University Press, Annual Review of Public Health, AAS journals and AME journals. I manually removed their doi-access=free removals from the queue (they were around 10% of the total, I think, including all 10.1146/annurev DOIs, some of which are not open yet).

Some of the most common DOI segments slated for doi-access=free removal in today's run:

$ find ~/www/python/src/bot_cache -type f -exec jq '.proposed_edits | .[] | .orig_string' {} \; | grep doi= | grep -Eo 'doi *=[^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+(\.([a-z]{,8}|[0-9-]{9})\b)?' | sort | uniq -c | sort -nr | head -n 40
    392 10.1126/science.
    351 10.1074/jbc.
    260 10.1001/jama.
    236 10.1038/sj.onc
    209 10.1007/s
    176 10.1016/s
    173 10.1038/s
    155 10.1098/rsbm.
    147 10.1146/knowable
    139 10.1038/d
    116 10.1098/rstb.
    111 10.1525/aa.
    110 10.1098/rspa.
    104 10.1242/jeb.
    104 10.1111/j.
    100 10.5210/fm.
    100 10.1377/hlthaff.
     99 10.1016/j.
     91 10.1098/rstl.
     86 10.1093/mnras
     76 10.1242/jcs.
     75 10.1167/iovs.
     68 10.1001/archinte.
     62 10.1542/peds.
     61 10.1111/j.1469-8137
     60 10.1098/rsta.
     57 10.1111/j.1558-5646
     55 10.1001/archneur.
     53 10.1111/j.1096-3642
     52 10.1001/archpsyc.
     48 10.3732/ajb.
     46 10.1038/nature
     46 10.1002/art.
     45 10.1038/sj.mp
     43 10.1016/j.febslet
     42 10.1111/j.1432-1033
     42 10.1093/hmg
     41 10.1016/j.jacc
     41 10.1007/bf
     40 10.1093/acrefore

Clearly the upstream changes like JBC have not yet fully propagated.

Or to catch some more ISSN:

$ find ~/www/python/src/bot_cache -type f -exec jq '.proposed_edits | .[] | .orig_string' {} \; | grep doi= | grep -Eo 'doi *=[^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+\b(\.?([a-z]{,8}|[0-9-]{8,9})\b)?' | sort | uniq -c | sort -nr | head -n 40
    390 10.1126/science.
    260 10.1001/jama.
    244 10.1074/jbc.
    235 10.1038/sj.onc
    155 10.1098/rsbm.
    116 10.1098/rstb.
    111 10.1525/aa.
    110 10.1098/rspa.
    104 10.1242/jeb.
    104 10.1111/j.
    100 10.5210/fm.
    100 10.1377/hlthaff.
     99 10.1016/j.
     91 10.1098/rstl.
     86 10.1093/mnras
     74 10.1242/jcs.
     68 10.1167/iovs.
     68 10.1001/archinte.
     62 10.1542/peds.
     61 10.1111/j.1469-8137
     60 10.1098/rsta.
     57 10.1111/j.1558-5646
     55 10.1001/archneur.
     53 10.1111/j.1096-3642
     52 10.1001/archpsyc.
     48 10.3732/ajb.
     46 10.1002/art.
     43 10.1038/sj.mp
     43 10.1016/j.febslet
     42 10.1093/hmg
     41 10.1111/j.1432-1033
     41 10.1016/j.jacc
     40 10.1093/acrefore
     40 10.1001/archopht.
     39 10.1098/rspb.
     39 10.1093/molbev
     38 10.1001/archpedi.
     37 10.1242/dev.
     37 10.1111/j.1475-4983
     36 10.1016/j.jasms

Filed Knowable Magazine upstream at https://github.com/ourresearch/oadoi/pull/142

Some ISSNs

$ find ~/www/python/src/bot_cache -type f -exec jq '.proposed_edits | .[] | .orig_string' {} \; | grep issn | grep -Eo 'issn *= *[0-9-]{8,9}' | grep -Eo '[0-9-]{8,9}' | sort | uniq -c | sort -nr | head -n 40
     87 0036-8075
     46 0004-637
     45 1476-4687
     45 0004-6256
     39 0191-2917
     39 0098-7484
     33 0028-0836
     28 1044-0305
     25 0067-0049
     24 0080-4606
     24 0021-8693
     19 2156-2202
     19 1396-0466
     18 1538-4365
     17 0148-0227
     17 0031-4005
     17 0022-0949
     16 0950-9232
     16 0304-3975
     16 0278-2715
     16 0140-6736
     16 0035-8711
     16 0028-646
     16 0002-7294
     15 1944-8007
     15 1538-4357
     15 0301-4223
     15 0031-949
     15 0006-3568
     15 0003-9926
     14 2330-4804
     14 1475-4983
     14 0271-5333
     13 0272-4634
     13 0097-3165
     13 0080-4630
     12 2515-5172
     12 1631-0683
     12 1364-5021
     12 0094-8276

I've manually deleted the older suggestions so now the numbers will be lower.

find ~/www/python/src/bot_cache -mtime +3 -delete

How to sample JSTOR DOIs which look closed:

$ find -maxdepth 1 -type f -print0 | xargs -0 -P8 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("doi-access=|")) | .orig_string' | grep 2307 | grep -Eo "10.2307/[0-9]+" | sort | shuf -n 40

Some doi-access=free being re-added now:

$ find -maxdepth 1 -type f -print0 | xargs -0 -P16 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("doi-access=free")) | .orig_string' | grep doi | grep -Eo 'doi *= *[^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+(\.([a-z]{,8}|[0-9-]{9})\b)?' | sort | uniq -c | sort -nr | head -n 40
    546 10.1146/annurev
    409 10.1007/s
    186 10.4202/app.
    178 10.1016/j.
    176 10.1016/j.cub
    156 10.1126/science.
    124 10.1038/s
     96 10.1016/j.cretres
     84 10.1111/pala.
     78 10.1017/jpa.
     72 10.1074/jbc.
     66 10.1002/ar.
     61 10.5252/geodiversitas
     56 10.11646/zootaxa.
     52 10.5852/ejt.
     52 10.5852/cr
     52 10.1016/j.palaeo
     52 10.1002/spp
     48 10.1016/j.jhevol
     46 10.1093/zoolinnean
     44 10.5962/bhl.part
     44 10.1111/j.
     42 10.1016/s
     41 10.3140/bull.geosci
     39 10.1016/j.cell
     39 10.1002/ajb
     38 10.4049/jimmunol.
     38 10.1017/pab.
     33 10.1038/nature
     32 10.1111/j.1475-4983
     31 10.37828/em.
     31 10.1093/mnras
     28 10.1111/j.1096-3642
     27 10.5962/p.
     27 10.2476/asjaa.
     25 10.7203/sjp.
     25 10.1016/j.revpalbo
     23 10.1002/ajpa.
     21 10.24425/agp.
     21 10.1093/bioinformatics

Still room for improvement

$ find -maxdepth 1 -type f -print0 | xargs -0 -P16 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("doi-access=|")) | .orig_string' | grep doi | grep -Eo 'doi *= [^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+(\.([a-z]{,8}|[0-9-]{9})\b)?' | sort | uniq -c | sort -nr | head -n 40

   1601 10.1074/jbc.
    902 10.1038/sj.onc
    375 10.1093/pasj
    274 10.1098/rsbm.
    256 10.1001/jama.
    161 10.1242/jcs.
    153 10.1126/science.
    135 10.11646/zootaxa.
    109 10.1038/sj.mp
     93 10.1093/hmg
     83 10.1001/archinte.
     82 10.1016/j.febslet
     79 10.1167/iovs.
     79 10.1111/j.1432-1033
     78 10.1016/s
     74 10.1002/art.
     71 10.1038/sj.leu
     64 10.1001/archneur.
     60 10.1016/j.jacc
     59 10.1242/dev.
     59 10.1182/blood
     49 10.1111/j.
     45 10.1542/peds.
     45 10.1146/annurev
     43 10.1098/rstb.
     43 10.1001/archpsyc.
     40 10.1189/jlb.
     38 10.1093/molbev
     38 10.1002/ijc.
     35 10.1111/j.1471-4159
     35 10.1098/rspa.
     33 10.1046/j.1471-4159
     30 10.1525/aa.
     30 10.1093/bioinformatics
     30 10.1038/onc.
     29 10.1242/jeb.
     29 10.1046/j.
     28 10.1038/d
     28 10.1016/j.bbamcr
     27 10.1377/hlthaff.

10.1074/jbc.

Mostly fixed upstream.

10.1038/sj.onc

Oncogene is indeed closed.

10.1093/pasj

These are all about https://doi.org/10.1093/pasj/63.5.1117, also archived at https://fatcat.wiki/release/ldocvpxxwfebhe24iblyszbfcu; journal is half OA https://fatcat.wiki/container/h7ngi63dovculk3m77uvbzf56y/coverage . Reported to Unpaywall as ticket #28737.

10.1098/rsbm.

False negatives for Biographical Memoirs of Fellows of the Royal Society https://fatcat.wiki/container/j37tr2b3jncfzc6ac77b3iagoq

PR: https://github.com/ourresearch/oadoi/pull/143/commits/3017a106e628c1e361c47fe80773777dfbb2b19c

(to be continued)

After the latest run

$ find -maxdepth 1 -type f -print0 | xargs -0 -P16 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("doi-access=|")) | .orig_string' | grep doi | grep -Eo 'doi *= [^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+(\.([a-z]{,8}|[0-9-]{9})\b)?' | sort | uniq -c | sort -nr | head -n 40
parse error: Invalid numeric literal at line 1, column 6
   1595 10.1074/jbc.
    846 10.1038/sj.onc
    337 10.1093/pasj
    254 10.1098/rsbm.
    245 10.1001/jama.
    147 10.1126/science.
    139 10.1242/jcs.
    114 10.11646/zootaxa.
     95 10.1038/sj.mp
     85 10.1002/art.
     82 10.1093/hmg
     74 10.1001/archinte.
     72 10.1016/j.febslet
     68 10.1167/iovs.
     67 10.1016/s
     65 10.1111/j.1432-1033
     62 10.1038/sj.leu
     60 10.1016/j.jacc
     58 10.1242/dev.
     56 10.1001/archneur.
     53 10.1182/blood
     51 10.1111/j.
     45 10.1189/jlb.
     44 10.1542/peds.
     41 10.1001/archpsyc.
     40 10.1111/j.1471-4159
     38 10.1098/rstb.
     38 10.1046/j.
     35 10.1525/aa.
     35 10.1002/ijc.
     33 10.1093/molbev
     33 10.1046/j.1471-4159
     32 10.1098/rspa.
     31 10.1146/annurev
     29 10.1093/bioinformatics
     28 10.1038/onc.
     27 10.1038/d
     26 10.1016/j.chembiol
     25 10.1242/jeb.
     25 10.1098/rstl

34 more examples which seem bronze OA from my manual check, out of 71 OAbot found Unpaywall says are closed (the rest I mostly couldn't verify).

Of these, 24 are still considered closed by Unpaywall, of which 8 explicitly require login. So the error rate is down to 16/71, and these are mostly captcha-walled PDFs at publishers like T&F and Wiley.
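The arithmetic behind that figure, spelled out. This assumes the reading that the 8 login-required DOIs count as correctly closed, so only the remainder are real false negatives.

```python
total_checked = 71    # DOIs OAbot found that Unpaywall said were closed
still_closed = 24     # of the 34 bronze-looking ones, still closed per Unpaywall
login_required = 8    # of those 24, explicitly behind a login

# Login-walled copies are not bronze OA, so Unpaywall is right about them.
false_negatives = still_closed - login_required
error_rate = false_negatives / total_checked

print(f"{false_negatives}/{total_checked} = {error_rate:.1%}")  # 16/71 = 22.5%
```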

Current most popular DOI prefixes

$ find ~/www/python/src/bot_cache -maxdepth 1 -type f -print0 | xargs -0 -P16 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("doi-access=|")) | .orig_string' | grep doi | grep -Eo 'doi *= [^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+(\.([a-z]{,8}|[0-9-]{9})\b)?' | sort | uniq -c | sort -nr | head -n 40
jq: error: Could not open file /data/project/oabot/www/python/src/bot_cache/ISO#IEC_2022.json: No such file or directory
parse error: Invalid numeric literal at line 1, column 6
   1933 10.1074/jbc.
   1194 10.1038/sj.onc
    705 10.1126/science.
    512 10.1098/rsbm.
    396 10.4049/jimmunol.
    385 10.1093/hmg
    370 10.1111/syen.
    304 10.1096/fj.
    284 10.1001/jama.
    250 10.1242/jcs.
    213 10.1096/fasebj.
    204 10.11646/zootaxa.
    202 10.1182/blood
    162 10.1016/j.febslet
    138 10.1038/sj.mp
    127 10.1182/blood.
    111 10.1242/dev.
    103 10.1016/s
    100 10.1111/j.
    100 10.1002/art.
     87 10.1210/jcem.
     87 10.1167/iovs.
     85 10.1111/j.1432-1033
     81 10.1093/brain
     81 10.1016/j.
     80 10.1038/onc.
     80 10.1001/archinte.
     77 10.1093/humupd
     76 10.1038/sj.leu
     75 10.1242/jeb.
     75 10.1098/rstl.
     74 10.1093/mnras
     74 10.1002/ijc.
     73 10.1001/archneur.
     72 10.1007/s
     70 10.4269/ajtmh.
     70 10.1146/annurev
     66 10.1016/j.cell
     64 10.1542/peds.
     62 10.1124/pr.
Nemo_bis lowered the priority of this task from High to Medium. May 22 2025, 8:37 PM

In my own editing session I found 24 correct suggestions to remove doi-access=free, and I rejected over 50:

tools.oabot@tools-bastion-12:~/www/python/src/cache$ ack -H '"classification": "rejected"' | sed 's,{"orig_string",\n{"orig_string",g' | grep 'doi-access=|' | grep -Eo '"conflicting_value": "[^"]+' | sed 's,"conflicting_value": ",,g' | sort -u
10.1002/adma.200703183
10.1002/j.2050-0416.2011.tb00447.x
10.1002/tax.591031
10.1016/0028-2243(88)90130-x
10.1016/j.crhy.2017.11.003
10.1021/jp5097376
10.1029/2009JD013493
10.1038/26886
10.1090/S0002-9947-1974-0380158-2
10.1090/S0002-9947-1976-0390550-X
10.1090/S0002-9947-1986-0825722-7
10.1090/S0894-0347-1990-1065053-0
10.1093/mnras/282.1.40
10.1097/01.wno.0000189065.20552.88
10.1098/rsbm.1976.0005
10.1099/00207713-50-6-2083
10.1099/ijs.0.000280
10.1099/ijs.0.000383
10.1099/ijs.0.015842-0
10.1099/ijs.0.02094-0
10.1099/ijs.0.02624-0
10.1099/ijs.0.02661-0
10.1099/ijs.0.059626-0
10.1099/ijs.0.63867-0
10.1099/ijs.0.64043-0
10.1099/ijsem.0.000414
10.1099/ijsem.0.002559
10.1099/mic.0.28743-0
10.1103/PhysRevLett.23.930
10.1111/cla.12154
10.1111/syen.12358
10.1126/science.aao1498
10.1145/3491239
10.1145/359156.359164
10.1146/annurev-aa-58-081420-100001
10.1146/annurev-anchem-071213-020227
10.1146/knowable-080620-1
10.11646/zootaxa.3754.4.1
10.1175/1520-0485(1977)007<0952:imituo>2.0.co;2
10.1177/0020294015600474
10.1214/ss/1177013002
10.1258/002221505774783377
10.1258/002367706777611488
10.1644/1545-1542(2005)86[495:vimdam]2.0.co;2
10.1680/geot.1976.26.3.393
10.1680/geot.1980.30.3.227
10.1680/geot.1983.33.3.187
10.1680/imotp.1874.22770
10.17723/aarc.40.4.q7371u7450r1w237
10.2307/1988605
10.3138/cbmh.23.2.562
10.4095/133497
10.4095/133498
10.54102/ajt.8gztc
10.5479/si.00810282.120

Some of the most common DOI prefixes now:

$ find ~/www/python/src/bot_cache -maxdepth 1 -type f -print0 | xargs -0 -P16 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("doi-access=|")) | .orig_string' | grep doi | grep -Eo 'doi *= [^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+(\.([a-z]{,8}|[0-9-]{9})\b)?' | sort | uniq -c | sort -nr | head -n 40 
    513 10.1038/sj.onc
    341 10.1126/science.
    331 10.1098/rsbm.
    266 10.4049/jimmunol.
    252 10.1099/ijs.
    183 10.1111/syen.
    156 10.1242/jcs.
    137 10.1096/fj.
    136 10.1093/hmg
    118 10.1002/humu.
    112 10.1001/jama.
     98 10.1096/fasebj.
     91 10.11646/zootaxa.
     88 10.1099/ijsem.
     86 10.1182/blood
     75 10.1016/j.febslet
     66 10.1099/mic.
     52 10.1242/dev.
     50 10.1111/j.1432-1033
     49 10.1038/sj.mp
     47 10.1002/art.
     46 10.1016/j.
     45 10.1038/sj.leu
     43 10.1016/s
     42 10.1099/vir.
     41 10.1007/s
     39 10.1001/archneur.
     38 10.1182/blood.
     38 10.1111/j.
     35 10.1093/brain
     35 10.1001/archinte.
     34 10.1210/endo.
     33 10.1525/aa.
     33 10.1210/mend.
     32 10.1038/onc.
     29 10.1167/iovs.
     29 10.1093/mnras
     29 10.1093/humupd
     29 10.1002/ijc.
     28 10.1111/j.1471-4159

In the case of doi:10.11646/phytotaxa.498.3.2 it would have helped to check Internet Archive Scholar: https://scholar.archive.org/fatcat/release/q35yyfg2cfg7vd3awxexwrdyui has it open, so we could have avoided treating it as closed.
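A second-opinion check of that kind could be sketched as below. This is purely illustrative and not something OAbot does today: the function name `is_effectively_closed` is hypothetical, and the shape of the archive record (a `files` list whose entries carry a `urls` list, as in a fatcat release fetched with its files expanded) is an assumption that would need checking against the fatcat API documentation.

```python
def is_effectively_closed(unpaywall_record, archive_release=None):
    """Treat a DOI as closed only if neither source knows of a full text.

    `unpaywall_record` is a parsed Unpaywall v2 response (dict or None);
    `archive_release` is a parsed release record from the archive
    (dict or None), assumed to list known full-text files under "files".
    """
    if unpaywall_record and unpaywall_record.get("is_oa"):
        return False
    for file_entity in (archive_release or {}).get("files") or []:
        # Any file with at least one URL means a copy is reachable somewhere.
        if file_entity.get("urls"):
            return False
    return True
```

Whether an archived copy should keep doi-access=free in place, or only prevent the work from being treated as closed, is a separate decision this sketch does not make.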

There will be another major Unpaywall update in December, so I'll have to test the suggestions again.