Nemo_bis (Nemo)
User

User Details

User Since
Oct 10 2014, 2:32 PM (496 w, 17 h)
Availability
Available
LDAP User
Unknown
MediaWiki User
Nemo bis [ Global Accounts ]

Wikimedia cross-wiki coordination and L10n/i18n. Mainly active on Wikiquote, Wiktionary, Wikisource, Commons, Wikidata, Wikibooks. And of course Meta-Wiki, translatewiki.net.

Contact me by MediaWiki.org email or user talk.

Recent Activity

Jan 16 2024

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

After the latest run

Jan 16 2024, 8:03 AM · OABot

Jan 14 2024

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Mostly fixed upstream.

Jan 14 2024, 1:29 PM · OABot

Jan 13 2024

Nemo_bis added a comment to T283717: Add PMC ID even if doi-access=free.

It is not clear to me why doi:10.1038/s41586-023-06291-2 got an arXiv ID but not a PMC ID: https://en.wikipedia.org/w/index.php?title=PubMed&diff=prev&oldid=1195324840

Jan 13 2024, 11:22 AM · OABot
Nemo_bis added a comment to T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers.

The new round seems to go fine so far https://en.wikipedia.org/w/index.php?title=Special:Contributions/OAbot&target=OAbot&dir=prev&offset=20240107000000&limit=50

Jan 13 2024, 11:17 AM · OABot

Jan 7 2024

Nemo_bis triaged T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers as Medium priority.
Jan 7 2024, 9:26 PM · OABot
Nemo_bis added a comment to T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers.

For the non-Unpaywall side, continues at T228702

Jan 7 2024, 9:19 PM · OABot
Nemo_bis added a comment to T228702: Relax author and year match?.

We're still discarding excess merges from Dissemin similar to the 2019 logic https://github.com/dissemin/oabot/commit/e3c74bff735c1ef16ee333dde2ac4bdd20949635 . We're not currently using the Dissemin title matches but if we did it would not be enough to check for title, author, year match: https://en.wikipedia.org/w/index.php?title=User_talk%3AOAbot&diff=1194216712&oldid=1193993325 .

Jan 7 2024, 9:18 PM · OABot
Nemo_bis added a comment to T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers.

Should be fixed with https://github.com/dissemin/oabot/pull/91/commits/1b49d999504b868c7d5eb4d4512300db1f55e871

Jan 7 2024, 9:07 PM · OABot
Nemo_bis updated the task description for T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers.
Jan 7 2024, 8:54 PM · OABot
Nemo_bis updated the task description for T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers.
Jan 7 2024, 8:52 PM · OABot
Nemo_bis added a comment to T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers.

There are nearly 6500k PMC matches by PMCID lookup and only about 640k matches by title and first author, of which some 60k appear without a PMCID match, so perhaps we can just ignore those europepmc matches:

$ lbzip2 -dc unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2 | grep '"is_oa": true' | grep pmc | grep -c "oa repository (via pmcid lookup)"
6499014
$ lbzip2 -dc unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2 | grep '"is_oa": true' | grep pmc | grep -c "oa repository (via OAI-PMH title and first author match)"
637491
$ lbzip2 -dc unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2 | grep '"is_oa": true' | grep pmc | grep "oa repository (via OAI-PMH title and first author match)" | grep -vc "oa repository (via pmcid lookup)"
62310
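
The three counts above could also be taken in a single pass. The following is only a sketch, not part of OAbot: the function name and counter keys are hypothetical, and it assumes each snapshot line is a JSON object with `is_oa` and an `oa_locations` list whose items carry an `evidence` string, as the grep and jq patterns in this feed suggest. The raw-line test for "pmc" mirrors `grep pmc`.

```python
import json
from collections import Counter

PMCID_LOOKUP = "oa repository (via pmcid lookup)"
TITLE_MATCH = "oa repository (via OAI-PMH title and first author match)"


def count_pmc_evidence(lines):
    """Tally PMC-related evidence strings over an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        # Same coarse pre-filter as `grep pmc`: skip lines that never mention it.
        if "pmc" not in line:
            continue
        record = json.loads(line)
        if not record.get("is_oa"):
            continue
        evidences = {loc.get("evidence") for loc in record.get("oa_locations") or []}
        if PMCID_LOOKUP in evidences:
            counts["pmcid_lookup"] += 1
        if TITLE_MATCH in evidences:
            counts["title_match"] += 1
            if PMCID_LOOKUP not in evidences:
                # Title/author match with no PMCID lookup to back it up.
                counts["title_match_only"] += 1
    return counts
```

To run it on the compressed snapshot, the lines could be piped in from `lbzip2 -dc` and read from `sys.stdin`.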
Jan 7 2024, 7:47 PM · OABot
Nemo_bis added a comment to T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers.

Both papers on Unpaywall have evidence "oa repository (via OAI-PMH title and first author match)" although the PMC side exposes a link to the correct DOI. The CrossRef API has the page range like "113-128", "283-288", so it may be possible to check for the number of pages.
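
A minimal sketch of the page-count idea, assuming the page range arrives as a "first-last" string like the ones quoted above. The helper name is hypothetical, and how the bot would obtain a page count for the candidate PMC record is left open here.

```python
import re


def page_count(page_range):
    """Return the number of pages in a 'first-last' range, or None if unparseable."""
    match = re.fullmatch(r"\s*(\d+)\s*[-\u2013]\s*(\d+)\s*", page_range or "")
    if not match:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        # Abbreviated ranges such as "283-8" are not handled in this sketch.
        return None
    return last - first + 1
```

With the two examples above, "113-128" gives 16 pages and "283-288" gives 6, so a large mismatch against the matched record could flag a suspicious match.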

Jan 7 2024, 6:36 PM · OABot
Nemo_bis added a comment to T354471: Re-assess repository links Unpaywall found on CiteSeerX.

So we won't suggest edits like this either https://en.wikipedia.org/w/index.php?title=Saccharomyceta&curid=68064105&diff=1194087545&oldid=1182890284 as we don't get non-repository URLs from other sources.

Jan 7 2024, 9:37 AM · OABot

Jan 6 2024

Nemo_bis added a comment to T354471: Re-assess repository links Unpaywall found on CiteSeerX.

A sample of what kind of URLs we're talking about

Jan 6 2024, 5:34 PM · OABot
Nemo_bis raised the priority of T354471: Re-assess repository links Unpaywall found on CiteSeerX from Low to Medium.
Jan 6 2024, 2:48 PM · OABot
Nemo_bis added a comment to T354471: Re-assess repository links Unpaywall found on CiteSeerX.

Should be fixed by https://github.com/nemobis/oabot/commit/8895319d9fd65808b8a1cb41dd0ef29ed2987c43

Jan 6 2024, 2:47 PM · OABot
Nemo_bis added a comment to T354471: Re-assess repository links Unpaywall found on CiteSeerX.

Only 35k or so of these are in the best_oa_location (sometimes even when a separate match for arxiv exists, like doi:10.1002/rsa.20071 / oai:CiteSeerX.psu:10.1.1.237.8456 / oai:arXiv.org:math/0209357 ).

Jan 6 2024, 2:26 PM · OABot
Nemo_bis added a comment to T354471: Re-assess repository links Unpaywall found on CiteSeerX.

Not sure how to narrow this down, we're talking about some 500k matches from CiteSeerX (out of 900k):

$ lbzip2 -dc unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2 | grep citeseerx | grep "oa repository (via OAI-PMH doi match)" | jq -r 'select(.oa_locations | .[] | .endpoint_id == "CiteSeerX.psu" and .evidence == "oa repository (via OAI-PMH doi match)" )|.doi' | wc -l
505747
$ lbzip2 -dc unpaywall_snapshot_2022-03-09_sorted.jsonl.bz2 | grep -c citeseerx
887759
Jan 6 2024, 1:59 PM · OABot
Nemo_bis added a parent task for T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers: T283717: Add PMC ID even if doi-access=free.
Jan 6 2024, 10:23 AM · OABot
Nemo_bis added a subtask for T283717: Add PMC ID even if doi-access=free: T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers.
Jan 6 2024, 10:23 AM · OABot
Nemo_bis created T354472: Work around incorrect matches for PMC IDs of AMS/PNAS papers.
Jan 6 2024, 10:08 AM · OABot
Nemo_bis added a parent task for T354471: Re-assess repository links Unpaywall found on CiteSeerX: T283717: Add PMC ID even if doi-access=free.
Jan 6 2024, 10:01 AM · OABot
Nemo_bis added a subtask for T283717: Add PMC ID even if doi-access=free: T354471: Re-assess repository links Unpaywall found on CiteSeerX.
Jan 6 2024, 10:01 AM · OABot
Nemo_bis updated the task description for T354471: Re-assess repository links Unpaywall found on CiteSeerX.
Jan 6 2024, 10:01 AM · OABot
Nemo_bis claimed T354471: Re-assess repository links Unpaywall found on CiteSeerX.
Jan 6 2024, 10:00 AM · OABot
Nemo_bis created T354471: Re-assess repository links Unpaywall found on CiteSeerX.
Jan 6 2024, 9:59 AM · OABot

Jan 5 2024

Nemo_bis added a comment to T283717: Add PMC ID even if doi-access=free.

Another example where URL priorities changed: https://en.wikipedia.org/w/index.php?title=Balbinot_1&diff=prev&oldid=1193722831 (but there was no doi-access=free).

Jan 5 2024, 8:24 AM · OABot
Nemo_bis added a comment to T283717: Add PMC ID even if doi-access=free.

The recent change to sort all URLs https://github.com/dissemin/oabot/commit/ddab25a5ee71e2f23fe4b8dfb5a28c8da333a922 allowed the bot to perform https://en.wikipedia.org/w/index.php?title=Serafim_Kalliadasis&diff=prev&oldid=1193717235 , while previously it would probably only have suggested the first URL https://eprints.qut.edu.au/134215/1/134215p.pdf . http://hdl.handle.net/10044/1/55290 is the 3rd suggestion from Unpaywall and https://arxiv.org/abs/1609.05938 is the 8th.

Jan 5 2024, 7:39 AM · OABot
Nemo_bis claimed T283717: Add PMC ID even if doi-access=free.
Jan 5 2024, 7:36 AM · OABot

Jan 1 2024

Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

That's fixed in https://github.com/dissemin/oabot/commit/1cd61525a8cc5d8378e60f63555cf291e1bb4660 hopefully

Jan 1 2024, 5:58 PM · OABot
Nemo_bis closed T354144: OAbot leaderboard not being updated with new users as Resolved.
Jan 1 2024, 2:59 PM · OABot
Nemo_bis added a comment to T354144: OAbot leaderboard not being updated with new users.

I've manually updated the leaderboard with https://github.com/nemobis/oabot/commit/4917289ac7b49ca5176129d9f19ae5355ac84b72

Jan 1 2024, 2:58 PM · OABot
Nemo_bis added a comment to T354144: OAbot leaderboard not being updated with new users.

The last row created was

Jan 1 2024, 1:35 PM · OABot
Nemo_bis created T354144: OAbot leaderboard not being updated with new users.
Jan 1 2024, 1:15 PM · OABot

Dec 25 2023

Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

https://en.wikipedia.org/w/index.php?title=Lyman_E._Johnson&diff=prev&oldid=1191724248 was not supposed to happen as the existing URL returns a PDF.

Dec 25 2023, 10:37 AM · OABot

Dec 23 2023

Nemo_bis awarded T190129: Consolidate language metadata into a 'language-data' library and use in MediaWiki a Love token.
Dec 23 2023, 1:50 PM · Librarization, Language-Team (Language-2022-January-March), Language codes, TechCom-RFC, Epic, I18n

Dec 19 2023

Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

Latest run

Dec 19 2023, 9:14 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Still room for improvement

Dec 19 2023, 9:13 PM · OABot

Dec 10 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Some doi-access=free being re-added now:

$ find -maxdepth 1 -type f -print0 | xargs -0 -P16 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("doi-access=free")) | .orig_string' | grep doi | grep -Eo 'doi *= *[^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+(\.([a-z]{,8}|[0-9-]{9})\b)?' | sort | uniq -c | sort -nr | head -n 40
    546 10.1146/annurev
    409 10.1007/s
    186 10.4202/app.
    178 10.1016/j.
    176 10.1016/j.cub
    156 10.1126/science.
    124 10.1038/s
     96 10.1016/j.cretres
     84 10.1111/pala.
     78 10.1017/jpa.
     72 10.1074/jbc.
     66 10.1002/ar.
     61 10.5252/geodiversitas
     56 10.11646/zootaxa.
     52 10.5852/ejt.
     52 10.5852/cr
     52 10.1016/j.palaeo
     52 10.1002/spp
     48 10.1016/j.jhevol
     46 10.1093/zoolinnean
     44 10.5962/bhl.part
     44 10.1111/j.
     42 10.1016/s
     41 10.3140/bull.geosci
     39 10.1016/j.cell
     39 10.1002/ajb
     38 10.4049/jimmunol.
     38 10.1017/pab.
     33 10.1038/nature
     32 10.1111/j.1475-4983
     31 10.37828/em.
     31 10.1093/mnras
     28 10.1111/j.1096-3642
     27 10.5962/p.
     27 10.2476/asjaa.
     25 10.7203/sjp.
     25 10.1016/j.revpalbo
     23 10.1002/ajpa.
     21 10.24425/agp.
     21 10.1093/bioinformatics
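
For reference, the DOI-segment tally could be reproduced in Python roughly as follows. This is a sketch with hypothetical function names, not code from OAbot. One assumption matters: Python's `re` takes the first alternative that succeeds rather than the longest match that `grep -E` picks, so the ISSN-like alternative is listed first to keep segments such as 10.1111/j.1475-4983 intact.

```python
import re
from collections import Counter

# Same pattern as the grep above, with the two alternatives swapped and
# {,8} written as {0,8}; both spellings mean "zero to eight".
DOI_SEGMENT = re.compile(r"10\.[0-9]+/[a-z]+(?:\.(?:[0-9-]{9}|[a-z]{0,8})\b)?")


def doi_segment(doi):
    """Return the leading journal-like segment of a DOI, or None."""
    match = DOI_SEGMENT.search(doi)
    return match.group(0) if match else None


def tally_segments(dois):
    """Count DOI segments, most common first via Counter.most_common()."""
    return Counter(seg for seg in map(doi_segment, dois) if seg)
```

`tally_segments(dois).most_common(40)` would then play the role of `sort | uniq -c | sort -nr | head -n 40`.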
Dec 10 2023, 6:56 PM · OABot
Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

Currently with some 160k pages found:

$ find -maxdepth 1 -type f -print0 | xargs -0 -P8 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("subscription")) | .orig_string' | grep -Eo '\| *url *= *http[^|}]+' | cut -d/ -f3 | sort | uniq -c | sort -nr | head -n 30
  15725 www.jstor.org
  14451 dx.doi.org
  12927 doi.org
   9520 www.sciencedirect.com
   6442 www.researchgate.net
   5630 www.tandfonline.com
   5491 onlinelibrary.wiley.com
   4498 www.cambridge.org
   3824 pubmed.ncbi.nlm.nih.gov
   3477 link.springer.com
   3182 muse.jhu.edu
   3024 linkinghub.elsevier.com
   2928 www.nature.com
   2770 journals.sagepub.com
   2065 www.academia.edu
   1934 pubs.acs.org
   1896 academic.oup.com
   1736 www.persee.fr
   1520 www.science.org
   1473 semanticscholar.org
   1247 www.journals.uchicago.edu
   1210 archive.org
   1128 books.google.com
    956 ieeexplore.ieee.org
    854 www.oxforddnb.com
    789 brill.com
    707 doi.wiley.com
    646 www.semanticscholar.org
    620 zenodo.org
    571 www.degruyter.com
Dec 10 2023, 6:50 PM · OABot
Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

After a broader run

$ find -maxdepth 1 -type f -print0 | xargs -0 -P8 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("subscription")) | .orig_string' | grep -Eo '\| *url *= *http[^|}]+' | cut -d/ -f3 | sort | uniq -c | sort -nr | head -n 30
   3020 dx.doi.org
   2666 www.jstor.org
   2569 doi.org
   2116 www.sciencedirect.com
   1217 www.researchgate.net
   1105 onlinelibrary.wiley.com
   1011 www.tandfonline.com
    822 www.cambridge.org
    789 pubmed.ncbi.nlm.nih.gov
    748 linkinghub.elsevier.com
    685 link.springer.com
    630 www.nature.com
    522 journals.sagepub.com
    453 muse.jhu.edu
    435 pubs.acs.org
    361 www.academia.edu
    351 semanticscholar.org
    341 academic.oup.com
    338 www.science.org
    301 archive.org
    244 www.persee.fr
    210 www.journals.uchicago.edu
    187 books.google.com
    180 ieeexplore.ieee.org
    157 pubs.geoscienceworld.org
    150 doi.wiley.com
    149 www.semanticscholar.org
    120 pubs.rsc.org
    119 brill.com
    108 link.aps.org
Dec 10 2023, 11:02 AM · OABot

Dec 8 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Some examples
https://doi.org/10.2307/1378152
https://doi.org/10.2307/3632910
https://doi.org/10.2307/3496680
https://doi.org/10.2307/2324301
https://doi.org/10.2307/2371798

Dec 8 2023, 2:28 PM · OABot
Nemo_bis added a comment to F41575560: Screenshot_20231208_134236.png.

Screenshot_20231208_140022.png (957×1 px, 207 KB)

Screenshot_20231208_140052.png (996×1 px, 128 KB)

Dec 8 2023, 2:27 PM
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

How to sample JSTOR DOIs which look closed:

$ find -maxdepth 1 -type f -print0 | xargs -0 -P8 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("doi-access=|")) | .orig_string' | grep 2307 | grep -Eo "10.2307/[0-9]+" | sort | shuf -n 40
Dec 8 2023, 11:55 AM · OABot
Nemo_bis set the alternate text for F41575560: Screenshot_20231208_134236.png to doi:10.2307/2987492.
Dec 8 2023, 11:44 AM
Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

Currently the most represented domains would be:

$ find -maxdepth 1 -type f -mtime -1 -print0 | xargs -0 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("subscription")) | .orig_string' | grep -Eo '\| *url *= *http[^|}]+' | cut -d/ -f3 | sort | uniq -c | sort -nr | head -n 30
    916 dx.doi.org
    723 www.sciencedirect.com
    658 doi.org
    519 www.jstor.org
    312 onlinelibrary.wiley.com
    292 linkinghub.elsevier.com
    267 www.researchgate.net
    221 www.tandfonline.com
    218 www.cambridge.org
    204 link.springer.com
    182 pubmed.ncbi.nlm.nih.gov
    179 www.nature.com
    152 journals.sagepub.com
    131 pubs.acs.org
    102 www.science.org
     94 academic.oup.com
     93 semanticscholar.org
     87 archive.org
     79 www.academia.edu
     74 pubs.geoscienceworld.org
     55 doi.wiley.com
     54 www.journals.uchicago.edu
     52 pubs.rsc.org
     50 muse.jhu.edu
     49 www.semanticscholar.org
     47 ieeexplore.ieee.org
     43 iopscience.iop.org
     42 link.aps.org
     37 xlink.rsc.org
     35 aip.scitation.org
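
The same domain tally could be sketched in Python as below. The function name is hypothetical, and the input is assumed to be the `orig_string` values (citation template wikitext) that jq extracts above. `\s*` is slightly more permissive than the literal spaces in the grep pattern.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# A "|url=" parameter followed by an http(s) URL, up to the next "|" or "}".
URL_PARAM = re.compile(r"\|\s*url\s*=\s*(http[^|}]+)")


def tally_hosts(orig_strings):
    """Count the host names of |url= values found in citation templates."""
    hosts = Counter()
    for text in orig_strings:
        for match in URL_PARAM.finditer(text):
            # netloc corresponds to the third "/"-separated field taken by cut.
            host = urlsplit(match.group(1).strip()).netloc
            if host:
                hosts[host] += 1
    return hosts
```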
Dec 8 2023, 7:01 AM · OABot

Dec 7 2023

Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

Now looking better https://en.wikipedia.org/w/index.php?title=Thin-film_solar_cell&diff=prev&oldid=1188752862

Dec 7 2023, 1:31 PM · OABot
Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

Need to check how many url-access=limited we'd add to non-DOI citations like AdsAbs https://en.wikipedia.org/w/index.php?title=T_Scorpii&diff=prev&oldid=1188735108

Dec 7 2023, 10:10 AM · OABot
Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

We should not replace an existing url-access with another for the same URL as happened https://en.wikipedia.org/w/index.php?title=Soft_skills&diff=prev&oldid=1188731807 (even though I'd argue the archive.org inlibrary items are more "limited" than "registration").

Dec 7 2023, 9:28 AM · OABot

Dec 3 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

I've manually deleted the older suggestions so now the numbers will be lower.

find ~/www/python/src/bot_cache -mtime +3 -delete
Dec 3 2023, 11:27 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Some ISSNs

$ find ~/www/python/src/bot_cache -type f -exec jq '.proposed_edits | .[] | .orig_string' {} \; | grep issn | grep -Eo 'issn *= *[0-9-]{8,9}' | grep -Eo '[0-9-]{8,9}' | sort | uniq -c | sort -nr | head -n 40
     87 0036-8075
     46 0004-637
     45 1476-4687
     45 0004-6256
     39 0191-2917
     39 0098-7484
     33 0028-0836
     28 1044-0305
     25 0067-0049
     24 0080-4606
     24 0021-8693
     19 2156-2202
     19 1396-0466
     18 1538-4365
     17 0148-0227
     17 0031-4005
     17 0022-0949
     16 0950-9232
     16 0304-3975
     16 0278-2715
     16 0140-6736
     16 0035-8711
     16 0028-646
     16 0002-7294
     15 1944-8007
     15 1538-4357
     15 0301-4223
     15 0031-949
     15 0006-3568
     15 0003-9926
     14 2330-4804
     14 1475-4983
     14 0271-5333
     13 0272-4634
     13 0097-3165
     13 0080-4630
     12 2515-5172
     12 1631-0683
     12 1364-5021
     12 0094-8276
Dec 3 2023, 11:19 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Or to catch some more ISSN:

$ find ~/www/python/src/bot_cache -type f -exec jq '.proposed_edits | .[] | .orig_string' {} \; | grep doi= | grep -Eo 'doi *=[^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+\b(\.?([a-z]{,8}|[0-9-]{8,9})\b)?' | sort | uniq -c | sort -nr | head -n 40
    390 10.1126/science.
    260 10.1001/jama.
    244 10.1074/jbc.
    235 10.1038/sj.onc
    155 10.1098/rsbm.
    116 10.1098/rstb.
    111 10.1525/aa.
    110 10.1098/rspa.
    104 10.1242/jeb.
    104 10.1111/j.
    100 10.5210/fm.
    100 10.1377/hlthaff.
     99 10.1016/j.
     91 10.1098/rstl.
     86 10.1093/mnras
     74 10.1242/jcs.
     68 10.1167/iovs.
     68 10.1001/archinte.
     62 10.1542/peds.
     61 10.1111/j.1469-8137
     60 10.1098/rsta.
     57 10.1111/j.1558-5646
     55 10.1001/archneur.
     53 10.1111/j.1096-3642
     52 10.1001/archpsyc.
     48 10.3732/ajb.
     46 10.1002/art.
     43 10.1038/sj.mp
     43 10.1016/j.febslet
     42 10.1093/hmg
     41 10.1111/j.1432-1033
     41 10.1016/j.jacc
     40 10.1093/acrefore
     40 10.1001/archopht.
     39 10.1098/rspb.
     39 10.1093/molbev
     38 10.1001/archpedi.
     37 10.1242/dev.
     37 10.1111/j.1475-4983
     36 10.1016/j.jasms
Dec 3 2023, 10:46 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Some of the most common DOI segments slated for doi-access=free removal in today's run:

$ find ~/www/python/src/bot_cache -type f -exec jq '.proposed_edits | .[] | .orig_string' {} \; | grep doi= | grep -Eo 'doi *=[^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+(\.([a-z]{,8}|[0-9-]{9})\b)?' | sort | uniq -c | sort -nr | head -n 40
    392 10.1126/science.
    351 10.1074/jbc.
    260 10.1001/jama.
    236 10.1038/sj.onc
    209 10.1007/s
    176 10.1016/s
    173 10.1038/s
    155 10.1098/rsbm.
    147 10.1146/knowable
    139 10.1038/d
    116 10.1098/rstb.
    111 10.1525/aa.
    110 10.1098/rspa.
    104 10.1242/jeb.
    104 10.1111/j.
    100 10.5210/fm.
    100 10.1377/hlthaff.
     99 10.1016/j.
     91 10.1098/rstl.
     86 10.1093/mnras
     76 10.1242/jcs.
     75 10.1167/iovs.
     68 10.1001/archinte.
     62 10.1542/peds.
     61 10.1111/j.1469-8137
     60 10.1098/rsta.
     57 10.1111/j.1558-5646
     55 10.1001/archneur.
     53 10.1111/j.1096-3642
     52 10.1001/archpsyc.
     48 10.3732/ajb.
     46 10.1038/nature
     46 10.1002/art.
     45 10.1038/sj.mp
     43 10.1016/j.febslet
     42 10.1111/j.1432-1033
     42 10.1093/hmg
     41 10.1016/j.jacc
     41 10.1007/bf
     40 10.1093/acrefore
Dec 3 2023, 10:27 PM · OABot
Nemo_bis added a comment to T141490: Deploy improved FancyCaptcha.

You can look at the effect of the captcha on known-human users (e.g. IPs from some institutional range)

Dec 3 2023, 9:04 PM · User-notice-archive, MW-1.42-notes (1.42.0-wmf.14; 2024-01-16), ConfirmEdit (CAPTCHA extension), Security, Wikimedia-Site-requests
Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

And currently

$ find ~/www/python/src/cache -type f -exec jq '.proposed_edits | .[] | .orig_string' {} \; | grep url= | grep -Eo 'url=[^"|]+' | cut -d/ -f3 | sort | uniq -c | sort -nr | head -n 40
   1427 doi.org
   1229 dx.doi.org
   1180 www.sciencedirect.com
    940 www.jstor.org
    875 web.archive.org
    736 onlinelibrary.wiley.com
    606 www.researchgate.net
    591 www.nature.com
    586 www.tandfonline.com
    408 www.cambridge.org
    376 archive.org
    337 link.springer.com
    328 linkinghub.elsevier.com
    310 www.escholarship.org
    302 journals.sagepub.com
    283 www.academia.edu
    265 academic.oup.com
    261 pubmed.ncbi.nlm.nih.gov
    259 www.biodiversitylibrary.org
    244 books.google.com
    238 www.science.org
    224 babel.hathitrust.org
    220 zenodo.org
    212 nrs.harvard.edu
    184 ieeexplore.ieee.org
    177 digitalcommons.law.yale.edu
    176 www.journals.uchicago.edu
    166 urn.kb.se
    164 pubs.acs.org
    123 www.bioone.org
    118 nbn-resolving.de
    117 philarchive.org
    110 muse.jhu.edu
    110 link.aps.org
    105 www.research.manchester.ac.uk
    100 bioone.org
     87 www.aeaweb.org
     86 www.osti.gov
     79 pubs.rsc.org
     77 dspace.lboro.ac.uk
Dec 3 2023, 7:19 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

I made reports upstream for Journal of Biological Chemistry (already fixed), Journal of Asian Studies/Duke University Press, Annual Review of Public Health, AAS journals, AME journals. I manually removed their doi-access=free removals in the queue (they were around 10 % of the total, I think, including all 10.1146/annurev DOIs some of which are not open yet).

Dec 3 2023, 4:52 PM · OABot
Nemo_bis created P54059 DOI prefix 10.4103.
Dec 3 2023, 4:33 PM · OABot

Nov 30 2023

Nemo_bis updated the task description for T196255: Do not take existing URL or identifier for granted.
Nov 30 2023, 11:35 AM · OABot
Nemo_bis updated the task description for T352405: Set url-access field for citations with automatically generated URLs from CrossRef DOIs.
Nov 30 2023, 11:34 AM · Citoid
Nemo_bis created T352405: Set url-access field for citations with automatically generated URLs from CrossRef DOIs.
Nov 30 2023, 11:32 AM · Citoid

Nov 29 2023

Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

The most popular domains to be replaced can be found with

$ find ~/www/python/src/cache -type f -exec jq '.proposed_edits | .[] | .orig_string' {} \; | grep url= | grep -Eo 'url=[^"|]+' | cut -d/ -f3 | sort | uniq -c | sort -nr | head -n 40
   1110 doi.org
    940 dx.doi.org
    893 www.sciencedirect.com
    724 www.jstor.org
    639 web.archive.org
    571 onlinelibrary.wiley.com
    469 www.researchgate.net
    451 www.tandfonline.com
    444 www.nature.com
    316 www.cambridge.org
    277 link.springer.com
    259 linkinghub.elsevier.com
    259 archive.org
    227 www.escholarship.org
    210 journals.sagepub.com
    197 academic.oup.com
    196 www.academia.edu
    192 pubmed.ncbi.nlm.nih.gov
    191 www.biodiversitylibrary.org
    184 books.google.com
Nov 29 2023, 11:15 PM · OABot

Nov 28 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

I opened a PR upstream, https://github.com/ourresearch/oadoi/pull/141#issuecomment-1830788674

Nov 28 2023, 9:38 PM · OABot

Nov 20 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Most common prefixes of DOIs which would be removed:

Nov 20 2023, 5:33 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

There are currently about 20k edits in the queue which would remove a doi-access=free parameter (but currently are doing nothing).

Nov 20 2023, 2:10 PM · OABot

Sep 26 2023

Krinkle awarded T201038: Link to RSS feeds missing on main page an Orange Medal token.
Sep 26 2023, 10:34 PM · wikimediafoundation.org

Sep 9 2023

Nemo_bis added a comment to T345966: Link rot in reading list Google web store description.

https://foundation.wikimedia.org/wiki/Wikipedia_Reading_Lists_Browser_Extension_Privacy_Policy would work; cf. T200754.

Sep 9 2023, 2:47 AM · Wikipedia-Android-App-Backlog, Reading List Service

Aug 27 2023

Nemo_bis added a comment to T283717: Add PMC ID even if doi-access=free.

Comment on doi.org links: https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:OABOT&diff=prev&oldid=1172247256 .

Aug 27 2023, 5:29 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.
........Equity_premium_puzzle
......Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/project/oabot/www/python/src/app.py", line 218, in get_proposed_edits
    filtered = list([e for e in all_templates if e.proposed_change])
  File "/data/project/oabot/www/python/src/app.py", line 218, in <listcomp>
    filtered = list([e for e in all_templates if e.proposed_change])
  File "/data/project/oabot/www/python/src/oabot/main.py", line 387, in add_oa_links_in_references
    edit.propose_change(only_doi)
  File "/data/project/oabot/www/python/src/oabot/main.py", line 118, in propose_change
    link, oa_status = get_oa_link(paper=dissemin_paper_object, doi=doi, only_unpaywall=only_doi)
  File "/data/project/oabot/www/python/src/oabot/main.py", line 337, in get_oa_link
    if 'citeseerx.ist.psu.edu' in resp['best_oa_location']['url_for_landing_page']:
TypeError: argument of type 'NoneType' is not iterable
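
The TypeError indicates that `url_for_landing_page` was None for this DOI, so the `in` test had nothing to search. A None-tolerant check could look like the sketch below. The helper name is hypothetical and this is not necessarily how it was fixed in OAbot.

```python
def is_citeseerx_location(resp):
    """True if the best OA location's landing page is hosted on CiteSeerX."""
    # Each level may be missing or None in the Unpaywall response.
    best = (resp or {}).get("best_oa_location") or {}
    landing = best.get("url_for_landing_page") or ""
    return "citeseerx.ist.psu.edu" in landing
```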
Aug 27 2023, 1:32 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.
.List_of_topics_characterized_as_pseudoscience
.................Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/data/project/oabot/www/python/src/app.py", line 218, in get_proposed_edits
    filtered = list([e for e in all_templates if e.proposed_change])
  File "/data/project/oabot/www/python/src/app.py", line 218, in <listcomp>
    filtered = list([e for e in all_templates if e.proposed_change])
  File "/data/project/oabot/www/python/src/oabot/main.py", line 387, in add_oa_links_in_references
    edit.propose_change(only_doi)
  File "/data/project/oabot/www/python/src/oabot/main.py", line 118, in propose_change
    link, oa_status = get_oa_link(paper=dissemin_paper_object, doi=doi, only_unpaywall=only_doi)
  File "/data/project/oabot/www/python/src/oabot/main.py", line 368, in get_oa_link
    return url, resp['oa_status']
UnboundLocalError: local variable 'resp' referenced before assignment
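
The UnboundLocalError means some code path reached the final `return` without ever assigning `resp`. One common remedy is to bind the name before any branching, as sketched here. The function and the `fetch_unpaywall` callable are hypothetical stand-ins, and the `url` key in `best_oa_location` is an assumption about the response; the real `get_oa_link` has more branches.

```python
def get_oa_link_sketch(doi, fetch_unpaywall):
    """Return (url, oa_status), tolerating the case where no lookup happens."""
    resp = None  # bound up front so the return below can never hit an unbound name
    url = None
    if doi:
        resp = fetch_unpaywall(doi)
        best = (resp or {}).get("best_oa_location") or {}
        url = best.get("url")
    oa_status = resp.get("oa_status") if resp else None
    return url, oa_status
```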
Aug 27 2023, 12:39 PM · OABot
Nemo_bis added a comment to T345041: TimeoutError with PDF retrieval from some repositories.

Some repositories are inevitably stricter than others and will block us, there's little we can do about it. However,

  1. We could reduce the occurrence by shifting more requests to the user's browser, as we used to do for all HTTPS URLs. (This could cause more previews to become downloads instead.)
  2. We must ensure that the main web tool page loads even if the preview fails.
Aug 27 2023, 12:25 PM · OABot
Nemo_bis created T345041: TimeoutError with PDF retrieval from some repositories.
Aug 27 2023, 12:14 PM · OABot
Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

That should be fixed https://github.com/dissemin/oabot/pull/90

Aug 27 2023, 12:06 PM · OABot
Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

A bug in the current version

Aug 27 2023, 11:56 AM · OABot

Aug 24 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

34 more examples which seem to be bronze OA from my manual check, out of 71 which OAbot found and Unpaywall says are closed (the rest I mostly couldn't verify).

Aug 24 2023, 5:16 PM · OABot
Nemo_bis closed T263453: Allow to add URL to the url parameter even when taken, a subtask of T196255: Do not take existing URL or identifier for granted, as Resolved.
Aug 24 2023, 6:43 AM · OABot
Nemo_bis closed T263453: Allow to add URL to the url parameter even when taken as Resolved.

This is now possible: https://en.wikipedia.org/w/index.php?title=User%3ANemo_bis%2FSandbox&diff=1171974747&oldid=1171974706

Aug 24 2023, 6:43 AM · OABot
Nemo_bis added a subtask for T196255: Do not take existing URL or identifier for granted: T263453: Allow to add URL to the url parameter even when taken.
Aug 24 2023, 6:38 AM · OABot
Nemo_bis added a parent task for T263453: Allow to add URL to the url parameter even when taken: T196255: Do not take existing URL or identifier for granted.
Aug 24 2023, 6:38 AM · OABot
Nemo_bis added a subtask for T195441: Maintenance of existing links, including CiteSeerX: T344114: Remove doi-access=free when Unpaywall no longer confirms it.
Aug 24 2023, 6:38 AM · OABot
Nemo_bis added a parent task for T344114: Remove doi-access=free when Unpaywall no longer confirms it: T195441: Maintenance of existing links, including CiteSeerX.
Aug 24 2023, 6:38 AM · OABot

Aug 23 2023

Nemo_bis added a comment to T299580: Switch OAbot to Python3.

The 400+ edits today (with about 1300 more cached) were the outcome of a regularly scheduled oabot refresh which took about 57 hours to prefill at 10 parallel threads. With one thread it would presumably take at least 3 weeks, but perhaps a monthly update is enough. I'll revisit the multiprocessing after the next run.
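
The "at least 3 weeks" figure can be checked with a back-of-the-envelope calculation. It assumes the 10 threads scaled perfectly linearly, which is an idealisation; the function name is only illustrative.

```python
def serial_estimate_weeks(wall_hours, workers):
    """Estimate single-worker runtime in weeks, assuming perfect linear scaling."""
    thread_hours = wall_hours * workers  # 57 h x 10 threads = 570 thread-hours
    return thread_hours / (24 * 7)


weeks = serial_estimate_weeks(57, 10)  # 570 / 168, i.e. roughly 3.4 weeks
```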

Aug 23 2023, 9:34 PM · OABot

Aug 22 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

More examples:

Aug 22 2023, 6:04 AM · OABot

Aug 21 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

There are a few false negatives for AAS DOIs available from ADS or elsewhere, like 10.1086/116239, 10.1086/301260, 10.1086/342942, 10.1088/0067-0049/180/1/117 .

Aug 21 2023, 2:36 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

10.1046/j.1471-4159.1999.0730012.x is an example of a paper with a deceitful "free access" label which can hardly qualify even as gratis/bronze OA, given that download and printing aren't permitted. Hence the removal was appropriate.

Aug 21 2023, 1:57 PM · OABot

Aug 18 2023

Nemo_bis added a comment to T157529: Accept GNU Taler for donations.

I've just talked with Grothoff at Chaos Communication Camp about it. Apparently payments will start being possible in Switzerland within 12 months (with regulatory approval etc.) and then be extended to the rest of the eurozone. The donation target has set up an account with a Taler processor to receive payments and regularly withdraw the cash to a bank account IBAN. https://taler.net/en/faq.html

Aug 18 2023, 9:51 AM · Upstream, Fundraising-Backlog, MediaWiki-extensions-DonationInterface
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

We're now using Unpaywall data correctly, but the detection of bronze papers still seems to have deteriorated somewhat. Problematic DOIs include the bogus epaper ones, but also news, obituaries, editorials and other content types which have no paywalled content yet may not offer a gratis full-text PDF either.
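
To illustrate the kind of check involved, here is a minimal sketch of deciding whether an Unpaywall record still confirms a free-to-read publisher version. It is not the actual OAbot logic: the function name and the set of accepted statuses are assumptions, and only the `is_oa` and `oa_status` fields of the Unpaywall response are used.

```python
# Statuses where the free copy is on the publisher side (assumed policy,
# not taken from the OAbot source).
PUBLISHER_OA_STATUSES = {"gold", "hybrid", "bronze"}


def confirms_free_doi(record: dict) -> bool:
    """Return True if the Unpaywall record supports keeping doi-access=free."""
    if not record.get("is_oa"):
        return False
    return record.get("oa_status") in PUBLISHER_OA_STATUSES


# Hand-written sample records, not real Unpaywall responses.
bronze = {"is_oa": True, "oa_status": "bronze"}
green = {"is_oa": True, "oa_status": "green"}
closed = {"is_oa": False, "oa_status": "closed"}

for sample in (bronze, green, closed):
    print(sample["oa_status"], confirms_free_doi(sample))
```

A check like this is only as good as the upstream data: news items, obituaries or editorials that Unpaywall reports as closed would be treated as not confirmed even when nothing is paywalled.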

Aug 18 2023, 7:48 AM · OABot

Aug 17 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Also a DOI with oa_status bronze; perhaps the aanda.org URL is ignored. https://en.wikipedia.org/w/index.php?title=Psammophyte&diff=prev&oldid=1170571039

Aug 17 2023, 10:48 PM · OABot

Aug 15 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Strange case of removal of a gold OA DOI: https://en.wikipedia.org/w/index.php?title=Psammophyte&diff=prev&oldid=1170571039

Aug 15 2023, 9:34 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

https://en.wikipedia.org/w/index.php?title=Agriculture_in_Taiwan&diff=1170568092&oldid=1170224661 is a Wiley article which was marked OA back in 2020 and is now considered closed by Unpaywall, though it seems to be bronze OA once you get past the nasty ereader.

Aug 15 2023, 9:02 PM · OABot
Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

We can probably make the diff less noisy than https://en.wikipedia.org/w/index.php?title=LGBT_rights_in_Cameroon&curid=3995945&diff=1170361053&oldid=1170360928 , though it's not a big deal to add a few URL-related parameters.

Aug 15 2023, 8:37 PM · OABot
Nemo_bis added a comment to T196255: Do not take existing URL or identifier for granted.

Example of URL replacement https://en.wikipedia.org/w/index.php?title=User:Iamojo/testcase/EasterIsland&curid=74566116&diff=1170384670&oldid=1170381645

Aug 15 2023, 8:35 PM · OABot

Aug 13 2023

Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Example edit from the bot https://en.wikipedia.org/w/index.php?title=User:Nemo_bis/Sandbox&diff=prev&oldid=1170242061

Aug 13 2023, 10:50 PM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

https://github.com/dissemin/oabot/pull/89 with example edit https://en.wikipedia.org/w/index.php?title=User%3ANemo_bis%2FSandbox&diff=1170209844&oldid=1170209687

Aug 13 2023, 7:09 PM · OABot
Nemo_bis added a comment to T299580: Switch OAbot to Python3.

I'm not sure why I thought multiprocessing wasn't available. I've restored it in https://github.com/dissemin/oabot/pull/88 with slightly different settings.

Aug 13 2023, 8:04 AM · OABot
Nemo_bis added a comment to T344114: Remove doi-access=free when Unpaywall no longer confirms it.

Discussed at https://en.wikipedia.org/w/index.php?title=User_talk:OAbot&diff=1170118634&oldid=1170037606

Aug 13 2023, 7:54 AM · OABot
Nemo_bis added a parent task for T344114: Remove doi-access=free when Unpaywall no longer confirms it: T196255: Do not take existing URL or identifier for granted.
Aug 13 2023, 7:42 AM · OABot
Nemo_bis added a subtask for T196255: Do not take existing URL or identifier for granted: T344114: Remove doi-access=free when Unpaywall no longer confirms it.
Aug 13 2023, 7:42 AM · OABot
Nemo_bis triaged T344114: Remove doi-access=free when Unpaywall no longer confirms it as High priority.
Aug 13 2023, 7:42 AM · OABot
Nemo_bis created T344114: Remove doi-access=free when Unpaywall no longer confirms it.
Aug 13 2023, 7:41 AM · OABot

Aug 12 2023

Nemo_bis closed T344094: bot.py terminates upon pywikibot.exceptions.LockedPageError as Resolved.

Fixed in https://github.com/dissemin/oabot/pull/87

Aug 12 2023, 11:22 AM · OABot