Wikimedia cross-wiki coordination and L10n/i18n. Mainly active on Wikiquote, Wiktionary, Wikisource, Commons, Wikidata, Wikibooks. And of course Meta-Wiki, translatewiki.net.
Contact me by MediaWiki.org email or user talk.
There will be another major Unpaywall update in December, I'll have to test the suggestions again.
Precision down to the microsecond reminds me of xkcd 2710. :-) Some more floor() needed?
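Truncating sub-second precision before display is a one-liner; a minimal sketch (the function name is mine, not from the codebase):

```python
from datetime import datetime, timezone

def truncate_to_seconds(dt: datetime) -> datetime:
    """Drop sub-second precision before displaying a timestamp."""
    return dt.replace(microsecond=0)

dt = datetime(2025, 6, 1, 12, 30, 45, 987654, tzinfo=timezone.utc)
print(truncate_to_seconds(dt).isoformat())  # 2025-06-01T12:30:45+00:00
```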
In the case of doi:10.11646/phytotaxa.498.3.2 it would have helped to look it up in Internet Archive Scholar, because https://scholar.archive.org/fatcat/release/q35yyfg2cfg7vd3awxexwrdyui has it open, so we could have avoided treating it as closed.
Actually, the reason is that there is a proposed edit for url= which would supersede the existing url-access, but we should apply either both or neither. For example: "proposed_change": "url-access=|url=https://repositorio.uchile.cl/handle/2250/177034|".
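The both-or-neither rule could be sketched like this, assuming the pipe-separated "param=value" format shown in the example above (the function names are hypothetical, not the bot's actual API):

```python
def parse_proposed_change(change: str) -> dict:
    """Parse a pipe-separated 'param=value' string, like the
    proposed_change example above, into a parameter dict."""
    params = {}
    for part in change.split("|"):
        if "=" in part:
            key, _, value = part.partition("=")
            params[key] = value
    return params

def is_consistent(params: dict) -> bool:
    """Both-or-neither: blanking url-access only makes sense
    together with the url= replacement, and vice versa."""
    blanks_access = params.get("url-access") == ""
    replaces_url = bool(params.get("url"))
    return blanks_access == replaces_url

change = "url-access=|url=https://repositorio.uchile.cl/handle/2250/177034|"
print(is_consistent(parse_proposed_change(change)))  # True: both parts present
```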
There's something wrong with the new logic, sometimes it sets url-access to empty for no reason https://en.wikipedia.org/w/index.php?title=Hurdia&diff=prev&oldid=1296401148
The most common domain names which would be marked as url-access subscription:
Currently the most often replaced domains would be
Some of the most common DOI prefixes now:
The InternetArchiveBot does not remove broken and redundant URLs from {{cite journal}}, so it does not help much. Yes it's a bug, and it's already fixed, but that doesn't help fix the citation.
In my own editing session I found 24 correct suggestions to remove doi-access=free, and I rejected over 50:
Hopefully fixed. PMC matching has changed in various ways in the new Unpaywall, so any bugs need to be considered afresh.
Thanks for the report. Ultimately the only solution in such a case is to remove the garbage links. Other than that, this should be fixed as of https://phabricator.wikimedia.org/T395086#10916113.
More interesting cases
We could track how many proposed edits get rejected over time. However there are very few editors who consistently reject edits, so it probably wouldn't be better than me just sampling, say, 50 proposed edits every quarter...
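For what it's worth, a quarterly sample of 50 edits gives a usable estimate; a quick sketch of the normal-approximation margin of error (the numbers are illustrative, not measured):

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """95% normal-approximation margin of error for an estimated
    rejection rate p_hat from a sample of n proposed edits."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Worst case (rejection rate around 0.5) with a sample of 50 edits:
print(round(margin_of_error(0.5, 50), 3))  # 0.139, i.e. about +/-14 points
```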
Checking the domain name does not work so well with Elsevier URLs which happen to redirect to a repository, like doi:10.1016/j.crci.2007.02.011 which goes to https://comptes-rendus.academie-sciences.fr/chimie/articles/10.1016/j.crci.2007.02.011/
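The failure mode is that the domain we match against is the publisher's, while the request ends up somewhere else entirely. A sketch of comparing the two (the publisher start URL below is a hypothetical stand-in; only the final repository URL is from the comment above):

```python
from urllib.parse import urlparse

def same_domain(start_url: str, final_url: str) -> bool:
    """Compare the domain we matched against with the domain the
    request actually landed on after following redirects."""
    return urlparse(start_url).netloc == urlparse(final_url).netloc

start = "https://www.sciencedirect.com/science/article/pii/EXAMPLE"  # hypothetical publisher URL
final = "https://comptes-rendus.academie-sciences.fr/chimie/articles/10.1016/j.crci.2007.02.011/"
print(same_domain(start, final))  # False: it's really a repository copy
```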
Should be fixed in https://github.com/dissemin/oabot/pull/93
First need to fix a few spurious test failures
........F.F.F.......F.F.F.....F.F...F....
======================================================================
FAIL: test_add_arxiv (tests.templateedit.TemplateEditTests.test_add_arxiv)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/federico/mw/oabot/src/tests/templateedit.py", line 22, in test_add_arxiv
self.assertEqual("arxiv=1804.09042", edit.proposed_change)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 'arxiv=1804.09042' != 'arxiv=1804.09042|'
- arxiv=1804.09042
+ arxiv=1804.09042|
?                 +

In some cases there is a PDF even though it is not linked: https://cris.maastrichtuniversity.nl/en/publications/2f0126d4-d86c-44c9-8a9e-0b39c68ed21e has https://cris.maastrichtuniversity.nl/files/73014428/merrienboer_2014_how_experts_deal_with_novel.pdf for doi:10.1016/j.edurev.2014.03.00 (https://en.wikipedia.org/?diff=1294114412).
The issue again stems from a Pure repository. What a blight on earth. Perhaps we should just ignore all suggestions coming from Pure.
(Yes it's about time we move this to gitlab.wikimedia.org or other, but it still depends on https://github.com/dissemin/wikiciteparser .)
Note that on the page https://www.hetzner.com/european-cloud Hetzner effectively admits this:
Hetzner also provides services in the USA and therefore may have staff there, but as far as I know all data in the EU is handled by EU staff under EU rules.
Does the privacy policy authorize the transfer of personal data to the USA because Hetzner has employees in the USA and is therefore subject to the CLOUD Act? If not, this server must not be used to process personal data (such as access logs of visitors to the WMI website).
Nowadays we only use Unpaywall and the number of ResearchGate or Academia.edu suggestions is negligible.
Given how many years it has taken us to babysit OAbot on the English Wikipedia to do a fraction of what was originally envisioned, I'm starting to wonder whether this should instead be done with an extension, similar to the SecureLinkFixer extension. After all, sending visitors to the websites of legacy publishers is a clear and present danger. The Unpaywall snapshot could be imported similar to what the Tor extension does, or redirection could be delegated to oadoi.org. Ways can be devised to leave more control to local wikis.
Generally speaking, this has been working fine for a while. Example: https://en.wikipedia.org/w/index.php?title=MIM_Museum&diff=prev&oldid=1291114435
Current most popular DOI prefixes
$ find ~/www/python/src/bot_cache -maxdepth 1 -type f -print0 | xargs -0 -P16 -n1 jq '.proposed_edits|.[]| select(.proposed_change|contains("doi-access=|")) | .orig_string' | grep doi | grep -Eo 'doi *= [^"|]+' | grep -Eo '10\.[0-9]+/[a-z]+(\.([a-z]{,8}|[0-9-]{9})\b)?' | sort | uniq -c | sort -nr | head -n 40
jq: error: Could not open file /data/project/oabot/www/python/src/bot_cache/ISO#IEC_2022.json: No such file or directory
parse error: Invalid numeric literal at line 1, column 6
   1933 10.1074/jbc.
   1194 10.1038/sj.onc
    705 10.1126/science.
    512 10.1098/rsbm.
    396 10.4049/jimmunol.
    385 10.1093/hmg
    370 10.1111/syen.
    304 10.1096/fj.
    284 10.1001/jama.
    250 10.1242/jcs.
    213 10.1096/fasebj.
    204 10.11646/zootaxa.
    202 10.1182/blood
    162 10.1016/j.febslet
    138 10.1038/sj.mp
    127 10.1182/blood.
    111 10.1242/dev.
    103 10.1016/s
    100 10.1111/j.
    100 10.1002/art.
     87 10.1210/jcem.
     87 10.1167/iovs.
     85 10.1111/j.1432-1033
     81 10.1093/brain
     81 10.1016/j.
     80 10.1038/onc.
     80 10.1001/archinte.
     77 10.1093/humupd
     76 10.1038/sj.leu
     75 10.1242/jeb.
     75 10.1098/rstl.
     74 10.1093/mnras
     74 10.1002/ijc.
     73 10.1001/archneur.
     72 10.1007/s
     70 10.4269/ajtmh.
     70 10.1146/annurev
     66 10.1016/j.cell
     64 10.1542/peds.
     62 10.1124/pr.
Rare cases of removal of url-access=subscription do not seem very useful: https://en.wikipedia.org/w/index.php?title=Economy_of_Russia&diff=prev&oldid=1291484551
The most common proposed changes in the bot queue, currently not acted upon, are:
I think we should decline this for good, since the Wikidata graph split has been completed and the future of WikiCite data on Wikidata remains uncertain.
As rephrased, the issue has been solved.
Thanks for sharing this discussion between those two users. The tool already avoids adding URLs when the doi-access=true is confirmed to be correct (cf. T344114). Links to repositories are added for additional safety when the DOI link appears to be closed.
A [February 2025 RfC](https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(policy)/Archive_201#h-RFC:_Allow_for_bots_(e.g._Citation_bot)_to_remove_redundant_URLs_known_to_not_ho-20250217092500) on the English Wikipedia has explicitly endorsed removing PubMed and OCLC URLs which do not provide a full text.
All the cases mentioned above seem now ok in Unpaywall, from some spot checks.
No longer relevant as Dissemin has closed: T394853.
Not sure whether this is still happening.
In the past year, redundant links have grown from 90k to 120k, so clearly this is more necessary than ever...
I've finally merged the PR as oabot has been running that code for over a year without problems now. https://github.com/dissemin/oabot/pull/91
Out of 71 cases where OAbot found that Unpaywall marks the article as closed, 34 more examples seem to be bronze OA per my manual check (most of the rest I couldn't verify either way).
Thanks for the report and sorry for the annoying experience. Errors about individual DOIs are best reported to Unpaywall directly. The issue has since been fixed, as doi:10.1007/BF02124750 is now considered closed access.
It's true it would be good to link e.g. doi:10.3897/zookeys.43.390 if it weren't linked already, but the bot already does that.
Thanks for the report. Next time please apply the edit and revert it, or include the suggested citation, or at least mention the DOI. Links to suggestions expire after a few weeks as they get deleted from the cache.
OABot adds URLs to http://pdfs.semanticscholar.org/8775/3fa9d86e28e1fb332f1509f3519e5b3a9c0d.pdf which redirects
The s2cid parameter does not autolink, so it's not a substitute for the url parameter. See also Why does the oabot tool make edits the bot doesn't?.
It seems clear to me that we need a mirror of Wikimedia Commons files. Ideally we would have kept both the media tarballs at your.org and the WikiTeam collection at the Internet Archive up to date, but we've not managed to keep up since 2012 and 2016 respectively.
Also, I've tried the link from a recent post and it doesn't even work: it produces an empty post after one or two redirects. It seems nobody is using those links, as nobody noticed.
Another reason to do this is that Facebook doesn't even allow sharing links to some Wikimedia projects.
Thanks for the update on the XML data dumps list. I see there's progress on the other side: https://phabricator.wikimedia.org/T382947#10476420 . Hopefully this will allow the dumps to be re-enabled soon.
IIRC these (and the OAI feeds) were added back in the day when the WMF got some corporate contribution to provide specialised data feeds. I imagine any contractual obligations have long expired (if they even existed), but I don't know who could verify that.
The query itself will remain, so getting fresh results will be just a matter of submitting it again.
By running more tests and using the Mann-Whitney test, we can tell whether a performance regression is statistically significant. That way we can make sure that we only alert on real regressions, which decreases the number of false alerts and the time spent investigating them.
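For illustration, a minimal pure-Python sketch of the Mann-Whitney U statistic (in practice one would use scipy.stats.mannwhitneyu, which also provides the p-value; the timing numbers below are made up):

```python
def ranks_with_ties(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """U statistic: min(U1, U2). A U far below n1*n2/2 suggests
    the two timing samples come from different distributions."""
    ranks = ranks_with_ties(list(a) + list(b))
    r1 = sum(ranks[: len(a)])
    u1 = r1 - len(a) * (len(a) + 1) / 2
    return min(u1, len(a) * len(b) - u1)

# Baseline vs. regressed page-load times (ms), illustrative numbers:
baseline = [102, 99, 101, 100, 98]
regressed = [120, 118, 125, 119, 122]
print(mann_whitney_u(baseline, regressed))  # 0.0: the samples don't overlap
```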
We certainly don't want to be in the way. Feel free to delete the VMs. I was hoping to double check there's nothing to salvage in the local mounts but usually there shouldn't be anyway.
As an update, I created the account and luckily we were still in time for this round of submissions (CLDR 46). It's always a good time to ask me for a CLDR account! Six months tend to fly by.
Maybe it could be retrieved from a very early dump or by some other means.
@Hydriz Can I upgrade the VMs to Debian 11 one of these weekends? The only reason not to that I can think of is that some scripts may require Python 2, but that's still available in Debian 11.
@HShaikh Please don't propagate myths. https://aeon.co/essays/the-tragedy-of-the-commons-is-a-false-and-dangerous-myth
I'm closing this task as unclear and not pertaining to MediaWiki core, mostly because it mixes different user groups and permissions some of which are Wikimedia-specific.
This reminds me a bit of the https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool , which I believe focused on identifying easy concepts like numbers. I've not used it in years.
https://www.mediawiki.org/wiki/Special:RecentChanges?useskin=vector&uselang=ksh after disabling JavaScript:
@Mazevedo Here's an example old ticket which may or may not be relevant any more. :)
Do you want to focus on the exonyms in languages which are supported by MediaWiki core (or at least translatewiki.net) but not in CLDR?