User Details
- User Since
- Jun 4 2016, 1:17 PM (414 w, 4 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- GreenC
Fri, May 10
It is being done entirely for the benefit of Wikipedia, as explained above, not because the site owner requested it. The site owner suggested this after I asked what they thought would be best for Wikipedia. The site owner could not care less what we do; it does not matter to them. It's not a branding issue, if that is your concern.
Apr 4 2024
To add, the owners of archive.today requested we at Wikipedia use archive.today as a safeguard against potential future outages of the other domains, such as archive.is, which have in the past been subject to problems.
This is even worse:
To be honest, this should be disallowed:
Apr 3 2024
It's not cosmetic. Archive.today uses 5 or 6 domains, such as .is, .ph, etc. These are where the content is located. The domain archive.today does not contain content; its only function is to route requests to one of the other domains. Thus if archive.is becomes unavailable, the admins only need to reroute traffic to archive.ph. However, if the incoming link is archive.is, then it won't function. Note that "becomes unavailable" means a domain-level outage, because this archive provider has had problems with domain registrars and/or DNS resolvers that blackhole it for policy reasons.
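As a rough illustration of the rewrite being discussed, the sketch below replaces a known content domain with the routing domain and leaves everything else in the URL alone. It is not the bot's actual code. The `MIRROR_HOSTS` set lists only the two domains named above and would need to be extended, and the function name `to_canonical` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only the two content domains named in the comment are listed here.
MIRROR_HOSTS = {"archive.is", "archive.ph"}
CANONICAL_HOST = "archive.today"


def to_canonical(url: str) -> str:
    """Return url with a known content domain replaced by the routing domain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in MIRROR_HOSTS:
        return url  # not one of the listed domains; leave the link untouched
    # Swap only the host; scheme, path, query and fragment are preserved.
    return urlunsplit(parts._replace(netloc=CANONICAL_HOST))


print(to_canonical("https://archive.is/AbCdE"))
```

Links on other hosts pass through unchanged, so the function can be applied to every external link on a page.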
Mar 20 2024
A domain is dead. The bot didn't do anything.
Mar 11 2024
Feb 5 2024
Feb 4 2024
Jan 28 2024
Jan 6 2024
Jan 5 2024
Jan 3 2024
We still have over 1.2 million instances of webcitation.org. About half are on two sites, ruwiki (441k) and ukwiki (216k). Other large sites include the Chechen (64k) and Armenian (50k) wikis. It's a bigger problem for them. Enwiki is mostly converted, with only 37k left; there used to be many hundreds of thousands. Every month the totals drop a little (all wikis last month: -27k). For enwiki and reFill, I would skip processing these links when encountered.
Dec 9 2023
Hi, my tools here are mostly awk and tcsh scripts, which invoke other unix tools like sort, uniq, etc. They also depend on a library I wrote called BotWikiAwk, which has environment variables and other awk scripts.
Nov 8 2023
If you need a testcases page let me know.
Aug 17 2023
There are 14 links on Enwiki, and Jpwiki has over 1,000.
Aug 12 2023
Aug 1 2023
Jul 19 2023
Per my duplicate task: it might be too much for the bot to check every http request twice, but if we know an entire domain has moved to https, it should be possible to include an optional feature in the interface that tells the bot to check a URL over https if it is getting a 404 over http. If it gets a 200 on the https, then change the source URL to https and don't add an archive. It could go further by checking existing citations with url-status=dead and converting them to live, but this is probably beyond the bot's scope right now.
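The proposed check can be sketched as below. This is only an illustration of the idea, not IABot code. The function name `https_upgrade`, the `https_domains` set (domains an operator has flagged as moved to https) and the `get_status` callable (anything that returns an HTTP status code for a URL) are all hypothetical.

```python
from typing import Callable, Optional, Set
from urllib.parse import urlsplit, urlunsplit


def https_upgrade(url: str, https_domains: Set[str],
                  get_status: Callable[[str], Optional[int]]) -> Optional[str]:
    """Return the https form of url if it should replace the source URL, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Only consider http links on domains flagged as having moved to https.
    if parts.scheme != "http" or host not in https_domains:
        return None
    # The second request is made only when the http URL returns 404.
    if get_status(url) != 404:
        return None
    candidate = urlunsplit(parts._replace(scheme="https"))
    # A 200 over https means the source URL can be switched and no archive is needed.
    return candidate if get_status(candidate) == 200 else None


# Usage with a canned status table standing in for real HTTP requests.
statuses = {"http://example.org/a": 404, "https://example.org/a": 200}
print(https_upgrade("http://example.org/a", {"example.org"}, statuses.get))
```

Passing the status lookup in as a parameter keeps the extra request limited to flagged domains and makes the logic testable without network access.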
Jul 12 2023
Jul 10 2023
It's unclear if IABot is still adding to discussion pages. I see in the above example that the last one was dated December 2022. Current edits by the bot are not adding to discussion pages. This is on Italian wiki. On enwiki it stopped many years ago.
Jun 13 2023
Bots are not really "adding" blacklisted links, the links were already there. We can continue to block bots from doing their work, but the consequences IMO are much more severe than the edge scenario outlined above.
May 2 2023
Apr 9 2023
Dec 17 2022
Oct 27 2022
Oct 19 2022
Until resolved, you can add {{cbignore}} to keep the bot off the citation. That's what I did for the few I added. BTW, ghostarchive.org uses the same Webrecorder technology on the backend. It won't work in every case as a substitute for Conifer, but in many it will.
Sep 21 2022
Sep 19 2022
Sep 11 2022
Sep 10 2022
Aug 27 2022
Aug 19 2022
Aug 11 2022
My workaround is to compare the date of the diff (via the MW API) with the date in the JSON, and if they are too far apart, assume the JSON is buggy data, ignore it and log it. There is a massive log now.
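A minimal sketch of that kind of sanity check follows. The comment does not say how far apart is "too far", so the one-hour `MAX_SKEW` is an assumption, as are the function names and the timestamp format (ISO 8601 with a trailing Z, as the MediaWiki API returns for revisions).

```python
import logging
from datetime import datetime, timedelta, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("date-check")

MAX_SKEW = timedelta(hours=1)  # assumption: the threshold is not given in the comment


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp such as 2021-09-01T23:30:50Z as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_plausible(diff_ts: str, json_ts: str, max_skew: timedelta = MAX_SKEW) -> bool:
    """Return False, and log the record, when the two dates are too far apart."""
    skew = abs(parse_ts(diff_ts) - parse_ts(json_ts))
    if skew > max_skew:
        log.info("ignoring record: diff=%s json=%s skew=%s", diff_ts, json_ts, skew)
        return False
    return True


print(is_plausible("2021-09-01T23:30:50Z", "2021-09-01T23:31:10Z"))
print(is_plausible("2021-09-01T23:30:50Z", "2019-01-01T00:00:00Z"))
```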
Aug 3 2022
May 17 2022
Thanks @bd808 - tool deletion request: https://phabricator.wikimedia.org/T308587
May 16 2022
Hi, I would like to delete the tool entirely. If someone wants to have this service, it would be best to start over with a new account and install.
Mar 28 2022
This is a hard problem, as the status can flip back and forth. As noted above by Mark Graham (Director, Wayback Machine), the excluded status is like a curtain: the archive still exists in the Wayback Machine and could flip back to active in the future based on a policy decision. The reason these archives are being added to the wiki anyway is that IABot has a separate cache database, and when it first detected that URL it was active. As a friend recently noted, one of the hardest things in computing is keeping accurate caches. The design of IABot is to use caching rather than querying the Wayback Machine for every URL it encounters, which has pros and cons.
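The trade-off can be illustrated with a small time-bounded cache: a cached status is served until the entry is older than a chosen age, and only then is the source queried again. This is a generic sketch, not IABot's cache design. The class name, the `lookup` callable and the `max_age_seconds` value are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class StatusCache:
    """Cache archive statuses and re-query only when an entry is too old."""

    def __init__(self, lookup: Callable[[str], str], max_age_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._lookup = lookup
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]  # fast, but may no longer match the live status
        status = self._lookup(url)  # slow path: ask the source of truth
        self._entries[url] = (status, now)
        return status


# Usage: the live status flips, but the cache keeps answering "active" until expiry.
live = {"http://example.org/page": "active"}
fake_now = [0.0]
cache = StatusCache(live.__getitem__, max_age_seconds=100, clock=lambda: fake_now[0])
print(cache.get("http://example.org/page"))
live["http://example.org/page"] = "excluded"
fake_now[0] = 50.0
print(cache.get("http://example.org/page"))
```

A shorter maximum age reduces stale answers at the cost of more queries, which is the pro and con described above.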
Mar 16 2022
Mar 14 2022
Testcases page: https://en.wikipedia.org/wiki/User:GreenC/testcases/ghostarchive
Feb 2 2022
Block 12946111 https://en.wikipedia.org/wiki/Special:BlockList?wpTarget=%2312946111&blockType=&limit=50&wpFormIdentifier=blocklist is for User:Lallint .. confirming they have used the IABot tool before, last active 22:45 30 January 2022
Jan 18 2022
Jan 16 2022
Jan 12 2022
Bot is converting short-form archive.today to Wayback links:
https://en.wikipedia.org/w/index.php?title=Elizabeth_Holmes&type=revision&diff=1065246449&oldid=1065173325
Dec 21 2021
Dec 20 2021
Dec 19 2021
Nov 9 2021
This is awesome. Confirming I deployed AntiCompositeNumber's shadows.py to produce the list, and the GreenC bot Job 10 ("shadows.awk") is back running, having just tagged 25 pages.
Sep 7 2021
I don't remember what prompted creation of this ticket. I'll make a new ticket if seen again, with an example.
Sep 2 2021
No idea. Feel free to adjust for the right audience; I wasn't sure.
{"$schema":"/mediawiki/page/links-change/1.0.0","meta":{"uri":"https://arz.wikipedia.org/wiki/%D8%B1%D9%88%D8%AF%D9%8A%D9%88%D9%85","request_id":"bc403e26b8b72080c369aa66","id":"26739413-d570-4363-af37-af690a94f501","dt":"2021-09-01T23:30:50Z","domain":"arz.wikipedia.org","stream":"mediawiki.page-links-change","topic":"codfw.mediawiki.page-links-change","partition":0,"offset":203083041},"database":"arzwiki","page_id":1389768,"page_title":"روديوم","page_namespace":0,"page_is_redirect":false,"rev_id":5641431,"performer":{"user_text":"InternetArchiveBot","user_groups":["bot","*","user","autoconfirmed"],"user_is_bot":true,"user_id":142851,"user_registration_dt":"2020-12-18T16:05:11Z","user_edit_count":20253},"added_links":[{"link":"/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2585%25D9%2582%25D8%25A7%25D9%2584%25D8%25A7%25D8%25AA_%25D9%2581%25D9%258A%25D9%2587%25D8%25A7_%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581%25D8%25A7%25D8%25AA_BNF","external":false},{"link":"/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2585%25D9%2582%25D8%25A7%25D9%2584%25D8%25A7%25D8%25AA_%25D9%2581%25D9%258A%25D9%2587%25D8%25A7_%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581%25D8%25A7%25D8%25AA_GND","external":false},{"link":"/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2585%25D9%2582%25D8%25A7%25D9%2584%25D8%25A7%25D8%25AA_%25D9%2581%25D9%258A%25D9%2587%25D8%25A7_%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581%25D8%25A7%25D8%25AA_LCCN","external":false},{"link":"/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2585%25D9%2582%25D8%25A7%25D9%2584%25D8%25A7%25D8%25AA_%25D9%2581%25D9%258A%25D9%2587%25D8%25A7_%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581%25D8%25A7%25D8%25AA_LNB","external":false},{"link":"/wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:%25D9%2585%25D9%2582%25D8%25A7%25D9%2584%25D8%25A7%25D8%25AA_%25D9%2581%25D9%258A%25D9%2587%25D8%25A7_%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581%25D8%25A7%25D8%25AA_NDL","external":false},{"link":"/
wiki/%25D8%25AA%25D8%25B5%25D9%2586%25D9%258A%25D9%2581:CS1_maint:_uses_authors_parameter","external":false},{"link":"/wiki/International_Standard_Book_Number","external":false},{"link":"/wiki/National_Library_of_Latvia","external":false},{"link":"/wiki/Oxford_University_Press","external":false},{"link":"/wiki/%25D9%2585%25D9%2583%25D8%25AA%25D8%25A8%25D8%25A9_%25D8%25A7%25D9%2584%25D9%258A%25D8%25A7%25D8%25A8%25D8%25A7%25D9%2586_%25D8%25A7%25D9%2584%25D9%2588%25D8%25B7%25D9%2586%25D9%258A%25D9%2587","external":false},{"link":"/wiki/%25D9%2585%25D9%2583%25D8%25AA%25D8%25A8%25D8%25A9_%25D9%2581%25D8%25B1%25D9%2586%25D8%25B3%25D8%25A7_%25D8%25A7%25D9%2584%25D9%2588%25D8%25B7%25D9%2586%25D9%258A%25D9%2587","external":false},{"link":"/wiki/%25D9%2585%25D9%2584%25D9%2581_%25D8%25A7%25D8%25B3%25D8%25AA%25D9%2586%25D8%25A7%25D8%25AF%25D9%2589_%25D9%2585%25D8%25AA%25D9%2583%25D8%25A7%25D9%2585%25D9%2584","external":false},{"link":"/wiki/%25D9%2586%25D9%2585%25D8%25B1%25D8%25A9_%25D8%25AA%25D8%25AD%25D9%2583%25D9%2585_%25D9%2585%25D9%2583%25D8%25AA%25D8%25A8%25D8%25A9_%25D8%25A7%25D9%2584%25D9%2583%25D9%2588%25D9%2586%25D8%25AC%25D8%25B1%25D8%25B3","external":false},{"link":"/wiki/Hamish_Hamilton_Ltd","external":false},{"link":"/wiki/%25D9%2585%25D8%25B9%25D8%25B1%25D9%2581_%25D8%25A7%25D9%2584%25D8%25BA%25D8%25B1%25D8%25B6_%25D8%25A7%25D9%2584%25D8%25B1%25D9%2582%25D9%2585%25D9%2589","external":false},{"link":"/wiki/%25D9%2585%25D8%25B3%25D8%25A7%25D8%25B9%25D8%25AF%25D8%25A9:CS1_errors","external":false},{"link":"https://www.wikidata.org/wiki/Q1087","external":true},{"link":"https://commons.wikimedia.org/wiki/Category:Rhodium","external":true},{"link":"https://www.quora.com/topic/Rhodium-1","external":true},{"link":"https://www.google.com/search%3Fkgmid%3D/m/025scm0","external":true},{"link":"https://catalogue.bnf.fr/ark:/12148/cb12218903f","external":true},{"link":"https://academic.microsoft.com/v2/detail/521398313","external":true},{"link":"https://academic.microsoft.com
/v2/detail/2910290644","external":true},{"link":"https://id.loc.gov/authorities/sh85113755","external":true},{"link":"https://kopkatalogs.lv/F/%3Ffunc%3Ddirect%26local_base%3Dlnc10%26doc_number%3D000307942","external":true},{"link":"https://d-nb.info/gnd/4178038-3","external":true},{"link":"https://archive.org/details/naturesbuildingb0000emsl","external":true},{"link":"https://archive.org/details/elementsvisualex0000gray","external":true},{"link":"https://archive.org/details/periodictableits0000scer","external":true},{"link":"//doi.org/10.1351%252Fgoldbook","external":true},{"link":"//doi.org/10.1351%252Fgoldbook","external":true},{"link":"https://data.bnf.fr/ark:/12148/cb12218903f","external":true},{"link":"https://id.loc.gov/authorities/subjects/sh85113755","external":true},{"link":"https://kopkatalogs.lv/F%3Ffunc%3Ddirect%26local_base%3Dlnc10%26doc_number%3D000307942%26P_CON_LNG%3DENG","external":true},{"link":"https://id.ndl.go.jp/auth/ndlna/00569786","external":true}]}
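For readers working with events like the one above, the sketch below pulls the external links that a given account added. The field names (`performer`, `user_text`, `added_links`, `link`, `external`) are taken from the event shown. The function name and the heavily trimmed sample event are illustrative only.

```python
import json
from typing import List


def external_links_added(event_json: str, bot_name: str) -> List[str]:
    """Return external links added in a page-links-change event made by bot_name."""
    event = json.loads(event_json)
    if event.get("performer", {}).get("user_text") != bot_name:
        return []  # the edit is attributed to a different account
    return [item["link"] for item in event.get("added_links", [])
            if item.get("external")]


# Trimmed stand-in for the full event, keeping two of its links.
sample = json.dumps({
    "database": "arzwiki",
    "performer": {"user_text": "InternetArchiveBot", "user_is_bot": True},
    "added_links": [
        {"link": "/wiki/Oxford_University_Press", "external": False},
        {"link": "https://www.wikidata.org/wiki/Q1087", "external": True},
    ],
})
print(external_links_added(sample, "InternetArchiveBot"))
```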
Aug 13 2021
Domain now blacklisted - could also lift global live state to "none" and let IABot do dead link detection and fix normally but blacklisting will resolve faster.
Aug 7 2021
I will process this domain for dead links, set them to blacklisted (which locks it in so the global live state won't matter), then remove the global whitelist. Takes a while.
Aug 1 2021
Jun 24 2021
It wouldn't need to be up to date in real time, so it is fine if the data is a bit stale, e.g. updated weekly or monthly.
Jun 6 2021
May 19 2021
May 9 2021
@Amire80 that is a fascinating taxonomy you created. Glad you found this thread :) I'm having trouble understanding some of the entries as there are no examples. "Intended space" has an example, so I understand where the nowiki appears and can find those with a regex, e.g. []]{2}<nowiki/>[[]{2}. In fact there are 8 instances of intended space on enwiki:
If you ever decide to expand the description, or add a new column to include (more) examples where possible, it would be very helpful for developing tools and reports. It should be possible to detect many of them universally (like intended space), potentially making for a global bot/tool/report across all wikis.
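A detector for that one category could look roughly like this. The pattern is the regex quoted above, rewritten with backslash escapes because Python warns about `[[` inside a character class. The function name and the sample wikitext are made up for illustration.

```python
import re
from typing import List

# Same match as []]{2}<nowiki/>[[]{2}: two closing brackets, <nowiki/>, two opening brackets.
INTENDED_SPACE = re.compile(r"\]{2}<nowiki/>\[{2}")


def find_offsets(wikitext: str) -> List[int]:
    """Return the start offset of every ]]<nowiki/>[[ occurrence in wikitext."""
    return [m.start() for m in INTENDED_SPACE.finditer(wikitext)]


sample = "[[Foo]]<nowiki/>[[Bar]] and [[Baz]] [[Qux]]"
print(find_offsets(sample))
```

Because the pattern depends only on wikitext syntax and not on any language-specific text, the same check could in principle run against any wiki.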
May 8 2021
Apr 20 2021
Apr 6 2021
Hi - I am also having trouble. In one day, it falsely reported over 2,000 edits on ukwiki as made by InternetArchiveBot - not a small number of false positives.
@Samwalton9 that's it. Thank you.
I see it was transcluded in from this edit to a template used on the page: https://ca.wikiquote.org/w/index.php?title=Plantilla%3ARingler&type=revision&diff=130519&oldid=98762
Mar 20 2021
I believe it is an old bug, long since fixed. If you see anything like it again, then it's not fixed, but the bot has edited cswiki a lot since then, which is a good sign.
Mar 18 2021
Feb 24 2021
On zhwiki this turned out to be an IP filter.
Feb 8 2021
This is now happening to me on zhwiki as of today.
Feb 1 2021
Reporting same problem. Unable to add maintainers. This is urgent, unable to get a team of developers going on a project. Thanks.