Automatic archive for new external links
Open, Normal, Public

Description

This is a tracking ticket for a top 10 proposal in the 2016 Community Wishlist Survey: https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/Bots_and_gadgets#Automatic_links_to_Internet_Archive

Current situation:
https://www.mediawiki.org/wiki/Archived_Pages

Original proposal:
Problem: Web pages disappear and we are left with broken links. Adding a permanent link is more work for editors.

Who would benefit: Editors that use web-based references, users that want to verify claims that use web pages as references, and users that want to learn more about a subject.

Proposed solution: Do the following:

  1. Automatically add a link to the corresponding page in the Internet Archive if a URL and an access-date are provided in a cite.
  2. Automatically add an access-date to cites that lack one when an edit is saved.
  3. Automatically request archival of web pages in Internet Archive if they are not available there.

Proposer: Aracali (talk) 16:32, 12 November 2016 (UTC)

Project page: https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/Bots_and_gadgets#Automatic_links_to_Internet_Archive
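
The three steps of the proposed solution could be sketched roughly as follows. This is only an illustration under stated assumptions, not how any existing bot or extension works: the cite is modelled as a plain dict with hypothetical keys `url`, `access-date` and `archive-url`, the endpoints are the Wayback Machine availability lookup and "Save Page Now" URL forms as commonly documented (verify them against the Internet Archive's own documentation), and the network call is passed in as a `lookup` function so the logic can be tried offline.

```python
import json
from datetime import date
from urllib.parse import urlencode

# Assumed endpoints; check the Internet Archive documentation before relying on them.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_query(url, access_date):
    """Build a lookup URL asking for the snapshot closest to access_date."""
    params = {"url": url, "timestamp": access_date.strftime("%Y%m%d")}
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def save_request_url(url):
    """Build the URL that asks the archive to capture a page."""
    return SAVE_ENDPOINT + url


def process_cite(cite, today, lookup):
    """Apply the three proposed steps to one cite.

    cite   -- dict with hypothetical keys 'url', 'access-date', 'archive-url'
    today  -- datetime.date used when the access-date is missing
    lookup -- callable taking a query URL and returning the response text
    Returns (updated_cite, save_url); save_url is None when a snapshot exists.
    """
    updated = dict(cite)
    # Step 2: add an access-date if the cite does not provide one.
    updated.setdefault("access-date", today.isoformat())
    access = date.fromisoformat(updated["access-date"])
    # Step 1: link to the snapshot closest to the access-date, if any.
    snapshot = closest_snapshot(lookup(availability_query(updated["url"], access)))
    if snapshot:
        updated["archive-url"] = snapshot
        return updated, None
    # Step 3: nothing archived yet, so return the URL that requests archival.
    return updated, save_request_url(updated["url"])
```

In a real deployment `lookup` would perform an HTTP GET; keeping it injectable means rate limiting, retries and caching can be decided separately from the cite logic.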

DannyH created this task. Dec 15 2016, 7:55 PM
Restricted Application added a project: Internet-Archive. Dec 15 2016, 7:55 PM
Restricted Application added a subscriber: Aklapper.

Automatically request archival of web pages in Internet Archive if they are not available there.

Not needed on Wikimedia wikis; Internet Archive already did this. https://www.mediawiki.org/wiki/Archived_Pages

While true, it should also be noted that those archived pages are not publicly available. See Wikipedia Outlinks at the Internet Archive.

While true, it should also be noted that those archived pages are not publicly available.

This is incorrect.

Allen4names added a comment. Edited Dec 21 2016, 7:19 PM

@Nemo_bis To quote from the page I linked above: "These files are currently not publicly accessible". Please explain why I am incorrect.

I'm not going to debate random speculations. Facts were clearly stated by Alexis Rossi.

There are several better venues where you can learn more about the Internet Archive and discuss your opinions and guesses before spreading false information:

Nemo_bis edited the task description. Dec 22 2016, 8:03 AM

It would be more helpful to explain the apparent conflict between this blog post at the Internet Archive and the Wikipedia Outlinks page it links to. While the former may imply that the archived pages can be accessed via the Wayback Machine, the latter makes clear that the collection is not publicly accessible.

Green_Cardamom added a comment. Edited Dec 22 2016, 5:40 PM

The Outlinks are (I believe) not the archived pages but rather the list of links to be archived. For whatever reason that list is not publicly available, but it doesn't need to be. It is possible to test whether Internet Archive is auto-archiving: add a link that has never been archived before to a wiki page and see what happens (looks like a 6hr cycle?). If it doesn't work, let me know and I can contact someone who will look into it.

A few months ago, someone noticed that links were not being archived, and on contacting IA it was determined that the process had quit running (or something) and it was restarted. It really needs a monitoring script that runs once a week or so. This requires adding a fresh link to Wikipedia every week (in a non-main namespace), which means creating a fresh URL. It can probably be done through tools and cron.
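
A minimal sketch of the two pieces such a monitoring script would need, under stated assumptions: a way to mint a URL that has never been seen before, and a rule for deciding when a probe link has gone unarchived for too long. The function names, the `probe` query parameter and the 24-hour grace period are all hypothetical (the actual archiving cycle is only guessed at above). Posting the link to a wiki page and querying the archive are left out.

```python
import uuid
from datetime import datetime, timedelta


def make_probe_url(base_url, token=None):
    """Return a fresh URL by appending a unique query token (hypothetical scheme)."""
    token = token or uuid.uuid4().hex
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}probe={token}"


def probe_status(added_at, now, archived, grace=timedelta(hours=24)):
    """Classify a probe link added to the wiki at added_at.

    archived -- whether the archive reports a snapshot for the probe URL
    grace    -- assumed upper bound on how long archiving normally takes
    """
    if archived:
        return "ok"
    if now - added_at < grace:
        return "pending"
    return "alert"
```

A weekly cron job could call `make_probe_url`, add the result to a page outside the main namespace, and on a later run raise an alarm for any probe whose status is `"alert"`.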

Gestrid added a subscriber: Gestrid. Jan 2 2017, 8:49 PM
Arbnos added a subscriber: Arbnos. Jan 23 2017, 7:29 PM
kaldari edited the task description. Mon, Feb 27, 8:47 PM
czar added a subscriber: czar. Tue, Mar 14, 6:40 AM