
Automatic archive for new external links
Closed, Resolved · Public


This is a tracking ticket for a top 10 proposal in the 2016 Community Wishlist Survey:

Current situation:

Original proposal:
Problem: Web pages disappear and we are left with broken links. Adding a permanent link is more work for editors.

Who would benefit: Editors that use web-based references, users that want to verify claims that use web pages as references, and users that want to learn more about a subject.

Proposed solution: Do the following:

  1. Automatically add a link to the corresponding page in the Internet Archive if a URL and an access-date are provided in a cite.
  2. Automatically add access-date to cites, if they are not provided, when an edit is saved.
  3. Automatically request archival of web pages in Internet Archive if they are not available there.
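The three steps above could be sketched roughly as follows. This is only an illustration, not how any existing bot works: it assumes the Wayback Machine availability endpoint (`https://archive.org/wayback/available`) and the Save Page Now URL prefix (`https://web.archive.org/save/`), and it assumes the JSON response shape used in the example. The function names and the `save-request` key are hypothetical. No network call is made here; the caller is assumed to fetch the query URL and pass in the response text.

```python
import json
from datetime import date
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumed endpoints; check the Internet Archive documentation before relying on them.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_query(url: str, access_date: date) -> str:
    """Build the lookup URL for the snapshot closest to the access date."""
    params = {"url": url, "timestamp": access_date.strftime("%Y%m%d")}
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the snapshot URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def plan_citation(url: str, access_date: Optional[date],
                  response_text: str, today: date) -> Dict[str, str]:
    """Decide which fields to add to a cite when an edit is saved."""
    fields: Dict[str, str] = {}
    if access_date is None:
        # Step 2: fill in a missing access-date.
        fields["access-date"] = today.isoformat()
    snapshot = closest_snapshot(response_text)
    if snapshot:
        # Step 1: link to the existing archived copy.
        fields["archive-url"] = snapshot
    else:
        # Step 3: no copy yet, so record a URL that requests archival
        # (hypothetical key, not a real cite parameter).
        fields["save-request"] = SAVE_ENDPOINT + url
    return fields
```

Keeping the decision logic separate from the HTTP request makes it easy to test against canned responses before pointing it at the live service.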

Proposer: Aracali (talk) 16:32, 12 November 2016 (UTC)

Project page:

Event Timeline

DannyH created this task. Dec 15 2016, 7:55 PM
Restricted Application added a project: Internet-Archive. Dec 15 2016, 7:55 PM
Restricted Application added a subscriber: Aklapper.

Automatically request archival of web pages in Internet Archive if they are not available there.

Not needed on Wikimedia wikis, Internet Archive already did this.

While true it should also be noted that those archived pages are not publicly available. See Wikipedia Outlinks at the Internet Archive.

While true it should also be noted that those archived pages are not publicly available.

This is incorrect.

Allen4names added a comment. (Edited) Dec 21 2016, 7:19 PM

@Nemo_bis To quote from the page I linked above "These files are currently not publicly accessible". Please explain why I am incorrect.

I'm not going to debate random speculations. Facts were clearly stated by Alexis Rossi.

There are several better venues where you can learn more about the Internet Archive and discuss your opinions and guesses before spreading false information:

Nemo_bis updated the task description. Dec 22 2016, 8:03 AM

It would be more helpful to explain the apparent conflict between this blog post at the Internet Archive and the Wikipedia Outlinks page it links to. While the former may imply that the archived pages can be accessed via the Wayback Machine, the latter makes clear that the collection is not publicly accessible.

Green_Cardamom added a comment. (Edited) Dec 22 2016, 5:40 PM

The Outlinks are (I believe) not the archived pages but rather the list of links to be archived. For whatever reason that list is not publicly available, but it doesn't need to be. It is possible to test whether Internet Archive is auto-archiving: add a link that has never been archived before to a wiki page and see what happens (looks like a 6hr cycle?). If it doesn't work, let me know and I can contact someone who will look into it.

A few months ago, someone noticed links not being archived, and on contacting IA it was determined the process had quit running (or something) and it was restarted. It really needs a monitoring script that runs once a week or so. This requires adding a fresh link to Wikipedia every week (into non-main space), which means creating a fresh URL. It can probably be done through tools and cron.
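A rough sketch of the decision logic such a monitoring script might use, assuming a weekly cron job that plants a never-before-seen URL on a non-main-space page and later checks whether it got archived. Everything here is hypothetical: the function names, the probe URL layout, and the 24-hour grace period (chosen only to sit comfortably above the guessed 6hr cycle). Editing the wiki page and querying the archive are left to the caller.

```python
import uuid
from datetime import datetime, timedelta
from typing import Optional


def fresh_probe_url(base: str, token: Optional[str] = None) -> str:
    """Create a URL that cannot have been archived before.

    `base` is a hypothetical site the operator controls; a random token
    makes each weekly probe unique.
    """
    token = token or uuid.uuid4().hex
    return f"{base.rstrip('/')}/probe-{token}"


def probe_status(added_at: datetime, now: datetime, archived: bool,
                 grace: timedelta = timedelta(hours=24)) -> str:
    """Classify a probe link that was added to a wiki page at `added_at`.

    "ok"      - an archived copy exists, so auto-archiving is running
    "pending" - no copy yet, but still inside the grace period
    "alert"   - no copy after the grace period; someone should contact IA
    """
    if archived:
        return "ok"
    if now - added_at < grace:
        return "pending"
    return "alert"
```

A cron entry could run the script weekly to plant a new probe and daily to re-check outstanding ones, sending a notification only when a probe reaches "alert".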

Gestrid added a subscriber: Gestrid. Jan 2 2017, 8:49 PM
Arbnos added a subscriber: Arbnos. Jan 23 2017, 7:29 PM
kaldari updated the task description. Feb 27 2017, 8:47 PM
czar added a subscriber: czar. Mar 14 2017, 6:40 AM
czar added a comment. Jul 18 2017, 6:46 PM

Since it hasn't been mentioned, wanted to note that the new has an option to add archive parameters to a specific page's citations before they have a chance to die. Separately, the IA bot also crawls WP for dead links and adds archive URLs when applicable. The aforementioned interface is good for proactive editors, though.

Cyberpower678 closed this task as Resolved. Aug 23 2019, 6:26 AM
Cyberpower678 claimed this task.

Boldly closing this. Internet Archive actively crawls all WMF projects looking for new links and IABot handles Wikidata.