
Automatic archive for new external links
Closed, Resolved, Public

Description

This is a tracking ticket for a top 10 proposal in the 2016 Community Wishlist Survey: https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/Bots_and_gadgets#Automatic_links_to_Internet_Archive

Current situation:
https://www.mediawiki.org/wiki/Archived_Pages

Original proposal:
Problem: Web pages disappear and we are left with broken links. Adding a permanent link is more work for editors.

Who would benefit: Editors that use web-based references, users that want to verify claims that use web pages as references, and users that want to learn more about a subject.

Proposed solution: Do the following:

  1. Automatically add a link to the corresponding page in the Internet Archive if a URL and an access-date are provided in a citation.
  2. Automatically add an access-date to citations that lack one when an edit is saved.
  3. Automatically request archival of web pages in the Internet Archive if they are not already available there.
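Steps 1 and 2 can be illustrated with a minimal sketch. It assumes a citation is represented as a plain dict with hypothetical `url` and `access-date` keys, and it relies on the Wayback Machine's `https://web.archive.org/web/<timestamp>/<url>` link form, which redirects to the snapshot nearest the given timestamp. The helper names are illustrative only, not part of the proposal or any existing tool.

```python
from datetime import date
from typing import Optional


def add_access_date(citation: dict, today: Optional[date] = None) -> dict:
    """Step 2 (sketch): fill in a missing access-date with the save date."""
    if not citation.get("access-date"):
        citation["access-date"] = (today or date.today()).isoformat()
    return citation


def wayback_url(url: str, access_date: str) -> str:
    """Step 1 (sketch): link to the Wayback Machine snapshot nearest access_date.

    access_date is assumed to be ISO formatted (YYYY-MM-DD); the Wayback Machine
    accepts a date-only timestamp and redirects to the closest capture.
    """
    timestamp = date.fromisoformat(access_date).strftime("%Y%m%d")
    return f"https://web.archive.org/web/{timestamp}/{url}"


cite = add_access_date({"url": "https://example.org/page"}, today=date(2016, 11, 12))
cite["archive-url"] = wayback_url(cite["url"], cite["access-date"])
print(cite["archive-url"])
```

Using the access-date as the timestamp means the link points at the version of the page the editor most likely saw, rather than a later capture that may differ or be a dead-link placeholder.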

Proposer: Aracali (talk) 16:32, 12 November 2016 (UTC)

Project page: https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/Bots_and_gadgets#Automatic_links_to_Internet_Archive

Event Timeline

Restricted Application added a subscriber: Aklapper.

> Automatically request archival of web pages in Internet Archive if they are not available there.

Not needed on Wikimedia wikis; the Internet Archive already does this. https://www.mediawiki.org/wiki/Archived_Pages

While true, it should also be noted that those archived pages are not publicly available. See Wikipedia Outlinks at the Internet Archive.

> While true it should also be noted that those archived pages are not publicly available.

This is incorrect.

@Nemo_bis To quote from the page I linked above: "These files are currently not publicly accessible". Please explain why I am incorrect.

I'm not going to debate random speculations. Facts were clearly stated by Alexis Rossi.

There are several better venues where you can learn more about the Internet Archive and discuss your opinions and guesses before spreading false information:

It would be more helpful to explain the apparent conflict between this blog post at the Internet Archive and the Wikipedia Outlinks page it links to. While the former may imply that the archived pages can be accessed via the Wayback Machine, the latter makes clear that the collection is not publicly accessible.

The Outlinks are (I believe) not the archived pages but rather the list of links to be archived. For whatever reason that list is not publicly available, but it doesn't need to be. It is possible to test whether the Internet Archive is auto-archiving: add a link that has never been archived before to a wiki page and see what happens (it looks like a 6-hour cycle?). If it doesn't work, let me know and I can contact someone who will look into it.

A few months ago, someone noticed that new links were not being archived; on contacting IA, it was determined that the process had stopped running (or something similar), and it was restarted. It really needs a monitoring script that runs once a week or so. This requires adding a fresh link to Wikipedia every week (outside the main namespace), which means creating a fresh URL each time. It can probably be done with tools and cron.
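The weekly check described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the probe base URL is hypothetical, posting the link to a wiki page and performing the HTTP request are left out, and the check uses the Wayback Machine availability endpoint (`https://archive.org/wayback/available?url=...`), whose JSON reply lists the closest capture under `archived_snapshots.closest` when one exists. The roughly 6-hour delay before checking is only the commenter's observation, not a documented guarantee.

```python
import json
import uuid
from urllib.parse import urlencode

AVAILABILITY_API = "https://archive.org/wayback/available"


def fresh_probe_url(base: str) -> str:
    """Make a URL that has never been seen before (hypothetical probe scheme)."""
    return f"{base}?probe={uuid.uuid4().hex}"


def availability_query(url: str) -> str:
    """Build the availability API request URL for a given page URL."""
    return f"{AVAILABILITY_API}?{urlencode({'url': url})}"


def is_archived(response_text: str) -> bool:
    """Return True if the availability API reply reports a usable snapshot."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


# Weekly cron job, in outline:
#   1. probe = fresh_probe_url("https://example.org/ia-probe")  # hypothetical base
#   2. add `probe` to a non-main-namespace wiki page (not shown)
#   3. some hours later, fetch availability_query(probe) and alert if
#      is_archived(...) is False
```

Checking a URL that cannot have been archived by any other route is what makes a positive result meaningful: a capture can then only have come from the Wikipedia link crawl.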

Since it hasn't been mentioned, I wanted to note that the new https://tools.wmflabs.org/iabot has an option to add archive parameters to a specific page's citations before the links have a chance to die. Separately, IABot also crawls Wikipedia for dead links and adds archive URLs where applicable. The interface mentioned above is useful for proactive editors, though.

Cyberpower678 claimed this task.

Boldly closing this. Internet Archive actively crawls all WMF projects looking for new links and IABot handles Wikidata.