Automated archiving of URLs
Closed, Duplicate · Public

Description

Various mechanisms are in place on different wikis to address link rot, e.g. parameters like "archiveurl=" and "archivedate=" in templates like cite_web and cite_news on enwp, which allow linking to archived versions of the cited URL, or https://ru.wikipedia.org/wiki/%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0%BA:WebCite_Archiver , which creates and adds archival links automatically.

It would be good if Citoid provided similar functionality, so as to address link rot in a way that is more consistent across wikis.
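To illustrate the request, here is a minimal sketch of how archive fields could be attached to a citation before it is rendered as a template call. It is not Citoid's implementation. The function names `with_archive` and `render_cite_web` are hypothetical, the parameter names `archiveurl` and `archivedate` are taken from the description above, and the ISO date format is an assumption.

```python
from datetime import date


def with_archive(params: dict, archive_url: str, archived_on: date) -> dict:
    """Return a copy of the citation parameters with archive fields added.

    The original "url" entry is left untouched so readers can still
    check the live page themselves.
    """
    updated = dict(params)
    updated["archiveurl"] = archive_url
    # Assumption: an ISO date is acceptable for archivedate.
    updated["archivedate"] = archived_on.isoformat()
    return updated


def render_cite_web(params: dict) -> str:
    """Render the parameters as a {{cite web}} template call (hypothetical helper)."""
    parts = [f"|{key}={value}" for key, value in params.items()]
    return "{{cite web " + " ".join(parts) + "}}"
```

Keeping the original URL next to the archive link means the same citation data works whether or not the live page still exists.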

Daniel_Mietchen added a project: Citoid.
Daniel_Mietchen moved this task to IO Tasks on the Citoid board.
Daniel_Mietchen added a subscriber: Daniel_Mietchen.
Restricted Application added a subscriber: Aklapper. Feb 13 2015, 9:40 AM
Mvolz triaged this task as "Normal" priority. Feb 16 2015, 3:30 PM
Mvolz set Security to None.

I like archive.today better in terms of predictability of URLs (it might be faster, since we don't have to wait for the URL to be generated before linking to it; archive.org probably uses a timestamp too precise to be predictable) and also because it ignores robots.txt (as it is allowed to do, because the archiving is done as the result of a direct request from a user and not by a crawler). On the downside, it's younger and privately funded. Archive.org is more established, a non-profit, and probably more reliable.

Thoughts? @mobrovac? @Jdforrester-WMF, is this something we should consult legal for?

Eloquence added a subscriber: Yana. Mar 12 2015, 7:52 AM

I recommend checking out https://perma.cc/ as well, which is specifically designed for this purpose and backed by DPLA, Internet Archive and others. @Yana

Elitre added a subscriber: Elitre. Mar 15 2015, 12:41 PM
Yana added a comment. Mar 21 2015, 5:49 PM

If we use something like perma.cc or archive.org for citations on wikis, we probably need some sort of clear warning that the actual site is no longer available so that contributors can verify that the site was not removed precisely because the statement it is cited for on Wikipedia is no longer true.

He7d3r added a subscriber: He7d3r. Mar 30 2015, 1:01 AM
Mvolz added a comment. Apr 7 2015, 3:05 PM

@Yana, I'm not sure exactly what you're requesting? We would leave the original url in, and this would allow users to verify the current existence or non-existence of the original url themselves. Doing this would not change the user's experience of the site in terms of being able to verify why a link is no longer available; we are just doing something automatically that is typically done manually or by a bot, see: https://en.wikipedia.org/wiki/Wikipedia:Link_rot

Note that archive.org has reached out to us before about this and would be happy to be an active partner in this. I'd be happy to set up the meeting there.

Qgil added a subscriber: Qgil. Edited Jun 5 2015, 9:18 AM

Note to everyone that I am 70% done with development of an archive bot for enwiki that makes use of archive.org, aka the Wayback Machine. Current features of the bot include testing whether a link is dead, requesting that Wayback archive pages that have no archived copy yet, and of course linking a source to an archive when the live link goes dead. A BRFA is currently open.
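The three features listed above can be sketched as a small decision routine. This is an illustrative outline only, not the bot's actual code. The helper names are hypothetical, and the endpoint URLs and the `archived_snapshots` / `closest` response shape are assumptions about the Wayback Machine's public availability and save services. No network request is made here; the caller is expected to fetch the JSON and run the dead-link check itself.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

# Assumed endpoints; verify against the Wayback Machine documentation.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def availability_query(url: str) -> str:
    """Build the query URL asking whether an archived copy exists."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(response: dict) -> Optional[str]:
    """Extract the snapshot URL from a decoded availability response, if any."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def plan_action(url: str, link_is_dead: bool,
                snapshot_url: Optional[str]) -> Tuple[str, Optional[str]]:
    """Decide what to do for one cited URL.

    Returns an (action, target) pair:
      - ("request_archive", save_url): live link with no archived copy yet
      - ("link_archive", snapshot_url): dead link with an archived copy
      - ("no_archive_available", None): dead link and nothing archived
      - ("no_change", None): live link that is already archived
    """
    if snapshot_url is None:
        if link_is_dead:
            return ("no_archive_available", None)
        return ("request_archive", SAVE_ENDPOINT + url)
    if link_is_dead:
        return ("link_archive", snapshot_url)
    return ("no_change", None)
```

Separating the decision from the network calls keeps the logic testable without contacting archive.org.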

Mvolz added a subscriber: Ocaasi. Jun 17 2015, 3:50 PM

@Cyberpower678 awesome! CC-ing @Ocaasi who will be happy to hear that!

Sadads added a subscriber: Sadads. Oct 13 2015, 8:32 PM
Jay8g added a subscriber: Jay8g. Dec 9 2015, 6:31 AM
Restricted Application added a project: VisualEditor. Oct 28 2016, 3:34 PM