Increasing the frequency of the Internet Archive's IRC bot (T199193) is not as helpful if it's still unable to archive a lot of pages. It would be really, really nice if the process of archiving links (regardless of the entity running it) could have fallback methods for archive attempts that don't work. (I know the Internet Archive does have a YouTube collection, but it's fairly difficult to actually get content into it, it seems to be only for copyleft/public domain videos, and I don't think the bot is capable of adding to it.)
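To make the idea concrete, here is a minimal sketch of what such a fallback chain could look like. This is not any existing bot's implementation; the archiver callables are placeholders, and a real bot would wrap the individual services (e.g. the Wayback Machine's Save Page Now endpoint at `https://web.archive.org/save/<url>`) behind the same interface.

```python
from typing import Callable, Iterable, Optional

# Each archiver takes a URL and returns the archive URL on success,
# or None (or raises) on failure.  A real implementation would issue
# HTTP requests to Save Page Now, archive.is, etc.
Archiver = Callable[[str], Optional[str]]

def archive_with_fallback(url: str, archivers: Iterable[Archiver]) -> Optional[str]:
    """Try each archiving service in order; return the first archive URL
    produced, or None if every service fails."""
    for archiver in archivers:
        try:
            result = archiver(url)
        except Exception:
            continue  # network error, rate limit, unsupported site: try the next one
        if result:
            return result
    return None

# Stand-in archivers for illustration only:
def wayback_stub(url: str) -> Optional[str]:
    return None  # pretend Save Page Now refused this page

def archive_is_stub(url: str) -> Optional[str]:
    return "https://archive.example/" + url  # pretend the fallback succeeded

archived = archive_with_fallback("http://example.com", [wayback_stub, archive_is_stub])
```

The point of the sketch is only that failures (exclusion rules, rate limits, robots.txt, dynamic content) are expected and should route the request to the next service rather than silently producing a dead archive link.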
w:en:Help:Archiving a source lists five web archiving services, including the Internet Archive. All five of them have drawbacks which prevent them from being used to archive everything.
- Save Page Now of IA's Wayback Machine is great, but has the aforementioned drawbacks, which exclude a small but significant set of pages (YouTube and Facebook are two of the four websites more popular than Wikipedia, yet neither can be archived through Save Page Now).
- Archive.is has a limit of about 30 pages per minute and actively tries to prevent mass archival; they appear to have blocked my ISP after I tried to archive too many pages (don't worry, that only happened after tens of thousands of pages). However, it has several fairly good fallback mechanisms for archiving pages, including the Google cache and some other websites. It's also permanently logged in to a Facebook account, and it's the only one of the five with a .onion address on the Tor network.
- Webcitation.org is fairly bad at archiving CSS, in my experience, and also fails on some Hong Kong government websites. It doesn't really seem to have any benefits over web.archive.org.
- Perma.cc is only used in 62 English Wikipedia articles as of this writing. I don't know much about it but it has a limit of 10 pages per month per account. Perhaps @Green_Cardamom might know more.
- Webrecorder is only used in 16 articles as of this writing. It's a nonprofit which was created this year, and as far as I'm aware is the only archiving service of the five which can save YouTube videos* (try loading the comments – you'll get to exactly where I stopped scrolling), OpenStreetMap's "slippy map", and other interactive/dynamic content. It also has virtual machines, presumably for viewing outdated HTML. However, it would be more difficult to fix dead links with it because all archive links contain the name of the account which was used to create them (i.e. URLs can't automatically redirect like they do at web.archive.org). It would also probably be fairly difficult to crawl sites without support from the site owners.
Having a ninety-something percent archival success rate is still not ideal when it's out of millions of links per year, many of which will be gone within a few months of their addition to articles. The main reason I'm posting this here is that there is currently a Village pump (idea lab) thread on the English Wikipedia proposing that the WMF run its own archiving service. While I personally think this would be somewhat unnecessary and an inappropriate use of WMF resources, especially when other non-profits are already dedicated to web archiving, the current options are not satisfactory for archiving everything that should be archived.
(* – the YouTube archival doesn't totally work because YouTube's links generate some sort of error and not all of the video quality settings can be chosen)
| | Wayback Machine | Archive.is | Webcitation.org | Perma.cc | Webrecorder |
|---|---|---|---|---|---|
| avoiding paywalls and logins | no | sometimes | yes? | no | no |
| API of some sort | yes | not really | no | not really | no |