
UI archive services
Closed, Resolved · Public

Description

The UI only accepts a couple of archive services in the archive URL field, such as Wayback and WebCite. There are a few dozen legitimate archive services, and new ones keep appearing.

The list of domains can be found here:

https://en.wikipedia.org/wiki/Module:Webarchive

Search on: function serviceName

I keep this list up to date as I discover new services. Another good source is

https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives

Though it is missing some, like Wikiwix.

Event Timeline

Restricted Application added subscribers: Cyberpower678, Aklapper.
Cyberpower678 moved this task from Inbox to Archive requests on the InternetArchiveBot board.

I'm not sure about loc.gov. It seems to be a general-purpose archive of government-related material rather than an internet archive.

Yeah, I'm not convinced loc.gov is an internet archive the bot can use or understand when looking at snapshots.

LOC is part of the Memento network of archive sites.

http://timetravel.mementoweb.org/about/

They archive complete websites (government and non-government) and are a great resource:

https://www.loc.gov/websites/collections/

Can you give me a snapshot URL? I can't seem to find one on their site.

http://webarchive.loc.gov/all/20120826031024/http%3A//www%2Enj%2Ecom/news/index%2Essf/2010/06/diane_gooch_concedes_to_anna_l%2Ehtml
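For reference, a snapshot URL of this shape can be taken apart mechanically. The following is only a minimal sketch, assuming the URL contains a 14-digit timestamp path segment followed by the (possibly percent-encoded) original URL, as the LOC example above does. The names `SNAPSHOT_RE` and `parse_snapshot` are hypothetical and are not taken from InternetArchiveBot.

```python
import re
from datetime import datetime
from urllib.parse import unquote

# Assumed layout: .../<14-digit timestamp>/<original URL, possibly percent-encoded>
SNAPSHOT_RE = re.compile(r"/(\d{14})/(.+)$")


def parse_snapshot(url):
    """Return (snapshot time, original URL), or None if the layout does not match."""
    match = SNAPSHOT_RE.search(url)
    if match is None:
        return None
    # The timestamp is read as YYYYMMDDhhmmss.
    timestamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    # Undo percent-encoding such as %3A (":") and %2E (".").
    original = unquote(match.group(2))
    return timestamp, original


loc_url = (
    "http://webarchive.loc.gov/all/20120826031024/"
    "http%3A//www%2Enj%2Ecom/news/index%2Essf/2010/06/"
    "diane_gooch_concedes_to_anna_l%2Ehtml"
)
snapshot_time, original_url = parse_snapshot(loc_url)
```

A URL that carries no such timestamp segment returns `None`, which is the case where the snapshot cannot be interpreted from the URL alone.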

From FAQ #4:

https://www.loc.gov/webarchiving/faq.html

"The Library of Congress contracts with the Internet Archive for many of its web archiving projects."

Bibalex.org doesn't look like an archive at all. Do you have a snapshot from them as well?

I think I found it, but I had to do a bit of digging for it.

Do you know of any API or way to scrape information from http://haw.nsk.hr snapshots? The URLs aren't of any help.

I can't add HAW, as I have no way to reliably interpret the URL or the archive snapshot. Even screen scraping is useless.

@GreenCardamom nlib.ee is not the Estonian Web Archive. http://veebiarhiiv.digar.ee is the Estonian Web Archive.

I've actually never seen those in the wild. They were listed in the Memento list (linked above), so I included them. The ones I often see besides LOC are:

http://arquivo.pt/wayback/20090712010756/http%3A//www...
http://perma-archives.org/warc/20150917181901/http%3A//eur%
https://swap.stanford.edu/20100424001857/http%3A//www..

The UK Web Archive appears to be broken.

Wikiwix is not an archive. I can't find anything resembling an archive search. It only searches through Wikipedia articles.

Added the following services:

  • Archive-It
  • Portuguese Web Archive
  • Library of Congress
  • National Archives and Records Administration
  • Bibliotheca Alexandrina
  • Canadian Government Web Archive
  • Estonian Web Archive
  • National and University Library of Iceland
  • Public Record Office of Northern Ireland
  • Slovenian Web Archive
  • Stanford Web Archive
  • UK Government Web Archive
  • UK Parliament's Web Archive
  • Web Archive Singapore
  • Perma.cc

To request more, please open a new ticket.
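
As an illustration of the kind of domain check discussed in this task, here is a minimal sketch of a hostname-to-service lookup. It is not the code from Module:Webarchive or from the bot; `KNOWN_HOSTS` and `service_name` are hypothetical names, and the table only contains hosts that appear in URLs quoted in this thread.

```python
from urllib.parse import urlparse

# Hosts taken from URLs quoted in this thread; a real list would be much longer.
KNOWN_HOSTS = {
    "webarchive.loc.gov": "Library of Congress",
    "arquivo.pt": "Portuguese Web Archive",
    "swap.stanford.edu": "Stanford Web Archive",
    "veebiarhiiv.digar.ee": "Estonian Web Archive",
}


def service_name(url):
    """Return the archive service for a URL, or None if the host is not recognized."""
    host = (urlparse(url).hostname or "").lower()
    # Treat "www.example.org" and "example.org" as the same host.
    if host.startswith("www."):
        host = host[4:]
    return KNOWN_HOSTS.get(host)


loc_service = service_name("http://webarchive.loc.gov/all/20120826031024/http%3A//www%2Enj%2Ecom/")
unknown_service = service_name("http://haw.nsk.hr")
```

An exact-match table like this rejects unrecognized hosts by default, so a new service has to be added explicitly before its URLs are accepted.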