Is it feasible to detect robots.txt exclusion on archive.org in order to flag additional dead links?
https://nl.wikipedia.org/w/index.php?title=CouchSurfing&diff=prev&oldid=49481734
https://web.archive.org/web/20120206182825/http://wiki.couchsurfing.org/en/Main_Page
Description
Event Timeline
IABot applies filters when requesting archive copies, so any defective archives must have become blocked retroactively, after the archive was added.
GreenC bot detects robots.txt blocks and tries to find a different archive if one is available. If none is available, it keeps the robots.txt-blocked snapshot, because Wayback management has said they plan to remove that policy block sometime in the near future, hopefully.
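One way a bot could check for this, sketched below under an assumption: the Wayback Machine's availability API (`https://archive.org/wayback/available`) is real, but treating an empty `archived_snapshots` object as a sign of robots.txt exclusion is a heuristic, not a documented guarantee — a page may also simply never have been archived. The function names here (`closest_snapshot`, `check_url`) are illustrative, not part of IABot or GreenC bot.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_AVAILABILITY_API = "https://archive.org/wayback/available"

def closest_snapshot(api_response):
    """Return the URL of the closest usable snapshot, or None.

    Heuristic: a page with no usable snapshot (possibly excluded by
    robots.txt) comes back with an empty "archived_snapshots" object,
    so None here hints that the archive copy may be unusable.
    """
    snap = api_response.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"]
    return None

def check_url(url, timestamp=""):
    """Live check against the availability API (requires network)."""
    query = urlencode({"url": url, "timestamp": timestamp})
    with urlopen(f"{WAYBACK_AVAILABILITY_API}?{query}") as resp:
        return closest_snapshot(json.load(resp))

# Offline examples mimicking the API's response shape:
blocked = {"archived_snapshots": {}}
ok = {"archived_snapshots": {"closest": {
    "available": True,
    "status": "200",
    "url": ("http://web.archive.org/web/20120206182825/"
            "http://wiki.couchsurfing.org/en/Main_Page"),
}}}
```

A bot following the policy described above would, on a `None` result for the original archive URL, retry `check_url` with other timestamps or fall back to a different archive provider before keeping the blocked snapshot.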