
Detect dead links (451 HTTP error) as dead instead of alive
Closed, Declined · Public

Description

When links are dead because the site owners return HTTP 451 errors, this bot considers the pages to be alive. Instead, it should have a server in the EU so it can detect pages that return 451 and replace those links with worldwide-accessible Internet Archive (IA) links.

Here is a list of websites dead in the EU:
https://data.verifiedjoseph.com/dataset/websites-not-available-eu-gdpr

All references to the above web pages suffer from link rot in the EU, but the bot doesn't catch this yet. That should be fixed.

Event Timeline

KristofferR renamed this task from "Detect 451 dead links as dead instead of alive" to "Detect dead links (451 HTTP error) as dead instead of alive". Jun 13 2019, 11:01 PM
KristofferR created this task.

FWIW, the links are not actually dead, only policy-blocked, and only for some viewers, based on their IP address of origin. Policy blocks are made by administrative decree and can change at any moment. There is also the potential for the blocking sites to block the archive site itself, which would create bigger problems, and not just for these 1300 domains.

IMO this would be better handled by a user-side plugin that replaces the URL with the archived version. This is already done by this browser add-on:

https://addons.mozilla.org/en-US/firefox/addon/wayback-machine_new/

I don't know how it responds to 451, but it should be easy to modify. It might even work out of the box.

I totally disagree. You can't seriously expect the average Joe in Europe to install extensions just to access linked references. By that logic the whole IABot project is also pointless, since the same could be said there: "Dead links aren't an issue, just use an extension."

Permanently inaccessible links and dead links are practically the same and should be treated as such, regardless of whether the error code is 404 or 451 (or an error page served with a 200 code). 404 links can also start working again at any moment; it's the same situation.

The websites in question kill the links for Europe because they haven't bothered to implement the privacy protections required by the GDPR, mostly due to limited resources or laziness. They're covering their own butts the easiest way possible. They won't suddenly start blocking the IA; they have no (new) reason to.

Cyberpower678 closed this task as Declined. Jul 26 2019, 9:15 PM

These links aren't dead; they're just blocked. They work perfectly fine outside of the EU, and because Wikipedia's servers are based in the US, that's where IABot lives: it runs on their servers. If IABot gets any HTTP status code of 400 or higher, the link will be seen as dead. If users want to get around the block, they should use a VPN. A lot of them are free these days.
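The dead-link rule described in the closing comment ("any 400 error or up") can be sketched as a simple status-code threshold. This is a minimal illustration under that assumption only; `is_dead` is a hypothetical name and this is not IABot's actual implementation.

```python
def is_dead(status_code: int) -> bool:
    """Hypothetical sketch: treat any HTTP status of 400 or higher as a dead link.

    Mirrors the rule stated in the thread; not IABot's real code.
    """
    return status_code >= 400


# 451 (Unavailable For Legal Reasons) falls in the 4xx range, so it would be
# classified as dead *if* the checker received it. A checker running from a
# US vantage point typically receives 200 for the same URL instead.
for code in (200, 404, 451, 503):
    print(code, is_dead(code))
```

The sketch shows that the disagreement is not about how a 451 response would be classified, but about where the check runs from: the same URL can return 451 to an EU client and 200 to a US-based one.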