Per T125181, add new code to https://github.com/cyberpower678/Cyberbot_II/blob/master/IABot/checkIfDead.php to detect when a URL redirects to the domain root and, if so, consider it dead.
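A minimal sketch of the core check, in Python rather than PHP for illustration. It assumes we already have the final URL after following redirects (in PHP this could come from curl's `CURLINFO_EFFECTIVE_URL`). The function names `is_root_path` and `redirected_to_root` are hypothetical and not part of checkIfDead.php.

```python
from urllib.parse import urlsplit


def is_root_path(url: str) -> bool:
    """True if the URL points at a site root: empty or "/" path and no query."""
    parts = urlsplit(url)
    return parts.path in ("", "/") and not parts.query


def redirected_to_root(original_url: str, final_url: str) -> bool:
    """Hypothetical heuristic: a deep link that lands on a site root after
    following redirects is treated as a soft 404, i.e. a dead link."""
    if is_root_path(original_url):
        # A link that already targets the root cannot "redirect to root".
        return False
    return is_root_path(final_url)
```

Note that this only looks at the path; it does not check whether the final host is related to the original one.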
Event Timeline
Here are two URLs we should successfully detect as dead:
http://www.copart.co.uk/c2/specialSearch.html?_eventId=getLot&execution=e1s2&lotId=10543580
http://forums.lavag.org/Industrial-EtherNet-EtherNet-IP-t9041.html
Note that the first one actually redirects to the domain without the subdomain.
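One narrow way to handle the copart.co.uk case is to treat hosts as equal when they differ only by a leading "www.". This is a hedged sketch with a hypothetical helper name; it does not cover other subdomains such as "forums." or "en.".

```python
from urllib.parse import urlsplit


def strip_www(host: str) -> str:
    """Drop a single leading "www." label, if present."""
    return host[4:] if host.startswith("www.") else host


def same_site_ignoring_www(url_a: str, url_b: str) -> bool:
    """Hypothetical check: compare hosts case-insensitively, ignoring "www."."""
    # .hostname is already lowercased by urllib.parse (or None if missing).
    host_a = urlsplit(url_a).hostname or ""
    host_b = urlsplit(url_b).hostname or ""
    return strip_www(host_a) == strip_www(host_b)
```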
It turns out there is no built-in way in PHP to find out whether a host is a subdomain or not. The best we can do is explode the host part of the URL on ".". To get the root domain, we can take the last two parts of the host, but this fails for hosts like www.copart.co.uk, where it gives "co.uk", which is not the root domain. For "en.wikipedia.org", on the other hand, the same approach correctly gives "wikipedia.org". There is no obvious way to tell these two cases apart from the host name alone.
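The usual way to resolve this ambiguity is a list of public suffixes (such as the Public Suffix List), so that "co.uk" is known to be a suffix while "wikipedia.org" is not. The sketch below, in Python for illustration, contrasts the naive last-two-labels approach with a suffix-aware one. `PUBLIC_SUFFIXES` is a tiny hypothetical excerpt, and the wildcard and exception rules of the real list are ignored; unknown suffixes fall back to treating the last label as the suffix.

```python
# Hypothetical, tiny excerpt; the real Public Suffix List has thousands of rules.
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk", "org.uk"}


def naive_root(host: str) -> str:
    """Last two labels of the host: fails for multi-label suffixes like co.uk."""
    return ".".join(host.split(".")[-2:])


def registrable_domain(host: str):
    """Return the suffix plus one label, or None if host is itself a suffix."""
    labels = host.lower().rstrip(".").split(".")
    # Scanning from the left means the first match is the longest suffix.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else None
    # No listed suffix: assume the last label alone is the suffix.
    return ".".join(labels[-2:]) if len(labels) >= 2 else None
```

With this, www.copart.co.uk and copart.co.uk both map to "copart.co.uk", while en.wikipedia.org still maps to "wikipedia.org".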
This leads to quite complicated and pretty unreliable code: https://github.com/Niharika29/Cyberbot_II/commit/be20420911c0dc7990725874c0b030dcc2e1a41c
I'm not sure of a better way to detect redirects like the forums.lavag.org one above. Ideas?
Updated. https://github.com/Niharika29/Cyberbot_II/commit/4676a0ee1a2f8ee0eeb4d2644104d40aabce8a99
This seems pretty reliable. I had to comment out the test for the copart.co.uk URL because that website has been unreachable since yesterday. We should find another URL that redirects to the root domain.