Here are two URLs we should successfully detect as dead:
Note that the first one actually redirects to the domain without the subdomain.
It turns out there is really no way in PHP to find out whether we are on a subdomain or not. The best we can do is explode the host part of the URL on ".". To grab the absolute root, we can take the last two parts of the host, but this doesn't work for cases like www.copart.co.uk, where it just gives me "co.uk", which is not the root domain. For "en.wikipedia.org", on the other hand, the same approach correctly gives "wikipedia.org". There is no obvious way to differentiate between the two cases.
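To make the problem concrete, here is a minimal sketch of the "last two parts of the host" heuristic described above. The function name naiveRootDomain is hypothetical and this is not the code from the commit linked in this thread; it only illustrates why the approach breaks on hosts like www.copart.co.uk.

```php
<?php
// Hypothetical helper (illustration only): guess the root domain by keeping
// the last two dot-separated labels of the host.
function naiveRootDomain(string $url): ?string
{
    // parse_url() returns null or false when no host can be extracted.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $host = strtolower($host);
    $parts = explode('.', $host);

    // A host with fewer than two labels (e.g. "localhost") is returned unchanged.
    if (count($parts) < 2) {
        return $host;
    }

    // Keep only the last two labels.
    return implode('.', array_slice($parts, -2));
}

echo naiveRootDomain('http://en.wikipedia.org/wiki/PHP'), "\n"; // wikipedia.org
echo naiveRootDomain('http://www.copart.co.uk/'), "\n";         // co.uk (wrong: the root is copart.co.uk)
```

The second call shows the failure: with a two-label suffix like "co.uk", the heuristic drops the label that actually identifies the site, and nothing in the host string itself tells us how many labels the suffix has.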
This leads to quite complicated and pretty unreliable code: https://github.com/Niharika29/Cyberbot_II/commit/be20420911c0dc7990725874c0b030dcc2e1a41c
I'm not sure of a better way to detect redirects like forums.lava.org above. Ideas?
This seems pretty reliable. I had to comment out the test for the copart.co.uk URL since that website has been unreachable since yesterday. We should find another URL that redirects to its root domain.