
Feature suggestion for when websites serve custom 404 pages
Closed, Declined · Public

Description

IABot does a lot of good work, but I've noticed that one thing standing in the way of even better performance is that many websites have begun serving custom 404 pages when links go dead. IABot does not recognize these as dead links, since page content is still loaded. I think it would be a great service if IABot were modified or updated so that it checks whether the terms "404" or "Page Not Found" appear whenever it checks a link, opening a new avenue for catching dead links.

To make sure a page didn't erroneously or irrelevantly contain the terms "404" or "Page Not Found", a further check would be to also request a random page off the same domain, such as "example.com/6ca13d52ca70c883e0f0bb", which would likely return a 404 page, and then compare the content of the page being checked with that of the random page.

This is probably a suggestion you have already considered. If that is the case, I would be curious to know whether there are plans for implementation; if there are no such plans, I would be curious to know why. Thanks for your work!
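The comparison step proposed above could be sketched roughly as follows. This is only an illustration of the idea, not anything IABot actually implements: the helper names, the 0.9 similarity threshold, and the choice of `difflib` for comparison are all my own assumptions, and actually fetching the two pages over HTTP is left to the caller.

```python
import difflib
import secrets


def random_probe_url(domain):
    """Build a URL on the domain that almost certainly does not exist,
    e.g. "https://example.com/6ca13d52ca70c883e0f0bb" as in the suggestion."""
    return f"https://{domain}/{secrets.token_hex(11)}"


def looks_like_soft_404(candidate_html, probe_html, threshold=0.9):
    """Heuristic: if the page being checked is nearly identical to a page
    that is guaranteed not to exist on the same domain, the page being
    checked is probably a custom 404 page served with a 200 status.

    The 0.9 threshold is an arbitrary illustrative value."""
    ratio = difflib.SequenceMatcher(None, candidate_html, probe_html).ratio()
    return ratio >= threshold
```

A caller would fetch both the link under test and `random_probe_url(domain)`, then pass the two response bodies to `looks_like_soft_404` — which also makes the cost concrete: every link check now requires two HTTP requests instead of one.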

Event Timeline

@Cyberpower678 Hey, I submitted this a little while ago as a feature request, as suggested on IABot's talk page. It appears that you have been active since its creation (most recently, as I write this, on January 21st), but you have not yet taken any action on it, so I wanted to ask whether you are aware of this request. Thanks.

IABot already does limited checking for soft 404s, but what you're suggesting carries too much cost for too little benefit. Sorry.

What would the cause of the significant cost be?


Custom 404s can come in so many varieties and flavors that effectively tracking them all would require major software changes. We're talking Google Search Bot levels of development here, backed by a whole armada of devs. I'm just one person, developing and improving a rather large piece of software that has to run on a multitude of wikis without issue. So I have very limited time and must decide which features are worth their implementation costs. Not to mention that the page comparison has to assume it knows what a 404 will look like. Requesting a random page can instead hit an unresponsive server, in which case the actual page will still be mistaken for alive. We would also be doubling query times, because two pages are requested instead of one, and it isn't even guaranteed to work. This is the cost I am referring to.

Got it. Thank you for taking the time to respond.