Investigate advanced dead link detection
Closed, Resolved | Public | 5 Estimated Story Points

Description

The code for T122659 was a good start, but we can probably improve on it.

This task is to investigate what other kinds of dead links might be detectable besides those that return 4XX and 5XX error codes. Questions to answer:

Event Timeline

kaldari raised the priority of this task to Medium.
kaldari updated the task description.
kaldari added a project: Community-Tech.
kaldari added a subscriber: kaldari.

There are also URLs that don't redirect to the domain root but show the same content as the domain root default document.
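One way to catch this case is to fetch both the linked page and the domain root and compare the two bodies. The following is a minimal sketch only: the function name `same_as_root_content` and the 0.95 similarity threshold are assumptions for illustration, not anything taken from the T122659 code, and fetching the pages is left to the caller.

```python
import difflib


def same_as_root_content(page_body: str, root_body: str,
                         threshold: float = 0.95) -> bool:
    """Return True if page_body is nearly identical to root_body.

    Hypothetical helper: the 0.95 threshold is an assumed value that
    would need tuning against real pages.
    """
    # An empty response tells us nothing, so do not flag it here.
    if not page_body or not root_body:
        return False
    # autojunk=False keeps the heuristic for long sequences from
    # skewing the ratio on real HTML documents.
    matcher = difflib.SequenceMatcher(None, page_body, root_body,
                                      autojunk=False)
    return matcher.ratio() >= threshold
```

A fuzzy comparison is used rather than an exact hash because home pages often contain small per-request differences such as timestamps or tokens. `SequenceMatcher` can be slow on large documents, so a real checker would probably compare a truncated or normalized body.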

Would be good to talk with Internet Archive (Greg) and Google contacts to see if they have suggestions.

DannyH edited a custom field.

Investigation summary

See also: https://en.wikipedia.org/wiki/Wikipedia:Link_rot

Similar tools

  1. Dispenser's Checklinks:
  2. LinkChecker:
  3. Some outdated tools:
  4. Dead link Finder tool on tool labs:
  5. IA's link checking:
    • In an email exchange, Greg mentioned that IA already tackles this problem and is internally discussing whether to offer it as a service. If that happens, it's our best bet for detecting dead links.

Is there a next step for deadlink detection? Are we expecting to write something, or wait for IA's service?

I don't know how long it will take for IA to launch their service. From my email exchange with Greg, there don't seem to be firm plans regarding this so far, though there seem to be "internal discussions". I think as a next step, you and Ryan should talk about this with the IA folks when you have your next meeting, or over email.

So it sounds like the only action we can take currently (besides talking with IA) is to add detection for redirects to the domain root. Let's create a new card for that.
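For the new card, the check could look roughly like the sketch below. It assumes the caller has already followed redirects and knows the final URL (for example, `response.url` in the `requests` library). The function name `is_redirect_to_root` and the choice to treat `www.example.com` and `example.com` as the same host are assumptions for illustration.

```python
from urllib.parse import urlsplit


def _host(parts) -> str:
    """Lower-cased hostname with a leading 'www.' removed (assumed rule)."""
    host = (parts.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def _is_root(parts) -> bool:
    """True when the URL has no path beyond '/' and no query string."""
    return parts.path in ("", "/") and not parts.query


def is_redirect_to_root(original_url: str, final_url: str) -> bool:
    """Return True if a deep link ended up at the root of its own domain.

    Hypothetical helper: original_url is the link as cited, final_url is
    where the request landed after redirects were followed.
    """
    orig = urlsplit(original_url)
    final = urlsplit(final_url)
    # A link that pointed at the root to begin with is not suspicious,
    # and a landing page that is not the root is out of scope here.
    if _is_root(orig) or not _is_root(final):
        return False
    return _host(orig) == _host(final)


if __name__ == "__main__":
    print(is_redirect_to_root("http://example.com/news/2015/story.html",
                              "http://www.example.com/"))
```

This only flags redirects that stay on the same domain. A redirect to the root of a different domain (for instance, a parked domain) would need a separate rule.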