Investigate advanced dead link detection
Closed, Resolved · Public · 5 Story Points


The code for T122659 was a good start, but we can probably improve on it.

This task is to investigate what other kinds of dead links might be detectable besides those returning 4XX and 5XX status codes. Questions to answer:

kaldari created this task. Jan 29 2016, 2:06 AM
kaldari added a project: Community-Tech.
kaldari added a subscriber: kaldari.
Restricted Application added a subscriber: Aklapper. Jan 29 2016, 2:06 AM
kaldari set Security to None. Jan 29 2016, 2:07 AM
kaldari moved this task from Untriaged to Backlog on the Community-Tech board.

There are also URLs that don't redirect to the domain root but show the same content as the domain root default document.
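One way to check for this case, sketched in Python under stated assumptions: fetch the linked page and the domain root separately (not shown here), strip the markup, and compare the remaining text. The function names, the tag-stripping regex, and the 0.95 similarity threshold are hypothetical choices for illustration, not part of the T122659 code.

```python
import difflib
import re


def normalize_html(html: str) -> str:
    """Crude normalization: replace tags with spaces, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip().lower()


def looks_like_root_content(page_html: str, root_html: str, threshold: float = 0.95) -> bool:
    """Return True if a page is (nearly) identical to the domain root's default document.

    The threshold is an assumed value; it would need tuning against real dead links.
    """
    page = normalize_html(page_html)
    root = normalize_html(root_html)
    if not page or not root:
        # Empty pages are left to other checks rather than reported as matches.
        return False
    ratio = difflib.SequenceMatcher(None, page, root).ratio()
    return ratio >= threshold
```

Root pages with rotating content (dates, news tickers) could push a genuine match below the threshold, and `SequenceMatcher` can be slow on very large pages, so hashing the normalized text is a cheaper alternative when only exact matches matter.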

kaldari added a comment (edited). Jan 29 2016, 6:29 PM

Would be good to talk with Internet Archive (Greg) and Google contacts to see if they have suggestions.

DannyH edited a custom field. Jan 29 2016, 6:30 PM
DannyH edited a custom field.
kaldari edited the task description. Feb 1 2016, 6:18 PM
Niharika moved this task from Ready to In Development on the Community-Tech-Sprint board.
Niharika added a comment (edited). Feb 17 2016, 1:05 PM

Investigation summary

Also see,

Similar tools

  1. Dispenser's Checklinks:
  2. LinkChecker:
  3. Some outdated tools:
  4. Dead link Finder tool on tool labs:
  5. IA's link checking:
    • In an email exchange, Greg mentioned that IA already tackles this problem and is internally discussing offering it as a service. If that happens, it's our best bet for detecting dead links.
DannyH added a subscriber: DannyH. Feb 18 2016, 12:27 AM

Is there a next step for deadlink detection? Are we expecting to write something, or wait for IA's service?


I don't know how long it will take IA to launch their service. Based on my email exchange with Greg, there don't seem to be any firm plans yet, only "internal discussions". As a next step, I think you and Ryan should raise this with the IA folks at your next meeting or over email.

So it sounds like the only action we can take currently (besides talking with IA) is to add detection for redirects to the domain root. Let's create a new card for that.
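A minimal Python sketch of the redirect-to-root check, assuming the final URL is obtained by following redirects with the standard library's `urlopen`. The helper names and the set of paths treated as "root" (`ROOT_PATHS`) are hypothetical and would need tuning against real data.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

# Assumption: paths treated as "the domain root". Adjust per site conventions.
ROOT_PATHS = {"", "/", "/index.html", "/index.htm", "/index.php"}


def is_root(url: str) -> bool:
    """True if the URL points at a site's front page (no deeper path, no query)."""
    parts = urlsplit(url)
    return parts.path.lower() in ROOT_PATHS and not parts.query


def redirected_to_root(original_url: str, final_url: str) -> bool:
    """Flag a link that pointed at a specific page but landed on a root page."""
    return not is_root(original_url) and is_root(final_url)


def final_url_of(url: str, timeout: float = 10.0) -> str:
    """Follow redirects and return the URL actually served.

    urlopen raises HTTPError for 4XX/5XX responses, which the existing check covers.
    """
    with urlopen(url, timeout=timeout) as resp:
        return resp.geturl()
```

Calling `redirected_to_root(url, final_url_of(url))` flags a link such as `http://example.com/news/story.html` that now lands on `http://example.com/`. The check ignores the host on purpose, so a link whose domain now redirects to another site's front page is also flagged; requiring matching hosts would be a stricter variant.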

kaldari closed this task as "Resolved".

Created T127749 for detection for redirects to the domain root.

DannyH moved this task from Backlog to Archive on the Community-Tech board.