
Detect unsuitable archive snapshots
Open, Low · Public

Description

IABot cannot (yet) check whether an archive snapshot is a suitable source. There are cases, however, where IABot adds a link to a snapshot that is broken in a way that should be reasonably easy to detect.

This task collects these cases and gives an overview.

Event Timeline

Cirdan created this task. Apr 26 2018, 12:09 PM
Restricted Application assigned this task to Cyberpower678. Apr 26 2018, 12:09 PM
Cirdan moved this task from Inbox to New feature on the InternetArchiveBot board. Apr 26 2018, 12:11 PM
Cirdan removed Cyberpower678 as the assignee of this task. Apr 26 2018, 5:50 PM
Cirdan updated the task description.
Cirdan updated the task description.
Cirdan updated the task description.
Cirdan added a subscriber: Cyberpower678.
Cirdan updated the task description. May 13 2018, 6:41 PM

I believe that these issues can also be solved by running a third-party service which has the necessary access through IABot's API. At least for T191276 and T194604, this should be straightforward and I'm happy to try creating a small script once 2.0 is released.

Cirdan triaged this task as Low priority. May 17 2018, 2:28 PM
Vvjjkkii renamed this task from Detect unsuitable archive snapshots to t5daaaaaaa. Jul 1 2018, 1:13 AM
Vvjjkkii raised the priority of this task from Low to High.
Vvjjkkii updated the task description.
Green_Cardamom renamed this task from t5daaaaaaa to Detect unsuitable archive snapshots. Jul 1 2018, 4:30 AM
Green_Cardamom lowered the priority of this task from High to Low.
Green_Cardamom updated the task description.
Aklapper added a subscriber: Aklapper.

(Removing Tracking-Neverending tag as per its description. Maybe you meant Epic or something similar.)

My bot WaybackMedic does check for these (robots.txt, excluded from the archive, many other cases) and is able to replace them with new archives at a different provider.
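
For illustration only, here is a minimal Python sketch of the kind of check discussed in this task. It is not how IABot or WaybackMedic work: the function name, the verdict type, and the marker phrases are all hypothetical, and a real checker would need a marker list curated from actual broken snapshots. It assumes the caller has already fetched the snapshot and passes in the HTTP status code and the response body.

```python
from dataclasses import dataclass

# Hypothetical marker phrases, chosen for illustration only. They are NOT
# taken from any archive provider's actual error pages.
BROKEN_MARKERS = (
    "blocked by robots.txt",
    "excluded from the archive",
)


@dataclass
class SnapshotVerdict:
    usable: bool
    reason: str


def classify_snapshot(status_code: int, body: str) -> SnapshotVerdict:
    """Flag snapshots that are broken in an easily detectable way.

    A "usable" verdict only means that no known problem was found; it does
    not mean the snapshot is a suitable source.
    """
    # Any 4xx/5xx status is treated as a broken snapshot.
    if status_code >= 400:
        return SnapshotVerdict(False, f"HTTP status {status_code}")

    text = body.strip().lower()
    if not text:
        return SnapshotVerdict(False, "empty body")

    # Case-insensitive search for the (assumed) error-page phrases.
    for marker in BROKEN_MARKERS:
        if marker in text:
            return SnapshotVerdict(False, f"marker: {marker}")

    return SnapshotVerdict(True, "no known problem detected")
```

A caller could, for example, run `classify_snapshot(resp_status, resp_text)` on each candidate snapshot and skip or replace those where `usable` is false.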