Add ability for InternetArchiveBot to skip paywalled links
Closed, Resolved · Public · 8 Story Points

Description

Every time InternetArchiveBot encounters a link that is marked as paywalled, it should add that link's domain to a dedicated table. It should then check each link it scans to see whether its domain is listed in the paywall table and, if so, skip it.
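
The behavior described above could be sketched roughly as follows. This is an illustration only, not InternetArchiveBot's actual code: the `paywall_domains` table, the `PaywallTable` class, and the `links_to_check` helper are all hypothetical names, and an SQLite database stands in for whatever database the bot really uses.

```python
import sqlite3
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lowercased host of a URL, or None if it has none."""
    try:
        return urlsplit(url).hostname
    except ValueError:
        # urlsplit rejects some malformed hosts; treat them as having no domain.
        return None


class PaywallTable:
    """Hypothetical wrapper around a dedicated table of paywalled domains."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS paywall_domains (domain TEXT PRIMARY KEY)"
        )

    def record(self, url):
        """Store the domain of a link that was marked as paywalled."""
        domain = domain_of(url)
        if domain:
            self.conn.execute(
                "INSERT OR IGNORE INTO paywall_domains (domain) VALUES (?)",
                (domain,),
            )

    def is_paywalled(self, url):
        """Report whether the link's domain is already in the table."""
        domain = domain_of(url)
        if domain is None:
            return False
        row = self.conn.execute(
            "SELECT 1 FROM paywall_domains WHERE domain = ?", (domain,)
        ).fetchone()
        return row is not None


def links_to_check(links, table):
    """Filter (url, marked_paywalled) pairs down to the URLs worth checking."""
    to_check = []
    for url, marked_paywalled in links:
        if marked_paywalled:
            table.record(url)  # remember the domain for later links
            continue
        if table.is_paywalled(url):
            continue  # domain already known to be paywalled: skip it
        to_check.append(url)
    return to_check
```

In this sketch a domain is only skipped once a marked link from it has been seen, so unmarked links scanned earlier in the same pass would still be checked. Matching is on the exact host name; whether subdomains should share an entry is left open here.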

kaldari created this task. Apr 18 2016, 5:43 PM
Restricted Application added a subscriber: Aklapper. Apr 18 2016, 5:43 PM
kaldari moved this task from Untriaged to Backlog on the Community-Tech board. Apr 18 2016, 5:45 PM
kaldari triaged this task as "Normal" priority.
DannyH set the point value for this task to 8. Apr 18 2016, 5:47 PM
Niharika edited the task description. Apr 19 2016, 1:06 PM

Status update: as part of the recent DB overhaul, paywall support was added quite elegantly to the DB. It still needs to be added to the code.

@Cyberpower678: What's the current status of this? Does it still need code written for adding the URLs to the database and skipping them in checks?

I'm getting ready to test it.

Discovered a regex bug that seems to affect all of the regexes. I'm working on a fix for this.

Cyberpower678 closed this task as "Resolved". Jun 1 2016, 7:44 PM
kaldari moved this task from Ready to Done on the Community-Tech-Sprint board.
DannyH moved this task from Backlog to Archive on the Community-Tech board. Jun 6 2016, 11:57 PM