
IABot API wishlist
Closed, Resolved, Public

Description

Wishlist for new features in the IABot API

  • Retrieve URL records by urlid: add a urlid parameter to action=searchurldata.
  • New database fields recording whether a URL was reviewed, who reviewed it (manual or bot), and the date of the last review. This would let Wayback Medic use the IABot database as its authority in the future, which would streamline it and speed it up considerably. It would also let IMP track verification, so it can re-process as new archives are added without repeating verification it has already done.
  • A hasarchive option to retrieve links that are marked dead (and dying?) and have no available archive.
  • A method to retrieve all urlids used in a given article, looked up by article name.
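
As a rough illustration of the first item, the sketch below builds a query URL for action=searchurldata with a urlid parameter. The action and parameter names come from the wishlist; the endpoint address and the helper name build_searchurldata_query are hypothetical placeholders, and the real API may expect additional parameters (authentication, for instance) that are not shown here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; not the real IABot API address.
API_ENDPOINT = "https://example.org/api.php"


def build_searchurldata_query(urlid: int) -> str:
    """Build a query URL asking action=searchurldata for one record by urlid.

    Assumes the wished-for parameter is literally named "urlid".
    """
    if urlid <= 0:
        raise ValueError("urlid must be a positive integer")
    params = {"action": "searchurldata", "urlid": urlid}
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_searchurldata_query(12345))
```

This only constructs the request; sending it and parsing the response are left out because the response format is not described in this task.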

Event Timeline

I reject everything on your wishlist, because I choose to be mean. :p

Cyberpower678 moved this task from Inbox to v1.4 on the InternetArchiveBot board.
Cyberpower678 moved this task from Unsorted to Bugs on the InternetArchiveBot (v1.4) board.
Cyberpower678 moved this task from Bugs to New feature on the InternetArchiveBot (v1.4) board.

Oh no! Well, the more I think about #2, the more it seems it might make sense to track it locally, but I'm still thinking about it. #4 is just an idea with no immediate application or need, but it could open up possibilities. #1 is a reverse lookup needed for debugging (I needed it today, for example). #3 will be key to saving links.

Number 2 already has a reviewed field, but it's a hidden field. It doesn't identify who did the review or when, and it's only set when a URL is changed via the API or the interface. However, this can be tied into the logs, which have the URL ID associated with each entry. Everything is heavily logged, after all.

Great, thank you for #1. For #2, the thinking is that it needs to be able to track that a link has been verified; most of the time verification makes no change to the database because the link verified OK. Tracking this is important so it can go back and re-process the database to catch any new archive URLs that were added, without re-processing the URLs it has already verified, which is very time consuming. The exception is URLs that have passed an expiration date and need to be re-verified, hence the need for a date of verification. Maybe it can be tracked locally; I'm not sure. What do you think? I thought that if it were tracked in the IABot database, anyone could then run IMP.
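
To make the re-processing idea concrete, here is a minimal sketch of the local-tracking variant: keep a verification date per urlid and skip anything verified recently. The function name needs_verification, the in-memory dictionary, and the 180-day expiration window are all assumptions for illustration; nothing here reflects how IMP or IABot actually store this.

```python
from datetime import datetime, timedelta

# Assumed expiration window; the task does not specify one.
MAX_AGE = timedelta(days=180)


def needs_verification(urlid, verified_at, now, max_age=MAX_AGE):
    """Return True if urlid was never verified or its verification has expired."""
    last = verified_at.get(urlid)
    if last is None:
        return True  # never verified, so it must be processed
    return now - last > max_age  # re-verify only after the expiration date


# Hypothetical local record: urlid -> date of last verification.
verified_at = {101: datetime(2024, 1, 1), 102: datetime(2023, 1, 1)}
now = datetime(2024, 3, 1)

pending = [u for u in (101, 102, 103) if needs_verification(u, verified_at, now)]
print(pending)  # [102, 103]
```

Here 101 is skipped because it was verified 60 days earlier, while 102 has expired and 103 has no record. Storing the same date in the IABot database instead would let anyone running IMP share this state, which is the trade-off raised above.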

I've been playing around with number 2 for a bit, and I think I've got a good solution. I'm tying the results into the logs to get you your information. You can access the hidden reviewed parameter by setting the reviewed filter to either 0 or 1. This will be in the beta3 release.

Number 3 is something I've been struggling with for a while. There are serious performance issues adding in this particular filter.

I think Number 3 is finally ready to go. I'll do some more benchmarking tomorrow, but it seems to be stable.

Number 4 is now ready to go. These will be available in the beta3 release.

The API is live; new documentation will be up by the end of tomorrow.