We should create a centralized logging interface (probably on Tool Labs) that keeps track of which pages have had their dead links fixed, when, and by which agent/bot. This will facilitate three things:
- If a bot dies, it can pick up where it left off
- It will help prevent bots from doing redundant work
- It will provide a centralized (and hopefully comprehensive) reporting interface for the Internet Archive and other archive providers
This tool should provide two APIs and a web interface:
- The first API is for recording fix attempts. It should include the following information: wiki, page name, possibly page id, timestamp, possibly revision id, number of links fixed, agent/bot, archive service used
- The second API (optional) should return the last page (and its wiki) processed by a given agent/bot and the timestamp it was processed at. Its input should be the agent/bot name.
- The web interface should show a chart of the total number of dead-link pages fixed or processed per day on a given wiki. You should also be able to filter by agent.
- The web interface could optionally provide a paginated log of all dead link pages that have been fixed/processed.
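To make the two APIs concrete, here is a minimal sketch of the storage and query logic behind them, using an in-memory SQLite table. The schema and function names (`record_fix`, `last_processed`) are hypothetical placeholders, not an existing Tool Labs implementation:

```python
import sqlite3
import time

# One row per fix attempt; optional fields (page id, revision id) are nullable.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE fix_log (
        wiki        TEXT NOT NULL,
        page        TEXT NOT NULL,
        page_id     INTEGER,           -- optional
        rev_id      INTEGER,           -- optional
        links_fixed INTEGER NOT NULL,
        agent       TEXT NOT NULL,
        archive     TEXT NOT NULL,     -- archive service used
        ts          INTEGER NOT NULL   -- Unix timestamp of the fix attempt
    )
""")

def record_fix(wiki, page, links_fixed, agent, archive,
               page_id=None, rev_id=None, ts=None):
    """First API: record one fix attempt."""
    db.execute(
        "INSERT INTO fix_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (wiki, page, page_id, rev_id, links_fixed, agent, archive,
         ts if ts is not None else int(time.time())),
    )

def last_processed(agent):
    """Second API: the last page (wiki, page, timestamp) handled by an agent."""
    return db.execute(
        "SELECT wiki, page, ts FROM fix_log "
        "WHERE agent = ? ORDER BY ts DESC LIMIT 1",
        (agent,),
    ).fetchone()  # (wiki, page, ts) or None
```

A bot restarting after a crash could call `last_processed("MyBot")` and resume from the page after the one returned, which covers both the recovery and the redundant-work use cases above.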
Each of these will need to be broken out into separate tasks.