Crawler should notice when a toolinfo record is removed from a source url
Closed, Resolved (Public)

Description

The contents of an aggregate toolinfo input file such as https://toolsadmin.wikimedia.org/tools/toolinfo/v1/toolinfo.json may change over time. The crawler already handles new tools being added and existing tools changing. A tool being removed, however, is not currently handled in any special way. Over time this could leave many 'orphaned' records in the Toolhub database, reducing its value as an authoritative source.
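
To illustrate the gap, here is a minimal sketch of how removed records might be detected by comparing the names seen in earlier crawls of a URL with the names in the latest crawl. This is not the Toolhub crawler code: the function `find_removed_tools`, its parameters, and the sample tool names are hypothetical, and it assumes each record is keyed by a `name` field.

```python
from typing import Iterable, Mapping, Set


def find_removed_tools(
    previously_seen: Iterable[str],
    crawled_records: Iterable[Mapping[str, object]],
) -> Set[str]:
    """Return names known from earlier crawls of a URL that are absent now.

    Hypothetical helper: assumes each crawled record has a "name" key
    that identifies the tool.
    """
    current_names = {str(record["name"]) for record in crawled_records}
    # Anything seen before but missing from the latest crawl is a
    # candidate 'orphaned' record.
    return set(previously_seen) - current_names


# Illustrative data only; these tool names are made up.
previous = {"tool-a", "tool-b", "tool-c"}
latest_crawl = [{"name": "tool-a"}, {"name": "tool-c"}, {"name": "tool-d"}]

removed = find_removed_tools(previous, latest_crawl)
print(sorted(removed))  # prints ['tool-b']
```

A crawler using this approach would probably want to skip the comparison when the fetch of the source URL itself fails, since an empty result would otherwise mark every previously seen tool as removed.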

Event Timeline

Change 673827 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[wikimedia/toolhub@main] crawler: Delete toolinfo records missing during crawl

https://gerrit.wikimedia.org/r/673827

bd808 moved this task from Backlog to Review on the Toolhub board.

Change 673827 merged by jenkins-bot:
[wikimedia/toolhub@main] crawler: Delete toolinfo records missing during crawl

https://gerrit.wikimedia.org/r/673827