The contents of an aggregate toolinfo input file such as https://toolsadmin.wikimedia.org/tools/toolinfo/v1/toolinfo.json may change over time. The crawler already handles new tools being added and existing tools changing. A tool being removed, however, is not currently handled in any special way. Over time this could leave many 'orphaned' records in the Toolhub database, reducing its value as an authoritative source.
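One way to notice removals is to compare the tool names already stored for a source URL with the names seen in the latest crawl of that URL. The following is a minimal sketch of that comparison only, not the code from the merged patch; `find_orphaned_tools` and the sample tool names are hypothetical.

```python
from typing import Iterable, Set


def find_orphaned_tools(
    known_names: Iterable[str], crawled_names: Iterable[str]
) -> Set[str]:
    """Return names stored for a source URL that are absent from the latest crawl.

    Hypothetical helper: the real crawler's data model is not shown here.
    """
    return set(known_names) - set(crawled_names)


# Hypothetical data: names previously stored for one source URL...
known = ["tool-a", "tool-b", "tool-c"]
# ...and names found in the latest fetch of that same URL.
crawled = ["tool-a", "tool-c", "tool-d"]

orphans = find_orphaned_tools(known, crawled)
print(sorted(orphans))  # candidates for deletion
```

A check like this presumably should run only after the source URL was fetched and parsed successfully, so that a failed fetch is not mistaken for every tool having been removed.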
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
crawler: Delete toolinfo records missing during crawl | wikimedia/toolhub | main | +46 -9
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | bd808 | T271128 Crawler should notice when a toolinfo record is removed from a source url
Resolved | | bd808 | T277810 Implement "soft" delete for toolinfo records
Event Timeline
Change 673827 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[wikimedia/toolhub@main] crawler: Delete toolinfo records missing during crawl
Change 673827 merged by jenkins-bot:
[wikimedia/toolhub@main] crawler: Delete toolinfo records missing during crawl