Toolinfo.json crawler
Closed, Resolved, Public

Description

  • Crawler process that loads toolinfo.json source URLs, validates their content, and populates the Toolhub database
  • API endpoint(s) to view crawler status information (run date, errors from run)
    • GET /api/v1/crawler/runs - paginated, filterable list of known runs
    • GET /api/v1/crawler/runs/{id} - run details
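
The load, validate, and populate steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Toolhub's actual implementation: the `CrawlerRun` record, the `REQUIRED_KEYS` tuple, the injected `fetch` callable, and the dict standing in for the database are all hypothetical names, and the real toolinfo schema checks far more than key presence.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, List

# Hypothetical minimal set of required keys; the real schema is richer.
REQUIRED_KEYS = ("name", "title", "description", "url")


@dataclass
class CrawlerRun:
    """Status record for one crawl: when it ran and what went wrong."""

    started: datetime
    new_tools: int = 0
    updated_tools: int = 0
    errors: List[str] = field(default_factory=list)


def validate(record: object) -> List[str]:
    """Return a list of problems; an empty list means the record is usable."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    return [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in record]


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    database: Dict[str, dict],
) -> CrawlerRun:
    """Load each source URL, validate its records, and store the valid ones."""
    run = CrawlerRun(started=datetime.now(timezone.utc))
    for url in urls:
        try:
            payload = json.loads(fetch(url))
        except Exception as exc:  # fetch or parse failure: log it and move on
            run.errors.append(f"{url}: {exc}")
            continue
        # Assumption: a source file holds either one record or a list of them.
        records = payload if isinstance(payload, list) else [payload]
        for record in records:
            problems = validate(record)
            if problems:
                run.errors.extend(f"{url}: {problem}" for problem in problems)
                continue
            if record["name"] in database:
                run.updated_tools += 1
            else:
                run.new_tools += 1
            database[record["name"]] = record
    return run
```

Passing `fetch` in as a parameter keeps the sketch free of network code and easy to test; a real crawler would plug in an HTTP client there. The returned `CrawlerRun` holds the kind of data (run date, errors from the run) that the status endpoints are meant to expose.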

Event Timeline

bd808 triaged this task as Medium priority. Oct 6 2020, 10:33 PM

bd808 moved this task from Backlog to In Progress on the Toolhub board.
bd808 moved this task from To Do to In Dev/Progress on the User-bd808 board.

Change 639666 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[wikimedia/toolhub@main] Toolinfo storage model and basic API

https://gerrit.wikimedia.org/r/639666

This patch really belongs to T264811: Toolinfo read API, but it is also necessary for this task.

Change 641559 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[wikimedia/toolhub@main] Add a basic crawler

https://gerrit.wikimedia.org/r/641559

Change 641559 merged by jenkins-bot:
[wikimedia/toolhub@main] Add a basic crawler

https://gerrit.wikimedia.org/r/641559

Change 642131 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[wikimedia/toolhub@main] api: Endpoints for crawler run status data

https://gerrit.wikimedia.org/r/642131

Change 642131 merged by jenkins-bot:
[wikimedia/toolhub@main] api: Endpoints for crawler run status data

https://gerrit.wikimedia.org/r/642131

Change 650639 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[wikimedia/toolhub@main] crawler: Move create/update logic to custom manager

https://gerrit.wikimedia.org/r/650639

Change 650639 merged by jenkins-bot:
[wikimedia/toolhub@main] crawler: Move create/update logic to custom manager

https://gerrit.wikimedia.org/r/650639

We will need some follow-up work in the future to schedule crawler runs, but the core work is done.