Set up periodic crawling
Closed, Resolved (Public)

Description

We have a CLI tool to perform a crawl, but we do not currently have anything set up to actually run that command periodically.

This also means deciding what we will use to schedule tasks. Local cron is not a realistic option in a k8s-deployed system, but we could use:

  • cron on a metal server + web API
  • celery + celery beat
  • k8s scheduled jobs? (Not sure whether prod k8s supports that object type)

Celery is probably the "best" thing for parity between dev and prod. That would in theory give us another way to scale too (horizontal worker scaling).

Event Timeline

Celery is probably the "best" thing for parity between dev and prod. That would in theory give us another way to scale too (horizontal worker scaling).

The more I think about this, the more it feels like a Kubernetes CronJob would actually be the simplest thing to use initially if that is supported in the prod k8s cluster.
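
For illustration only, here is a minimal sketch of what a CronJob for the crawler could look like, built as a plain Python dict and printed as JSON. It is not the manifest from the actual patch. The job name, image, command, and schedule are all hypothetical placeholders, and `batch/v1` assumes a cluster new enough to serve CronJob from that API group (older clusters use `batch/v1beta1`).

```python
import json


def build_cronjob_manifest(name, image, command, schedule):
    """Return a Kubernetes CronJob manifest as a plain dict.

    All argument values are supplied by the caller; nothing here is
    taken from the real toolhub chart.
    """
    return {
        # Assumption: the cluster serves CronJob from batch/v1.
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            # Standard five-field cron expression.
            "schedule": schedule,
            # Skip a run if the previous crawl is still in progress.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            # Do not restart a failed crawl in place.
                            "restartPolicy": "Never",
                            "containers": [
                                {
                                    "name": name,
                                    "image": image,
                                    "command": list(command),
                                }
                            ],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    manifest = build_cronjob_manifest(
        name="crawler",
        image="example.invalid/toolhub:latest",
        command=["python", "manage.py", "crawl"],
        schedule="0 * * * *",  # at minute 0 of every hour
    )
    print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f -` accepts JSON as well as YAML, the printed output could be piped straight into it on a test cluster. In a Helm chart the same structure would normally be written as a YAML template instead.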

Change 710704 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/deployment-charts@master] toolhub: Add CronJob for crawer

https://gerrit.wikimedia.org/r/710704

Change 710704 merged by jenkins-bot:

[operations/deployment-charts@master] toolhub: Add CronJob for crawler

https://gerrit.wikimedia.org/r/710704