We have a CLI tool to perform a crawl, but nothing is currently set up to actually run that command periodically.
This also means deciding what we will use to schedule tasks. Local cron is not a realistic option in a k8s-deployed system, but we could use:
- cron on a metal server + web API
- celery + celery beat
- k8s scheduled jobs (the CronJob object)? (Not sure whether prod k8s supports that object type)
Celery is probably the "best" option for parity between dev and prod. In theory it would also give us another way to scale (horizontal worker scaling).