
Recurring queries
Open, Medium, Public

Description

There are several queries where the user will be interested in current results, rather than published results.
Example: latest new users in the past week.
http://quarry.wmflabs.org/query/3933

Currently it looks like only the owner can rerun the query. It would be great if queries could be rerun when the URL is accessed, or by any user.

Event Timeline

Arjunaraoc raised the priority of this task from to Needs Triage.
Arjunaraoc updated the task description.
Arjunaraoc added a project: Quarry.
Arjunaraoc subscribed.
Capt_Swing renamed this task from Dynamic results from quarry to Recurring queries. Jul 17 2015, 7:06 PM
Capt_Swing triaged this task as Medium priority.
Capt_Swing set Security to None.
Capt_Swing subscribed.

Have recurring queries: to start, specify that a query can be run weekly or monthly, etc.

An extra table would be required: schedules. It would have (id, query, schedule), where schedule is one of 'daily', 'weekly', or 'monthly'. Then we'd have celerybeat or something similar wake up every minute or so, check the schedules, find each query's latest run, and queue the ones that are due. This hopefully spreads the load enough to not cause any crashes. However, we can also keep an active count of the queue size and the number of executing queries, and just not schedule anything if either is too high.
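A minimal sketch of the "is it due?" check described above, in plain Python with no Celery or database involved. The names `Schedule`, `is_due`, `pick_queries_to_run`, and `max_queue_size` are hypothetical, not taken from the Quarry code base, and "monthly" is approximated as 30 days purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Minimum time between runs for each schedule value.
# Assumption: "monthly" is treated as 30 days to keep the sketch simple.
INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=30),
}


@dataclass
class Schedule:
    """One row of the proposed schedules table: (id, query, schedule)."""
    id: int
    query_id: int
    schedule: str  # one of the keys of INTERVALS


def is_due(schedule: Schedule, last_run: Optional[datetime], now: datetime) -> bool:
    """A query is due if it has never run or its interval has elapsed."""
    if last_run is None:
        return True
    return now - last_run >= INTERVALS[schedule.schedule]


def pick_queries_to_run(
    schedules: List[Schedule],
    last_runs: Dict[int, datetime],
    now: datetime,
    queue_size: int,
    max_queue_size: int = 20,  # hypothetical back-pressure threshold
) -> List[int]:
    """Return the query ids to enqueue on this wake-up.

    If the queue is already busy, schedule nothing; due queries are
    simply picked up on a later wake-up.
    """
    if queue_size >= max_queue_size:
        return []
    return [
        s.query_id
        for s in schedules
        if is_due(s, last_runs.get(s.query_id), now)
    ]
```

A periodic task (celerybeat or otherwise) would call `pick_queries_to_run` on each wake-up with the current queue size and enqueue whatever it returns.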

An alternative is to use crontab-like mechanics, but have the crontab entries be randomly generated by the code when a schedule is selected. This will distribute the runs randomly, and also give people an accurate estimate of when the results get updated.
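To illustrate the randomly generated crontab idea, here is a small sketch that picks a random slot once, at the moment a schedule is chosen, and returns a standard five-field cron expression. The function name `random_cron` is hypothetical, and capping the day of month at 28 is an assumption made so the slot exists in every month.

```python
import random


def random_cron(schedule: str, rng=random) -> str:
    """Return a five-field cron expression with a randomly chosen slot.

    Fields are: minute hour day-of-month month day-of-week.
    `rng` can be the `random` module or a seeded `random.Random` instance.
    """
    minute = rng.randrange(60)
    hour = rng.randrange(24)
    if schedule == "daily":
        return f"{minute} {hour} * * *"
    if schedule == "weekly":
        # Day of week 0-6 (0 is Sunday in standard cron).
        return f"{minute} {hour} * * {rng.randrange(7)}"
    if schedule == "monthly":
        # Days 1-28 exist in every month, so the job never gets skipped.
        return f"{minute} {hour} {rng.randrange(1, 29)} * *"
    raise ValueError(f"unknown schedule: {schedule!r}")
```

Because the expression is stored rather than recomputed, the next run time can be shown to users exactly.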

The tsreports approach to this is regenerating on access, but showing the cached version while the regeneration is in progress (plus an ETA for the new version).
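The regenerate-on-access behaviour could look roughly like the sketch below. This is not how tsreports is actually implemented; `CachedResult`, `view`, `finish`, and the `typical_runtime` used for the ETA are all hypothetical names and assumptions.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Tuple


class CachedResult:
    """Last stored result of a query plus regeneration bookkeeping."""

    def __init__(self, rows: List, generated_at: datetime, typical_runtime: timedelta):
        self.rows = rows
        self.generated_at = generated_at
        # Assumption: the ETA is estimated from how long the query usually takes.
        self.typical_runtime = typical_runtime
        self.regenerating_since: Optional[datetime] = None


def view(cache: CachedResult, now: datetime, start_rerun: Callable[[], None]) -> Tuple[List, datetime]:
    """Serve the cached rows immediately and report an ETA for fresh ones.

    A rerun is started only if none is already in flight, so several
    visitors arriving together trigger a single regeneration.
    """
    if cache.regenerating_since is None:
        cache.regenerating_since = now
        start_rerun()
    eta = cache.regenerating_since + cache.typical_runtime
    return cache.rows, eta


def finish(cache: CachedResult, rows: List, now: datetime) -> None:
    """Called by the worker when the rerun completes."""
    cache.rows = rows
    cache.generated_at = now
    cache.regenerating_since = None
```

In a real deployment `start_rerun` would enqueue a background job; here it is just a callback so the sketch stays self-contained.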

It would be great to have dedicated runners for scheduled queries, so that they don't interfere with end-user performance.

Can be incorporated into the data that will be shown by T206482: Show query code revisions and runs history

I think there is no need to run such a query by cron or similar.

There will surely be forgotten queries that get executed over and over again while no one is looking at the results. That wastes CPU time.

Instead, have some cron-like mechanism just mark the query as "shall be re-executed when accessed". Then execute the query when someone opens the page and that flag is set; otherwise behave as you do now.
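A rough sketch of this flag-based variant follows. The `Query` class, its `rerun_on_access` field, and the `mark_stale` and `open_page` functions are hypothetical names for illustration, not existing Quarry code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Query:
    id: int
    recurring: bool = False
    # Set by the periodic job, cleared when the query is re-executed.
    rerun_on_access: bool = False


def mark_stale(queries: Iterable[Query]) -> None:
    """Cheap periodic job: only flips a flag, never runs any SQL."""
    for q in queries:
        if q.recurring:
            q.rerun_on_access = True


def open_page(query: Query, execute: Callable[[Query], None]) -> bool:
    """Handle a page view; return True if a re-execution was triggered.

    If nobody ever opens the page, the flag stays set and no CPU time
    is spent on the query.
    """
    if query.rerun_on_access:
        query.rerun_on_access = False
        execute(query)
        return True
    return False  # behave as today: show the stored results
```

The periodic job stays cheap no matter how many queries are marked recurring, since the actual work is deferred until someone views the results.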