A common scenario when working with a big table like wmf.webrequest is having to query one hour of data at a time and wrapping that query in a script which runs through all the different values and stitches together the results into a single dataset. The query uses placeholders, e.g.
WHERE year = ${year} AND month = ${month} AND day = ${day} AND hour = ${hour}
and it would be useful to have a helper utility for running the query on a time range (e.g. from 2020-12-31 00:00 to 2021-01-01 23:00) or a date range (2021-10-01 to 2021-10-31)
Something like this:
INPUT: start, end , query with placeholders for year, month, day, hour (optional, may be omitted in cases where query is safe to run on a day of data), **kwargs
- generate range of times from start to end in hour increments
- for each time, extract year month day hour
- fill in placeholders in query with those values and run it (& forwarding kwargs)
- append results (with time info as columns for aggregating later)
OUTPUT: DataFrame of per-hour or per-day results from start to end