There is a wide range of use cases which look like this:
* Run a slow query on the wiki's database periodically
* Store the results somewhere
* Reuse the results in the wiki's user interface
(Some existing examples are the [[https://www.mediawiki.org/wiki/Growth/Positive_reinforcement#Impact|user impact module]] in GrowthExperiments and [[https://www.mediawiki.org/wiki/Manual:UpdateSpecialPages.php|query pages]]. I expect there would be many more if this kind of thing were easier to do.)
Currently the Wikimedia infrastructure doesn't make this kind of thing easy. Normally this would be done with a [[https://en.wikipedia.org/wiki/Data_lake|data lake]]. We [[https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake|have a data lake]], but it has several limitations:
* Wiki DB data is imported once a month; for most things that are displayed on a wiki interface, you'd want, at a minimum, daily updates.
* Scheduling queries requires writing nontrivial code in an environment that's unfamiliar to most MediaWiki developers (Hadoop, Spark, Airflow, etc.).
* A new service needs to be set up for every report, or a new API endpoint has to be fit into some existing service.
Ideally, we'd instead have a system where
* Data from the wiki DB is synced to some data lake daily or more frequently (maybe even in real time).
* There is an easy, declarative way to provide an SQL query and some information on when it should be run, how long the results should be kept, etc.
* Similarly, there is an easy, declarative way to expose the results of the query via some API that's shared by all such reports.
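To make the "declarative" part more concrete, here is a rough sketch of what a report definition could look like. Everything in it is hypothetical: the `ReportDefinition` fields, the helper functions and the `/reports/v1/` path are made up for illustration and don't correspond to any existing Wikimedia service. The only assumption about the data is that the synced copy exposes MediaWiki's `page` table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ReportDefinition:
    """Hypothetical declarative description of one periodic report."""
    name: str                     # identifier, reused in the shared API path
    sql: str                      # the slow query to run on the synced data
    run_every: timedelta          # how often the query should be re-run
    keep_results_for: timedelta   # how long stored results are kept


def is_due(report: ReportDefinition, last_run: Optional[datetime],
           now: datetime) -> bool:
    """A report is due if it never ran or its interval has elapsed."""
    return last_run is None or now - last_run >= report.run_every


def is_expired(report: ReportDefinition, stored_at: datetime,
               now: datetime) -> bool:
    """Stored results older than the retention period can be dropped."""
    return now - stored_at > report.keep_results_for


def api_path(report: ReportDefinition) -> str:
    """Path under a (made-up) API shared by all reports."""
    return f"/reports/v1/{report.name}"


# Example definition: count pages per namespace once a day,
# and keep each result set for a week.
EXAMPLE_REPORT = ReportDefinition(
    name="pages-per-namespace",
    sql="SELECT page_namespace, COUNT(*) AS pages "
        "FROM page GROUP BY page_namespace",
    run_every=timedelta(days=1),
    keep_results_for=timedelta(days=7),
)
```

With something like this, adding a new report would mean adding one more definition, and a generic scheduler and a generic API would handle the rest. Whether the definitions live in Python, YAML or `extension.json` is a separate design decision.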