Right now, wmfdata.hive.run runs queries in only one way: through a Spark session with the default settings. Based on a meeting with the Analytics team, it seems there is no single engine that can be recommended as optimal for all use cases, so the function should support several (Presto, the Hive CLI, and a couple of different bundles of Spark settings).
In my opinion, this should be done by updating wmfdata.hive.run to take an engine parameter, with values like presto, hive-cli, spark, and spark-large, to make it as easy as possible to swap engines. However, I could be convinced that it's better to have separate functions entirely, particularly because Presto may require different functions or syntax in some circumstances.
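To make the discussion concrete, here is a minimal sketch of what an engine parameter could look like. The engine names come from the proposal above, but the exact signature, the default value, and the backend helper functions are all hypothetical illustrations, not a worked-out implementation.

```python
# Hypothetical backends; in practice these would wrap Presto, the Hive CLI,
# and Spark sessions with different settings.
def _run_on_presto(commands):
    raise NotImplementedError

def _run_on_hive_cli(commands):
    raise NotImplementedError

def _run_on_spark(commands, large=False):
    raise NotImplementedError

def run(commands, engine="hive-cli"):
    """Run SQL against the Data Lake using the selected engine."""
    if engine == "presto":
        return _run_on_presto(commands)
    elif engine == "hive-cli":
        return _run_on_hive_cli(commands)
    elif engine in ("spark", "spark-large"):
        # "spark-large" would request a bigger Spark configuration
        # (e.g. more executor memory) than the plain "spark" setting.
        return _run_on_spark(commands, large=(engine == "spark-large"))
    else:
        raise ValueError(f"Unknown engine: {engine!r}")
```

One advantage of this shape is that switching engines is a one-argument change, so users can experiment without rewriting their code; the main counterargument, as noted above, is that Presto's differing syntax might make a shared interface leaky.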
It's also important to provide a good default for users who don't have the time or expertise to tune their queries; it sounds like that should be the Hive CLI, which provides strong reliability at the cost of some speed and resource use.
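Assuming the Hive CLI becomes the default, everyday usage would stay as simple as it is now, while heavier jobs could opt in to a different engine. The queries below are purely illustrative, and the engine argument is the proposed (not yet existing) parameter:

```python
import wmfdata as wmf

# Default engine: no tuning or engine choice required.
monthly_views = wmf.hive.run("""
    SELECT year, month, SUM(view_count) AS views
    FROM wmf.pageview_hourly
    WHERE year = 2020
    GROUP BY year, month
""")

# Explicitly opting in to a heavier Spark configuration for a larger job.
top_pages = wmf.hive.run("""
    SELECT page_title, SUM(view_count) AS views
    FROM wmf.pageview_hourly
    WHERE year = 2020
    GROUP BY page_title
""", engine="spark-large")
```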