On T335860, we implemented a PySpark job that runs a MERGE INTO statement to transform event data into a table that will eventually hold the full MediaWiki revision history.
Since we don't yet fully understand the downstream consumers of that table, we deferred optimizing its schema.
In this task we should identify those downstream consumers and flatten and/or tune the table's schema and partitioning to suit their access patterns. A rough sketch of the kind of change this could mean is below.
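As a starting point for discussion, here is a minimal sketch of what flattening and repartitioning might look like, assuming a catalog that supports the DataFrameWriterV2 API (e.g. Iceberg). All table and column names below (`event_sanitized.mediawiki_revision_history_raw`, `wmf.mediawiki_revision_history_flat`, the nested `revision.*` / `performer.*` fields) are hypothetical placeholders, not the actual schema, and the partitioning choice would need to be validated against real consumer queries.

```python
# Sketch only: names and partitioning are assumptions to be confirmed with
# downstream consumers, not the final design.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flatten_revision_history").getOrCreate()

# Hypothetical source table produced by the existing MERGE INTO job.
events = spark.table("event_sanitized.mediawiki_revision_history_raw")

# Flatten nested structs (e.g. revision.*, performer.*) into top-level columns
# so consumers can query them without navigating struct fields.
flattened = events.select(
    F.col("wiki_db"),
    F.col("revision.id").alias("revision_id"),
    F.col("revision.parent_id").alias("revision_parent_id"),
    F.col("performer.user_id").alias("user_id"),
    F.col("performer.user_text").alias("user_text"),
    F.col("revision.timestamp").cast("timestamp").alias("revision_timestamp"),
)

# Partition by wiki and by month of the revision timestamp, on the assumption
# that per-wiki, time-bounded scans are the common access pattern.
(
    flattened
    .withColumn("revision_month", F.date_format("revision_timestamp", "yyyy-MM"))
    .writeTo("wmf.mediawiki_revision_history_flat")
    .partitionedBy(F.col("wiki_db"), F.col("revision_month"))
    .createOrReplace()
)
```

Whether we flatten fully, partially, or instead rely on partition transforms and column pruning is exactly the trade-off this task should settle once we know who reads the table and how.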