I was talking with @Ladsgroup and we see a potential collaboration. MediaWiki wastes a lot of resources* computing the results of queries that would take a few minutes on the cluster. One class of such queries powers special pages called QueryPages. Here are two examples:
Most Linked Pages: the query and the result on English Wikipedia.
Wanted Pages: the query and the result on English Wikipedia.
* These queries run once a month, or sometimes more frequently, and often take a full day to compute. While they run, they execute full table scans and are very disruptive to anything else trying to access the database.
A data pipeline that helps offload this work could be:
sqoop monthly -> wmf_raw -> wmf.<<new table>> -> load into Cassandra -> AQS
The PHP pages above could then be changed to query AQS and get the data directly, instead of running expensive SQL. Or we could just skip the pages altogether and load the data into the QueryCache table.
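
To make the QueryCache option a bit more concrete, here is a rough Python sketch of the aggregation step for Most Linked Pages. It is only an illustration under assumptions, not the actual job: the input is a hypothetical iterable of (namespace, title) link targets standing in for the sqooped data, the function name and the limit are made up, and the "Mostlinked" type string is my assumption about how the special page identifies its rows. The output fields follow the querycache columns (qc_type, qc_value, qc_namespace, qc_title).

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class QueryCacheRow(NamedTuple):
    """One row shaped like MediaWiki's querycache table."""
    qc_type: str
    qc_value: int
    qc_namespace: int
    qc_title: str


def most_linked_rows(
    links: Iterable[Tuple[int, str]],
    limit: int = 5000,                # hypothetical cap on cached rows
    qc_type: str = "Mostlinked",      # assumed type string for the special page
) -> List[QueryCacheRow]:
    """Count incoming links per (namespace, title) and keep the top `limit`."""
    counts = Counter(links)
    # Highest count first; ties broken by (namespace, title) for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        QueryCacheRow(qc_type, count, namespace, title)
        for (namespace, title), count in ranked[:limit]
    ]


# Tiny made-up sample: each tuple is the target of one link.
sample_links = [(0, "Foo"), (0, "Bar"), (0, "Foo"), (10, "Cite"), (0, "Foo"), (0, "Bar")]
top_rows = most_linked_rows(sample_links, limit=2)
```

On the cluster the same grouping would presumably be a GROUP BY over the sqooped table rather than an in-memory Counter; the point is only that the result already has the shape the QueryCache table expects.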