After establishing that we definitely should not just turn on the general query log in mariadb (because of the infinitely climbing disk space that would consume), we decided we need a random sampling of queries to get a good idea of the usage patterns.
We'll need to create a service to log queries here and there, ideally only the queries we want (user queries vs. system stuff) and apply some metadata to them. Something that turns the general log on would need to run locally over the instances and then move the bulk logs somewhere, which seems very inefficient.
Using any user with PROCESS privs (possibly a new one since right now that's root and wmf-pt-kill), it would be possible to schedule regular queries of SHOW FULL PROCESSLIST; in order to get the full text of all queries running at that moment and then parse it from there.
It really should run across all three primary replicas on any given run, preserve the date, time and host information, as well as what type of user is running things (tool or regular user). We could also tally unique users on each run.
If we let that fill a database table, it can be analyzed later in a variety of ways. Really, this should be a VM with a large enough disk to house the amount of data we'd like to have.