We keep having to patch miscellaneous Python scripts to get data out of the Cirrus search logs. These logs should live in HDFS, with new and interesting fields that let us gather actual data about users, and in a format we don't need regexes to parse.
Write up an idealised schema of the fields we'd want in this mythical Hive table and HDFS store.