Create idealised schema for moving Cirrus logs into HDFS
Closed, Resolved; Public

Description

We keep having to patch miscellaneous Python scripts to get data out of the Cirrus search logs. These logs should live in HDFS, with new and interesting fields that let us gather actual data about the users, and in a format we don't have to parse with regexes.

Write up an idealised schema listing the fields we'd have in this mythical Hive table and HDFS store.