Per our talk with @Halfak, I think it would be great to have this data inside Hadoop: the data is very large, and Hadoop is designed to handle datasets of that size. And since AQS provides a public API, external users such as researchers could then access the data through it.
This task is done when we know for sure whether this is possible and the proper Phab cards are in place.