Top level task for organizing backend data engineering for supporting the learn to rank pipeline.
High level design:
* Join webrequest data with CirrusSearch request set to create a table containing search click-through's
* Train a DBN on a suitably sized sample of the search click-through's and record relevance labels into a hive table
* Generate elasticsearch queries for search+page id combinations we want to generate feature labels for and push them into Kafka from analytics network
* Kafka consumer in production network reads queries inserted by analytics, runs them against some elasticsearch cluster, and the generated feature scores get pushed back into a different Kafka log
* Feature logs are read back from Kafka and stored in HDFS
* Join together DBN labels with ES feature generation to make a combined dataset and output in a format suitable for training a machine learning model (XGBoost, RankLib, or LightGBM).
* Train models against dataset, generating output suitable for loading into elasticsearch LTR plugin