There are three main libraries available for generating LambdaRank models. Evaluate them to decide which we should use:
- RankLib
- XGBoost
- LightGBM
Evaluation criteria should include:
- Training efficiency, or how much computational power it takes to train a model
- Relatedly, whether the library can be deployed to our Hadoop cluster, either by using multiple executors to train one model or by training multiple models in parallel to search for ideal hyperparameters
- Ease of performing hyperparameter optimization, and whether the set of available parameters is sufficient
- Ability to transform resulting models into a format suitable for loading into the Elasticsearch LTR plugin
- Similarity of results from transformed models inserted into Elasticsearch to those generated on the test data set (they should be exactly the same, assuming no updates to the search index)
- How easy or hard it is to set up and run these models in our analytics network
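One possible way to check the result-similarity criterion is to compare, per query, the document ordering produced by the library's own predictions with the ordering returned after the transformed model is loaded into Elasticsearch. The sketch below is only an illustration: the input shape (query id mapped to `(doc_id, score)` pairs) and the names `ranking` and `compare_rankings` are assumptions, not part of any of the three libraries or of the LTR plugin.

```python
from typing import Dict, List, Tuple

# Assumed input shape: query id -> list of (doc_id, score) pairs.
Scores = Dict[str, List[Tuple[str, float]]]


def ranking(pairs: List[Tuple[str, float]]) -> List[str]:
    """Return doc ids ordered by descending score, ties broken by doc id."""
    return [doc for doc, _ in sorted(pairs, key=lambda p: (-p[1], p[0]))]


def compare_rankings(offline: Scores, online: Scores) -> Tuple[float, List[str]]:
    """Compare per-query orderings from two score sources.

    Returns the fraction of offline queries whose ordering matches exactly,
    plus the sorted ids of the queries that do not match.
    """
    mismatched = [
        q for q in sorted(offline)
        if ranking(offline[q]) != ranking(online.get(q, []))
    ]
    total = len(offline)
    matched = (total - len(mismatched)) / total if total else 1.0
    return matched, mismatched


# Hypothetical scores: "offline" from the training library on the test set,
# "online" from the same model after loading it into Elasticsearch.
offline = {"q1": [("a", 2.0), ("b", 1.0)], "q2": [("c", 0.5), ("d", 0.7)]}
online = {"q1": [("b", 0.9), ("a", 1.8)], "q2": [("c", 0.8), ("d", 0.7)]}
fraction, bad_queries = compare_rankings(offline, online)
```

Comparing orderings rather than raw scores is a deliberate choice in this sketch, since a model transformation may rescale scores without changing the ranking. If scores are expected to match too, a tolerance check on the score values could be added alongside.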