As of October 2024, the models used in production were trained in April 2022 (models suffixed with 20220421-20180215-query_explorer).
It could be interesting to upload a new set of models for the projects where LTR is enabled, in order to:
- learn whether search behavior has changed over the past two years
- exercise our ability to run A/B tests
- assess our A/B test infrastructure
- assess our ability to run the analysis
While some parts of this work should be relatively straightforward:
- verify that the models are properly exported weekly to the right Elasticsearch clusters (see the sketch after this list)
- set up an interleaving A/B test in mw-config
- verify that the A/B test data is flowing in
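As a rough illustration of the export check, something like the sketch below could confirm the weekly upload landed, assuming the clusters expose the standard LTR plugin model endpoint; the cluster URLs and model name are placeholders, not the real production values:

```python
import requests

# Hypothetical cluster URLs and model name; substitute the real production values.
CLUSTERS = ["https://search.eqiad.example:9243", "https://search.codfw.example:9243"]
MODEL_NAME = "20241001-query_explorer"  # assumed date-suffixed naming convention

def model_exists(cluster: str, model: str) -> bool:
    """Return True if the LTR plugin reports the model as stored on the cluster."""
    # The LTR plugin serves stored models under _ltr/_model/<name>;
    # a 404 would mean the weekly export did not reach this cluster.
    resp = requests.get(f"{cluster}/_ltr/_model/{model}", timeout=10)
    return resp.status_code == 200

for cluster in CLUSTERS:
    status = "present" if model_exists(cluster, MODEL_NAME) else "MISSING"
    print(f"{cluster}: {MODEL_NAME} {status}")
```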
The analysis part might be more challenging. Can we re-use the automatic report generator or do we have to rebuild it?
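If the report generator cannot be reused, the core of an interleaving analysis is fairly compact: credit each session to whichever model's results were clicked more. A rough sketch of that credit-assignment step, assuming each session yields the team attribution of every clicked result (the field layout and team labels here are made up, not the actual event schema):

```python
from collections import Counter

def session_winner(click_teams: list[str]) -> str | None:
    """Team-draft interleaving credit assignment: the model whose results
    collected more clicks in a session wins it; ties and clickless sessions
    count for neither model."""
    counts = Counter(click_teams)
    prod, new = counts.get("prod", 0), counts.get("new", 0)
    if prod > new:
        return "prod"
    if new > prod:
        return "new"
    return None

# Hypothetical sessions: each list holds the team ("prod"/"new") of every clicked result.
sessions = [["new", "new", "prod"], ["prod"], [], ["new"], ["prod", "new"]]
wins = Counter(w for w in map(session_winner, sessions) if w)
print(wins)  # Counter({'new': 2, 'prod': 1})
```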
AC:
- Set up an A/B test comparing the production models and recently trained models on the projects where LTR is enabled
- Verify that the interleaving A/B test infrastructure is collecting the data we expect
- Determine how to run an analysis on the A/B test data (possibly create a separate task if we can't re-use the automatic report generator)
- Promote the new models to production if they prove better, or try to understand why they perform worse if they do not
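For the last AC item, one common way to decide whether the new models "prove better" in an interleaving experiment is a sign test over the per-session preferences. A minimal sketch of that decision step, with made-up win counts (the real numbers would come from the collected A/B test data):

```python
from scipy.stats import binomtest

# Hypothetical per-session win counts from the interleaving analysis.
new_wins, prod_wins = 5400, 4900
n = new_wins + prod_wins

# Two-sided sign test: under H0 each model is equally likely to win a session.
result = binomtest(new_wins, n, p=0.5)
delta = new_wins / n - 0.5
print(f"preference toward new models: {delta:+.3%}, p-value: {result.pvalue:.4f}")
# Promote only if the new models win significantly more sessions; if they lose,
# per-wiki or per-query-class breakdowns are a natural place to look for why.
```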