Once we have implemented the new fulltext query and the weighted sum, we should be ready to run a first offline evaluation.
We could run two evaluations, using:
- PaulScore, which we have used in the past; unfortunately, it only showed interesting results in offline testing, and A/B testing did not confirm them
- Discernatron data
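
For reference, a minimal sketch of how a PaulScore-style metric could be computed from click logs. It assumes the usual definition (for each query, sum `factor ** k` over the 0-based positions `k` of clicked results, average over the queries of a session, then over sessions). The function names, the input layout and the default factor of 0.7 are illustrative assumptions, not the actual relforge implementation.

```python
from statistics import mean


def query_score(click_positions, factor=0.7):
    """Sum factor**k over the 0-based positions of the clicked results."""
    return sum(factor ** k for k in click_positions)


def paul_score(sessions, factor=0.7):
    """Average per-query scores within each session, then across sessions.

    `sessions` is a hypothetical layout: a list of sessions, each a
    non-empty list of queries, each a list of clicked positions.
    """
    return mean(
        mean(query_score(clicks, factor) for clicks in session)
        for session in sessions
    )


# Two sessions: the first has two queries with clicks, the second has
# one query without any click (which scores 0).
example_sessions = [[[0], [1, 2]], [[]]]
print(paul_score(example_sessions, factor=0.5))
```

A click at the top position contributes 1 and lower clicks contribute progressively less, so the score rewards rankings that put clicked results first.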
Depending on the results, we could run an optimization plan to fine-tune the various settings.
- implement a small tool that loads Discernatron scores into relforge
- index enwiki on relforge servers with production settings (classic similarity, #shards)
- index a second enwiki index on relforge servers with BM25 and production settings (#shards)
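
Once both indices exist, the comparison could look roughly like the sketch below. It assumes the Discernatron scores are available as a per-query mapping from page title to a graded relevance value, and it uses nDCG as one possible graded metric; the metric choice, the function names and the sample data are illustrative assumptions, not something decided in this task.

```python
import math


def dcg(grades):
    """Discounted cumulative gain of a list of grades in ranked order."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades))


def ndcg_at_k(ranked_titles, judgments, k=10):
    """nDCG@k for one query; unjudged titles count as grade 0 (assumption)."""
    grades = [judgments.get(title, 0.0) for title in ranked_titles[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(grades) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical graded judgments for a single query.
judgments = {"A": 3, "B": 1, "C": 0}

# Hypothetical top results returned by each index for that query.
classic_ranking = ["B", "A", "C"]
bm25_ranking = ["A", "B", "C"]

print("classic:", ndcg_at_k(classic_ranking, judgments))
print("bm25:   ", ndcg_at_k(bm25_ranking, judgments))
```

Averaging this value over all judged queries would give one number per index, which makes the classic similarity and BM25 setups directly comparable.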