The goal of this task is to run the offline evaluation of the semantic search model developed in T412338: Q2 FY2025-26 Goal: Semantic Search - Embeddings Service for MVP, using the benchmark dataset from T406207: Create a dataset for evaluation of search on Wikipedia. For this, we need the top-k search results (k is probably 10) for each of the 600 queries (see this list). Crucially, the model must be trained/indexed on the same fixed corpus that we used to create the benchmark dataset. The snapshot of that corpus is located at /user/trokhymovych/wikimedia_processed_snapshot_20260125.
(stretch goal) Get the top-k search results for other search models.
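For illustration, the top-k retrieval step could look roughly like the sketch below. It assumes that queries and corpus documents have already been embedded as dense vectors and ranks them by cosine similarity; the function name `top_k_results`, the toy vectors, and the document IDs are hypothetical and are not taken from the actual embeddings service, which may rank results differently (for example through an approximate nearest-neighbour index).

```python
import numpy as np


def top_k_results(query_vecs, doc_vecs, doc_ids, k=10):
    """For each query, return the k (doc_id, score) pairs with the highest cosine similarity."""
    # Normalize rows so that the dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = q @ d.T  # shape: (n_queries, n_docs)
    k = min(k, len(doc_ids))
    # Sort each row by descending score and keep the first k indices.
    order = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    return [
        [(doc_ids[j], float(scores[i, j])) for j in row]
        for i, row in enumerate(order)
    ]


# Toy data (hypothetical): three documents and one query in a 2-d embedding space.
doc_ids = ["A", "B", "C"]
doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_vecs = np.array([[1.0, 0.1]])

results = top_k_results(query_vecs, doc_vecs, doc_ids, k=2)
for doc_id, score in results[0]:
    print(doc_id, round(score, 3))
```

For the actual run, `doc_vecs` would have to come from the fixed corpus snapshot and `query_vecs` from the 600 benchmark queries, with k set to the agreed value. The same output shape (a ranked list of document IDs per query) would also make it straightforward to compare other search models under the stretch goal.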