We obtained search results for semantic search using the qwen-3-0.6B model (T417242: Get search results for queries from benchmark dataset for semantic search model). To increase the generalizability of the results, we would like to evaluate different embedding models in the same setup. The task is thus to get the top-10 search results for the queries from the benchmark dataset using the following models:
- jina-embeddings-v5-text-nano: a smaller version of the Jina model with only ~200M parameters, so interesting for performance reasons
- pplx-embed-v1-0.6b: an alternative to qwen3 that performs better on the MIRACL benchmark
- multilingual-e5-large-instruct: an alternative to the qwen3 models; we used it for the previous prototype in cloud-vps
The following additional models would be nice to have but are not crucial:
- (if technically feasible) Qwen3-8B: a larger version of qwen-3; even if we won't be able to host it in production, it would be a good comparison for offline evaluation.
- jina-embeddings-v5-text-small: derived from qwen3, yielding some improvements
- snowflake-arctic-embed-l-v2.0
- gte-multilingual-base
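Whichever model is used, the retrieval step is the same: embed the queries and the documents, then rank documents by similarity to each query and keep the top 10. The sketch below illustrates only that ranking step, assuming cosine similarity and that the embeddings have already been computed by the model under evaluation (the encoding step, including any model-specific query prefixes or instructions, is not shown). The function name `top_k_results` and the id-to-vector dictionaries are hypothetical, not part of any existing pipeline.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def top_k_results(
    query_embs: Dict[str, Sequence[float]],
    doc_embs: Dict[str, Sequence[float]],
    k: int = 10,
) -> Dict[str, List[Tuple[str, float]]]:
    """Hypothetical helper: for each query id, return the k most similar
    document ids with their cosine scores, best match first.

    Assumes embeddings were precomputed by the embedding model being evaluated.
    """
    # Normalize documents once so every query reuses them.
    docs = {doc_id: _normalize(v) for doc_id, v in doc_embs.items()}
    results: Dict[str, List[Tuple[str, float]]] = {}
    for qid, qvec in query_embs.items():
        q = _normalize(qvec)
        scored = (
            (doc_id, sum(a * b for a, b in zip(q, d))) for doc_id, d in docs.items()
        )
        # nlargest keeps only the k best-scoring documents, sorted by descending score.
        results[qid] = heapq.nlargest(k, scored, key=lambda pair: pair[1])
    return results


# Toy example with 2-dimensional vectors standing in for real embeddings.
queries = {"q1": [1.0, 0.0]}
docs = {"d1": [1.0, 0.1], "d2": [0.0, 1.0], "d3": [-1.0, 0.0]}
print([doc for doc, _ in top_k_results(queries, docs, k=2)["q1"]])  # ['d1', 'd2']
```

This brute-force loop is only meant to make the ranking explicit; for a realistic corpus size, a matrix library or a nearest-neighbour index would be the usual choice, with the same top-k output per query and model.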