The aim of this task is to use reading lists to evaluate (and potentially train) models for recommending related articles that are interesting to readers.
We want to run an experiment to recommend related articles:
- Define experiment setup (ground truth data and evaluation metric)
- Evaluate morelike as a baseline
- Evaluate vector search with topic embeddings
- Evaluate vector search with text embeddings using LLMs
- (stretch) fine-tune (train) text embedding model with data from reading lists
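One way to make the evaluation setup concrete: treat articles that share a reading list as ground-truth neighbors, and measure how often a recommender's top-k results for a seed article contain another article from the same list (a recall@k-style metric). The sketch below is a minimal illustration of that idea; `recommend` is a hypothetical stand-in for any of the candidate models (morelike, vector search), not an existing API.

```python
from typing import Callable

def recall_at_k(
    reading_lists: list[set[str]],
    recommend: Callable[[str], list[str]],
    k: int = 10,
) -> float:
    """Fraction of seed articles whose top-k recommendations contain
    at least one other article from the same reader-curated list."""
    hits, total = 0, 0
    for articles in reading_lists:
        for seed in articles:
            others = articles - {seed}
            if not others:
                continue  # singleton lists give no ground truth
            top_k = set(recommend(seed)[:k])
            hits += bool(top_k & others)
            total += 1
    return hits / total if total else 0.0
```

The same loop works for any model under test, so morelike and the embedding-based approaches can be compared on identical ground-truth pairs.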
Context: We conducted exploratory analysis on reading lists in T382493. This suggested that articles appearing together in the same reading list can be considered relevant recommendations for one another, since they were specifically curated by readers. This allows us to use reading lists as a ground-truth dataset to systematically evaluate different models for recommending related articles. The current baseline for generating related articles in Wikipedia is CirrusSearch's morelike. Research has developed tooling to generate embeddings for similarity search (aka vector search). This can be used to develop an alternative (and hopefully better) model to generate related articles. Furthermore, we can fine-tune these models using data from existing reading lists to further improve the recommendations.
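For reference, the morelike baseline is queryable through the standard MediaWiki Action API's search module via the `morelike:` keyword. A minimal sketch of building such a query (the endpoint and parameters are the standard search-module ones; only the helper function name is made up here):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def morelike_url(title: str, limit: int = 10) -> str:
    """Build a search-API URL that asks CirrusSearch for articles
    similar to `title` via the morelike: keyword."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"morelike:{title}",
        "srlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"
```

Fetching such a URL returns the ranked morelike results, which can be scored directly against the reading-list ground truth.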
The experiment around recommending related articles thus provides a testing ground for a specific use case to demonstrate whether and how much vector search approaches can improve on traditional search approaches.
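At its core, the vector-search alternative replaces morelike's text-based similarity with nearest-neighbor lookup over article embeddings (topic- or LLM-based). A minimal sketch of that retrieval step, assuming precomputed embeddings keyed by article title (in practice an approximate-nearest-neighbor index would be used at scale):

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_k(query: list[float], embeddings: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the k article titles whose embeddings are most similar
    to the query embedding -- the candidate related articles."""
    ranked = sorted(embeddings.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [title for title, _ in ranked[:k]]
```

Swapping in different embedding models (topic embeddings vs. LLM text embeddings, before and after fine-tuning) changes only the vectors, so the same evaluation harness can compare all variants against the morelike baseline.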