To develop models for improving search, we need a dataset of queries annotated with relevant results. Since we currently lack such a dataset, the goal of this task is to collect one. For the first version, we will restrict ourselves to English (potential follow-up work could expand to other languages, e.g., via translation).
The details of this task still need to be determined.
Things to figure out:
[ ] What types of queries (T407603)
[ ] What level of annotation (article, section, paragraph, sentence, etc.)
[ ] Dataset size
[ ] How to collect the dataset (queries/annotations/corpus)
Additional information:
* An example of what such a dataset could look like is [[ https://ai.google.com/research/NaturalQuestions | Google's Natural Questions dataset ]], which contains natural-language queries drawn from aggregated and anonymized queries issued by users to the Google search engine. These queries are then annotated with relevant paragraphs from Wikipedia articles that contain the answer.