The goal of this task is to collect a set of queries for the benchmark dataset.
The queries should be representative of the different query types we identified in T407603.
They will later be used for annotation.
Potential resources:
- Wikipedia query logs
- Queries from surveys or user studies, e.g., the queries mentioned or observed in the readers foundational research (see the Appendix of the prototype evaluation for semantic search)
- Search's “golden set” of queries with human-graded results from the Discernatron project
- Public datasets: MS MARCO (Bing queries) or Natural Questions (Google queries, though that dataset is restrictively filtered to keep only longer natural-language questions)
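
One possible way to combine candidates from these resources is to deduplicate them and then draw the same number of queries per query type, so that every type is represented. The following is only a minimal sketch under assumptions: the function name `sample_queries`, the source names, and the type labels (`navigational`, `question`) are hypothetical placeholders and not the types defined in T407603.

```python
import random
from collections import defaultdict


def sample_queries(candidates, per_type, seed=0):
    """Deduplicate candidate queries and sample up to `per_type` per query type.

    `candidates` is an iterable of (query, source, query_type) tuples.
    The type labels are hypothetical placeholders for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_type = defaultdict(list)
    seen = set()
    for query, source, query_type in candidates:
        # Normalize case and whitespace so near-identical queries count once.
        key = " ".join(query.lower().split())
        if not key or key in seen:
            continue
        seen.add(key)
        by_type[query_type].append(
            {"query": query.strip(), "source": source, "type": query_type}
        )

    sample = []
    for query_type in sorted(by_type):
        pool = by_type[query_type]
        # Take everything if a type has fewer candidates than requested.
        sample.extend(rng.sample(pool, min(per_type, len(pool))))
    return sample


# Made-up example candidates, one tuple per query.
candidates = [
    ("Eiffel Tower", "query_logs", "navigational"),
    ("eiffel  tower", "survey", "navigational"),  # duplicate after normalization
    ("how tall is the eiffel tower", "ms_marco", "question"),
    ("why is the sky blue", "natural_questions", "question"),
    ("paris", "query_logs", "navigational"),
]
picked = sample_queries(candidates, per_type=1)
```

Keeping the source next to each query makes it possible to check later how much each resource contributed to the final set.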