While working on the MRR evaluation I noticed a variety of clickthroughs whose queries look unlikely to have been typed by a human. Here are a few example autocomplete queries that led to clickthroughs:
- Australian Citizen
- Alumni Oxonienses: the Members of the University of Oxford, 1715-1886
- Czech International Badminton Championships
- Board of Intermediate and Secondary Education
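These look to account for only a few percent of the logs, but they break the assumption that every prefix of the submitted query is one we should try to improve results for. If a query like these was never typed character by character, most of its prefixes were presumably never shown to the user.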
A few approaches could be taken:
- Each displayed search result could be logged. Instead of assuming every prefix is useful, only use prefixes that were actually displayed in the browser
- We could track previously shown prefixes in the browser and submit them with the click event
- We could track previously shown x-search-id headers in the browser and submit them with the click event
- We could track previously shown prefixes in the browser, and use some heuristic to decide if the click is worth logging
- Probably others
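
As a rough illustration of the prefix-based options above, here is a minimal Python sketch. It assumes the click event carries a list of prefixes that were actually displayed, and that those are exact (case-sensitive) string prefixes of the submitted query. The function names and the `min_shown` threshold are hypothetical and only meant to show the shape of the filtering, not an existing implementation.

```python
def all_prefixes(query: str) -> list[str]:
    """Every non-empty prefix of the submitted query (the current assumption)."""
    return [query[:i] for i in range(1, len(query) + 1)]


def useful_prefixes(query: str, shown: list[str]) -> list[str]:
    """Keep only the prefixes that the browser reported as actually displayed."""
    shown_set = set(shown)
    return [p for p in all_prefixes(query) if p in shown_set]


def looks_typed(query: str, shown: list[str], min_shown: int = 2) -> bool:
    """Hypothetical heuristic: a typed query leaves several shown proper
    prefixes behind, while a pasted one leaves none or only the full query."""
    proper = [p for p in useful_prefixes(query, shown) if p != query]
    return len(proper) >= min_shown
```

With this shape, a click on "Australian Citizen" that arrives with only the full query in its shown list would be skipped, while one that arrives with "Au" and "Aust" would contribute just those two prefixes to the evaluation.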