Goals:
- Identify metrics we should use for offline and online evaluation of autocomplete. Previously, when tuning autocomplete for Wikidata, we used Mean Reciprocal Rank (MRR) and a metric we invented that estimates the number of characters a user must type before their intended query appears as a completion.
- Identify potential ranking methods. Previous work on Wikidata used Most Popular Completion (MPC) as a baseline to compare against, but there are likely better options. If we use MPC, we need to identify appropriate ways to balance recent popularity against long-term popularity. If we use something else, it should be relatively simple; for example, personalization, even when limited to single-session context, is likely more complex than we should initially consider.
- Identify potential methods for deciding the set of queries that can be "released" as query completions. While we have not released query sets before, we expect the general approach to center on identifying queries issued repeatedly by multiple users.
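
The two offline metrics above can be sketched as follows. This is a minimal illustration, not our evaluation harness: `suggest` stands in for whatever ranker is under test, the evaluation log is assumed to be (prefix, clicked query) pairs, and the `top_k` cutoff for the characters-typed metric is a hypothetical choice.

```python
def reciprocal_rank(ranked, clicked):
    """1/rank of the clicked query in the ranked completions, 0 if absent."""
    for rank, query in enumerate(ranked, start=1):
        if query == clicked:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(log, suggest):
    """MRR over a log of (prefix, clicked_query) pairs.

    `suggest(prefix)` is assumed to return an ordered list of completions.
    """
    return sum(reciprocal_rank(suggest(prefix), clicked)
               for prefix, clicked in log) / len(log)

def chars_to_complete(target, suggest, top_k=3):
    """Characters the user must type before `target` shows up in the
    top-k completions; falls back to typing the entire query."""
    for n in range(1, len(target) + 1):
        if target in suggest(target[:n])[:top_k]:
            return n
    return len(target)
```

Averaging `chars_to_complete` over a query log gives the estimated typing effort; lower is better, while higher MRR is better, so the two metrics complement each other.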
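
One simple way to balance recent against long-term popularity in MPC is a linear blend of normalized frequencies from two counting windows. This is only a sketch of the idea, not a settled design: the window definitions, the `alpha` weight, and the function names are all assumptions for illustration.

```python
def mpc_scores(recent_counts, longterm_counts, alpha=0.3):
    """Blend normalized recent and long-term query frequencies.

    `recent_counts` and `longterm_counts` map query -> raw count for,
    e.g., a 7-day and a 90-day window (hypothetical windows). Each
    window is normalized so the two are comparable; alpha=1 ranks
    purely on recent popularity, alpha=0 purely on long-term.
    """
    r_total = sum(recent_counts.values()) or 1
    l_total = sum(longterm_counts.values()) or 1
    queries = set(recent_counts) | set(longterm_counts)
    return {q: alpha * recent_counts.get(q, 0) / r_total
               + (1 - alpha) * longterm_counts.get(q, 0) / l_total
            for q in queries}

def mpc(prefix, scores, k=5):
    """Top-k completions for `prefix` by blended score (naive scan;
    a production version would use a prefix index such as a trie)."""
    candidates = [(s, q) for q, s in scores.items() if q.startswith(prefix)]
    return [q for s, q in sorted(candidates, reverse=True)[:k]]
```

A single `alpha` tuned offline against MRR and the characters-typed metric would keep this baseline simple while still reacting to trending queries.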
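
For query release, the "repeated by multiple users" idea amounts to a distinct-user threshold, in the spirit of k-anonymity: a query is only eligible for release if enough different users issued it, which filters out one-off queries that may embed personal information. The threshold value and normalization below are placeholder assumptions.

```python
from collections import defaultdict

def releasable_queries(query_log, min_users=10):
    """Queries issued by at least `min_users` distinct users.

    `query_log` is an iterable of (user_id, query_text) pairs.
    Queries are case-folded and whitespace-trimmed before counting,
    a deliberately minimal normalization for illustration.
    """
    users_per_query = defaultdict(set)
    for user_id, query in query_log:
        users_per_query[query.strip().lower()].add(user_id)
    return {q for q, users in users_per_query.items() if len(users) >= min_users}
```

The right `min_users` value, and whether to add further safeguards (e.g. pattern-based PII filtering), would need a separate privacy review.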