Right now the tuning parameters we use for Wikidata search (both prefix and fulltext) are more or less invented out of thin air. I wonder if we could use some ML (or other) technology, together with actual user click data, to tune those parameters better.
Potential targets:
- Entity weight parameters (both the satu saturation params and the weights of features on entities). We currently use only incoming link and sitelink counts; maybe we should use more features?
- Relative weights of various matches - label, alias, description, other language, etc.?
- For fulltext possibly also more advanced features that we're building with Mjolnir?
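To make the first target concrete, here is a minimal sketch of how an entity weight could be assembled from saturated features. It assumes the satu function has the form x^a / (k^a + x^a) and that the per-feature scores are combined as a weighted sum. The names `satu`, `entity_weight` and `PARAMS`, and every numeric value, are hypothetical placeholders rather than the production configuration; those weights, `k` and `a` are the kind of values a learning step would fit.

```python
from typing import Dict, Mapping


def satu(x: float, k: float, a: float = 1.0) -> float:
    """Saturation curve x^a / (k^a + x^a): 0 at x=0, 0.5 at x=k, tends to 1."""
    if x <= 0:
        return 0.0
    xa = x ** a
    return xa / (k ** a + xa)


# Hypothetical tunables: one weight and one (k, a) pair per entity feature.
PARAMS: Dict[str, Dict[str, float]] = {
    "incoming_links": {"weight": 0.6, "k": 100.0, "a": 1.0},
    "sitelinks": {"weight": 0.4, "k": 20.0, "a": 2.0},
}


def entity_weight(
    features: Mapping[str, float],
    params: Mapping[str, Mapping[str, float]] = PARAMS,
) -> float:
    """Weighted sum of saturated feature values; missing features count as 0."""
    return sum(
        p["weight"] * satu(features.get(name, 0.0), p["k"], p["a"])
        for name, p in params.items()
    )
```

Adding a new feature would then mean adding one entry to `PARAMS`, which keeps the search space for tuning explicit.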
A first step would be to build a data pipeline that tells us which search result the user chose, especially for prefix search, which is used ~1M times a day.
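As a rough illustration of what such a pipeline would need to produce, the sketch below joins search impressions with click events and derives the position of the chosen result, plus a mean reciprocal rank as one possible quality metric. The event shapes (`SearchEvent`, `ClickEvent`, a shared `search_id`) are assumptions made for the example, not an existing logging schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class SearchEvent:
    """Hypothetical impression record: what was shown for one search."""
    search_id: str
    query: str
    results: Tuple[str, ...]  # entity IDs in the order shown to the user


@dataclass(frozen=True)
class ClickEvent:
    """Hypothetical click record: which entity the user picked."""
    search_id: str
    entity_id: str


def clicked_positions(
    searches: Iterable[SearchEvent], clicks: Iterable[ClickEvent]
) -> Dict[str, Optional[int]]:
    """Map each search_id to the 1-based position of the chosen result.

    The value is None when there was no click or the clicked entity
    was not among the results that were shown.
    """
    chosen = {c.search_id: c.entity_id for c in clicks}
    positions: Dict[str, Optional[int]] = {}
    for s in searches:
        entity = chosen.get(s.search_id)
        if entity is not None and entity in s.results:
            positions[s.search_id] = s.results.index(entity) + 1
        else:
            positions[s.search_id] = None
    return positions


def mean_reciprocal_rank(positions: Dict[str, Optional[int]]) -> float:
    """Average of 1/position over all searches; searches with no click add 0."""
    if not positions:
        return 0.0
    total = sum(1.0 / p for p in positions.values() if p is not None)
    return total / len(positions)
```

With data in this shape, any candidate set of tuning parameters could be scored offline by re-ranking the logged results and checking where the chosen entity would have landed.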
As this is an exploratory task, suggestions about what else could be done here are welcome.