As a user of Wikidata search, I want recall to be improved so that I can find what I'm looking for.
In T163642 we made all strings of indexed statements part of the all field, which makes them searchable by plain search queries.
Unfortunately, only a subset of the statements is indexed. The reason is that indexing a statement today means populating the statement_keyword field, which we do not want to do for long text: textual content (phrases and long text that need tokenization) is not suited to keyword matching.
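To illustrate why keyword matching is a poor fit for long text, here is a minimal sketch in Python. It is not how CirrusSearch or Elasticsearch are implemented; it only models the difference between exact-term matching and matching on tokenized text. The function names, the property id P0000 and the sample string are all hypothetical.

```python
import re


def keyword_match(indexed_value: str, query: str) -> bool:
    # A keyword-style field is treated as one unanalyzed term:
    # the query only matches if it equals the whole stored value.
    return indexed_value == query


def tokenize(text: str) -> list[str]:
    # Very rough stand-in for a text analyzer: lowercase + split on non-word chars.
    return re.findall(r"\w+", text.lower())


def text_match(indexed_value: str, query: str) -> bool:
    # A text-style field matches when every query token appears in the value.
    tokens = set(tokenize(indexed_value))
    return all(token in tokens for token in tokenize(query))


# Hypothetical statement value stored keyword-style as "property=value".
stored_keyword = "P0000=The quick brown fox jumps over the lazy dog"
stored_text = "The quick brown fox jumps over the lazy dog"

print(keyword_match(stored_keyword, "quick fox"))  # exact match required
print(text_match(stored_text, "quick fox"))        # token match is enough
```

With a keyword-style field, a user would have to type the full stored value to get a hit, which is unrealistic for phrases and long text.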
If we want to increase recall on Wikidata using textual properties, we need a new way to feed extra text content into an existing CirrusSearch field.
Currently the text fields are:
- text: populated using \Wikibase\EntityContent::getTextForSearchIndex
- auxiliary_text: not used by EntityHandler
We should evaluate the impact on index size to decide whether we can feed all textual properties or only a subset.
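As a starting point for that evaluation, the sketch below sums the raw UTF-8 size of textual statement values per property over a sample of entities. This is only a rough proxy: real index growth depends on analysis, compression and field settings, none of which are modeled here. The input shape (one dict of property id to list of strings per entity) and the property ids are assumptions made for illustration.

```python
from collections import Counter


def added_text_bytes(entities: list[dict[str, list[str]]]) -> Counter:
    # Sum the UTF-8 byte length of every textual value, grouped by property id.
    totals: Counter = Counter()
    for statements in entities:
        for prop, values in statements.items():
            for value in values:
                totals[prop] += len(value.encode("utf-8"))
    return totals


# Hypothetical sample: property ids and values are made up.
sample = [
    {
        "P_title": ["A short title"],
        "P_quote": ["A much longer quotation that needs tokenization"],
    },
    {"P_title": ["Another title"]},
]

# Properties sorted by how much raw text they would add, largest first.
for prop, size in added_text_bytes(sample).most_common():
    print(prop, size)
```

Ranking properties this way could help decide which subset to feed first if indexing all of them turns out to be too expensive.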