
Evaluate adding all/some textual properties to the text field
Open, Medium, Public

Description

As a user of Wikidata search, I want recall to be improved so that I can find what I'm looking for.

In T163642 we made all strings of indexed statements part of the all field, allowing them to be found by plain search queries.
Unfortunately, only a subset of statements is indexed. The reason is that indexing a statement today means populating the statement_keyword field, which we do not want to do for long text: textual content (phrases and long text that need tokenization) is not suited to keyword matching.
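
To illustrate why long text does not fit a keyword field, here is a minimal Python sketch contrasting exact (keyword-style) matching with tokenized (text-style) matching. It is a simplification and does not reproduce the actual CirrusSearch/Elasticsearch analysis chain; the function names and the sample value are made up for illustration.

```python
import re


def keyword_match(query: str, indexed_value: str) -> bool:
    """Keyword-style matching: the whole value is one opaque term."""
    return query == indexed_value


def tokenize(text: str) -> list[str]:
    """Very rough stand-in for a text analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def text_match(query: str, indexed_value: str) -> bool:
    """Text-style matching: every query token must occur in the value."""
    value_tokens = set(tokenize(indexed_value))
    return all(token in value_tokens for token in tokenize(query))


# Hypothetical long-ish textual value, invented for this example.
value = "Getreidegasse 9, 5020 Salzburg"
query = "Getreidegasse Salzburg"

keyword_hit = keyword_match(query, value)  # exact comparison of whole strings
text_hit = text_match(query, value)        # token-by-token comparison
```

With these inputs the keyword comparison fails because the query is not the full value, while the tokenized comparison succeeds. That is the behaviour a user typing a plain search query would expect from textual properties.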

If we want to increase recall on Wikidata using textual properties, we need a new way to feed extra text content into an existing CirrusSearch field.
Currently the text fields are:

  • text: populated using \Wikibase\EntityContent::getTextForSearchIndex
  • auxiliary_text: not used by EntityHandler

We should evaluate the impact on index size to determine whether we can feed all textual properties or only a subset.
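
As a starting point for such an evaluation, the sketch below sums the UTF-8 size of textual statement values over a sample of entities in the standard Wikidata JSON layout (claims → property → statements → mainsnak). Treating only the `string` and `monolingualtext` datatypes as "textual" is an assumption made here, as are the function names and the sample entity. Raw byte counts are only a rough proxy: real index growth also depends on analysis, term statistics and compression.

```python
from typing import Iterable, Iterator, Optional, Set

# Assumption: these two datatypes are what we mean by "textual properties".
TEXTUAL_DATATYPES = {"string", "monolingualtext"}


def textual_values(entity: dict, property_ids: Optional[Set[str]] = None) -> Iterator[str]:
    """Yield the text of main-snak values with a textual datatype.

    If property_ids is given, only those properties are considered.
    """
    for pid, statements in entity.get("claims", {}).items():
        if property_ids is not None and pid not in property_ids:
            continue
        for statement in statements:
            snak = statement.get("mainsnak", {})
            # Skip "novalue"/"somevalue" snaks and non-textual datatypes.
            if snak.get("snaktype") != "value":
                continue
            if snak.get("datatype") not in TEXTUAL_DATATYPES:
                continue
            value = snak["datavalue"]["value"]
            # monolingualtext values are {"text": ..., "language": ...}.
            yield value["text"] if isinstance(value, dict) else value


def extra_text_bytes(entities: Iterable[dict], property_ids: Optional[Set[str]] = None) -> int:
    """Total UTF-8 bytes of textual values that would be added to a text field."""
    return sum(
        len(text.encode("utf-8"))
        for entity in entities
        for text in textual_values(entity, property_ids)
    )


# Made-up sample entity, not real Wikidata content.
sample = [{
    "id": "Q1",
    "claims": {
        "P6375": [{"mainsnak": {
            "snaktype": "value", "property": "P6375", "datatype": "monolingualtext",
            "datavalue": {"type": "monolingualtext",
                          "value": {"text": "Getreidegasse 9", "language": "de"}},
        }}],
        "P31": [{"mainsnak": {
            "snaktype": "value", "property": "P31", "datatype": "wikibase-item",
            "datavalue": {"type": "wikibase-entityid", "value": {"id": "Q2"}},
        }}],
    },
}]

all_textual = extra_text_bytes(sample)
only_addresses = extra_text_bytes(sample, property_ids={"P6375"})
```

Running the same function once without a filter and once with a candidate subset of property IDs gives a first comparison between feeding all textual properties and feeding only some of them.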

Event Timeline

Restricted Application added a subscriber: Aklapper.
dcausse added a project: Regression.
dcausse renamed this task from \Wikibase\EntityContent::getTextForSearchIndex no longer includes textual properties to Evaluate adding all/more textual properties to the text field. Dec 10 2019, 12:53 PM
dcausse lowered the priority of this task from High to Medium.
dcausse removed a project: Regression.
dcausse updated the task description.
dcausse renamed this task from Evaluate adding all/more textual properties to the text field to Evaluate adding all/some textual properties to the text field. Dec 10 2019, 12:56 PM
dcausse updated the task description.

Forwarding a suggestion made on https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem/WDQS_and_Search:

It would be interesting to be able to search for street address (P6375)-values, e.g. Special:Search/Getreidegasse Salzburg should find Q37970995. --- Jura 19:11, 28 December 2021 (UTC)