Adjust rescoring config for Wikidata to consider sitelink count
Closed, Resolved; Public

Description

It is now possible to configure rescoring on a per-wiki basis. Once T119066 is resolved, we will have one or more new fields that CirrusSearch can consider when rescoring and ranking search results for Wikidata.

Event Timeline

aude raised the priority of this task to High.
aude updated the task description.
aude added a project: Wikidata.
aude subscribed.

We definitely also need to include labels as part of the scoring.

Otherwise I get strange results: Q4 on Wikidata would rank higher than Q3 when searching for "life", just because Q4 mentions "life" in its description (and Q4 has more sitelinks, e.g. on Wikiquote).

So I'd like to treat labels in the search language (and maybe its fallback languages) the way titles are treated now in Cirrus, probably with labels in other languages and aliases given secondary weighting, and descriptions just part of the all (general text) field.
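To make the ranking problem concrete, here is a minimal toy sketch of the weighting idea. It is not CirrusSearch code: the weights, the sitelink counts, and the entity texts are all made-up illustrative values, and the scoring formula (field weights multiplied by a log-dampened sitelink count) is an assumption chosen only to show the effect.

```python
import math

# Hypothetical per-field weights; not taken from any real CirrusSearch profile.
WEIGHTED = {"label": 3.0, "alias": 1.5, "description": 0.5}
# Flat weights treat a description hit the same as a label hit.
FLAT = {"label": 1.0, "alias": 1.0, "description": 1.0}

# Toy data: texts and sitelink counts are invented for illustration.
ENTITIES = [
    {"id": "Q3", "label": "life", "aliases": [],
     "description": "characteristic of living organisms", "sitelinks": 150},
    {"id": "Q4", "label": "death", "aliases": [],
     "description": "end of life", "sitelinks": 200},
]


def text_score(entity, term, weights):
    """Sum the weights of every field that contains the search term."""
    term = term.lower()
    score = 0.0
    if term in entity["label"].lower():
        score += weights["label"]
    if any(term in alias.lower() for alias in entity["aliases"]):
        score += weights["alias"]
    if term in entity["description"].lower():
        score += weights["description"]
    return score


def final_score(entity, term, weights):
    """Text score boosted by a log-dampened sitelink count."""
    return text_score(entity, term, weights) * math.log2(2 + entity["sitelinks"])


def rank(term, weights):
    """Return matching entity IDs, best score first."""
    scored = [(final_score(e, term, weights), e["id"]) for e in ENTITIES]
    return [eid for score, eid in sorted(scored, reverse=True) if score > 0]


print(rank("life", FLAT))      # description hit plus more sitelinks wins
print(rank("life", WEIGHTED))  # label hit outweighs the sitelink boost
```

With flat weights the toy Q4 comes out first because its only advantage, the higher sitelink count, decides the order. Once label matches are weighted above description matches, Q3 comes out first.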

Now the ElasticSearch configs account for sitelinks (and in general any field can be used in a search profile with various functions and weights). Do we still need to do anything for this one? Is this for full-text search (which does not use ElasticSearch yet)?
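For reference, a rough sketch of what "a field used with a function and a weight" can look like at the raw Elasticsearch query level. This is not the CirrusSearch profile format, and the field names (`sitelink_count`, `labels.en`, `descriptions.en`), the boost, and the window size are assumptions for illustration only. The `rescore`, `function_score`, and `field_value_factor` constructs themselves are standard Elasticsearch query DSL.

```python
import json


def build_search_body(term, sitelink_field="sitelink_count", window_size=100):
    """Build a query body that rescores the top hits by sitelink count.

    All field names here are hypothetical placeholders.
    """
    return {
        "query": {
            "multi_match": {
                "query": term,
                # Label matches boosted above description matches (assumed fields).
                "fields": ["labels.en^3", "descriptions.en"],
            }
        },
        "rescore": {
            # Only the top window_size hits per shard are rescored.
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "function_score": {
                        "field_value_factor": {
                            "field": sitelink_field,
                            # Dampen large counts: log10(value + 2).
                            "modifier": "log2p",
                            # Treat entities without the field as having 0 sitelinks.
                            "missing": 0,
                        }
                    }
                },
                "query_weight": 1.0,
                "rescore_query_weight": 1.0,
                # Multiply the text score by the sitelink factor.
                "score_mode": "multiply",
            },
        },
    }


body = build_search_body("life")
print(json.dumps(body, indent=2))
```

Because the rescore multiplies the text score instead of replacing it, a strong label match still dominates, and the sitelink count mostly acts as a tie-breaker among similar text matches.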

Smalyshev claimed this task.