Tune wikidata fulltext search similarity parameters
Closed, Resolved, Public

Description

Wikidata uses array fields, and popular items are likely to get more aliases; the all field is affected as well.
Array fields are known to cause trouble with length normalization, which can give popular items a low score.
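
To make the length-normalization effect concrete, here is a minimal sketch of a single-term BM25 score in the form commonly used by Lucene. The function name, the field lengths, and the corpus statistics are all hypothetical values picked for illustration; they are not measurements from the Wikidata index.

```python
import math


def bm25_term_score(tf, field_len, avg_field_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Single-term BM25 score; b controls how strongly field length is penalized."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # With b = 0 the length ratio is ignored; with b = 1 it is applied in full.
    length_norm = 1 - b + b * (field_len / avg_field_len)
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


def score(field_len, b):
    # Hypothetical corpus: 1000 items, term found in 10, average field length 4.
    return bm25_term_score(tf=1, field_len=field_len, avg_field_len=4,
                           doc_count=1000, doc_freq=10, b=b)


for b in (0.75, 0.3, 0.0):
    obscure = score(2, b)    # item with few aliases (short array field)
    popular = score(40, b)   # item with many aliases (long array field)
    print(f"b={b:.2f} obscure={obscure:.3f} popular={popular:.3f} "
          f"popular/obscure={popular / obscure:.2f}")
```

In this toy setup the popular item scores well below the obscure one at the default b, and the gap narrows as b is lowered, which is the behaviour the tuning aims to exploit.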
The plan is to tune the BM25 b parameter for:

  • all
  • labels.*

and proceed as follows:
1/ reindex on relforge to set up different similarities for these fields
2/ tune the b param (once the similarity is set, we can close the index to change b, and iterate like that)
3/ submit a patch to wmf-config to add a wikidata profile in wgCirrusSearchSimilarityProfiles
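
The close/tune/reopen loop in step 2 could look roughly like the sketch below. It only builds the request bodies for the Elasticsearch `_close`, `_settings` and `_open` endpoints and sends nothing. The index name, the similarity names and the b values are hypothetical, and the sketch assumes the field mappings already reference these similarity names after the reindex in step 1.

```python
import json


def bm25_similarity_settings(b_by_similarity, k1=1.2):
    """Build an index settings body defining one BM25 similarity per name."""
    return {
        "index": {
            "similarity": {
                name: {"type": "BM25", "k1": k1, "b": b}
                for name, b in b_by_similarity.items()
            }
        }
    }


def retune_requests(index, b_by_similarity):
    """Return (method, path, body) tuples for one tuning iteration.

    Similarity settings are not dynamic, so the index is closed before the
    update and reopened afterwards.
    """
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", bm25_similarity_settings(b_by_similarity)),
        ("POST", f"/{index}/_open", None),
    ]


# Hypothetical index name, similarity names and candidate b values.
for method, path, body in retune_requests(
        "wikidata_test_index", {"wikidata_all": 0.3, "wikidata_labels": 0.3}):
    print(method, path, json.dumps(body) if body is not None else "")
```

Each iteration would be followed by a relevance run against the reopened index before picking the next b value to try.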

Event Timeline

dcausse triaged this task as Medium priority. (Dec 7 2017, 9:03 AM)
dcausse created this task.

Change 397852 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/Wikibase@master] Extract names of search fields as constants

https://gerrit.wikimedia.org/r/397852

Change 397855 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/mediawiki-config@master] [cirrus] tune wikidata similarity configuration

https://gerrit.wikimedia.org/r/397855

Change 397852 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Extract names of search fields as constants

https://gerrit.wikimedia.org/r/397852

Change 398018 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/Wikibase@wmf/1.31.0-wmf.12] Extract names of search fields as constants

https://gerrit.wikimedia.org/r/398018

Change 397855 merged by jenkins-bot:
[operations/mediawiki-config@master] [cirrus] tune wikidata similarity configuration

https://gerrit.wikimedia.org/r/397855

Mentioned in SAL (#wikimedia-operations) [2017-12-13T14:20:12Z] <dcausse@tin> Synchronized wmf-config/Wikibase.php: T182293 [cirrus] tune wikidata similarity configuration 1/2 (duration: 01m 12s)

Change 398018 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.31.0-wmf.12] Extract names of search fields as constants

https://gerrit.wikimedia.org/r/398018

Mentioned in SAL (#wikimedia-operations) [2017-12-13T14:22:08Z] <dcausse@tin> Synchronized wmf-config/InitialiseSettings.php: T182293 [cirrus] tune wikidata similarity configuration 2/2 (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2017-12-13T14:29:58Z] <dcausse@tin> Synchronized php-1.31.0-wmf.12/extensions/Wikibase: T182293 Extract names of search fields as constants (duration: 02m 05s)