
Tune wikidata fulltext search similarity parameters
Closed, Resolved, Public

Description

Wikidata uses array fields, and popular items are likely to accumulate more aliases; the "all" field is affected as well.
We know that array fields can cause problems with length normalization, which can give popular items a low score.
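To make the length-normalization problem concrete, here is a minimal Python sketch of the classic BM25 term weight, assuming the textbook formula with k1=1.2 (Lucene's implementation differs slightly in its constants). The field lengths and match counts are invented for illustration: they model a popular item whose array field holds many aliases against an obscure item with one short label.

```python
import math


def bm25_term_score(tf, field_len, avg_field_len, doc_freq, num_docs, k1=1.2, b=0.75):
    """Classic BM25 weight of one query term in one field of one document."""
    # Inverse document frequency (Lucene-style variant, never negative).
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores field length, b=1 applies it fully.
    length_norm = 1 - b + b * (field_len / avg_field_len)
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Hypothetical numbers: a popular item with many aliases (40 tokens, term
# matched twice) versus an obscure item with a single short label
# (3 tokens, term matched once). The average field length is 5 tokens.
AVG_LEN, DOC_FREQ, NUM_DOCS = 5.0, 100, 1_000_000


def compare(b):
    """Return (popular_score, obscure_score) for a given b."""
    popular = bm25_term_score(2, 40, AVG_LEN, DOC_FREQ, NUM_DOCS, b=b)
    obscure = bm25_term_score(1, 3, AVG_LEN, DOC_FREQ, NUM_DOCS, b=b)
    return popular, obscure


if __name__ == "__main__":
    for b in (0.75, 0.3, 0.0):
        popular, obscure = compare(b)
        print(f"b={b}: popular={popular:.3f} obscure={obscure:.3f}")
```

With b=0.75 the long array field drags the popular item below the obscure one even though it matches the term more often; lowering b shrinks that penalty, and at b=0 field length is ignored entirely and the popular item scores higher.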
The plan is to tune the BM25 b parameter for:

  • all
  • labels.*

and to proceed as follows:
1/ reindex on relforge to set up different similarities for these fields
2/ tune the b parameter (once the similarity is set, we can close the index to tune it and iterate that way)
3/ submit a patch to wmf-config adding a wikidata profile to wgCirrusSearchSimilarityProfiles
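Steps 1 and 2 could look roughly like the following sketch against the Elasticsearch REST API, which allows BM25 parameters to be changed on a closed index. It only builds the request bodies rather than sending them. The index name, the similarity names, the b and k1 values, and the simplified mapping (a single "en" subfield under labels) are all hypothetical; they do not reflect the actual Wikidata mapping or the values that were deployed.

```python
import json

INDEX = "wikidata_content"  # hypothetical index name on the test cluster

# Step 1 (reindex): a field's similarity is set in its mapping, so the
# fields must reference the custom similarities when the index is built.
MAPPING_FRAGMENT = {
    "properties": {
        "all": {"type": "text", "similarity": "all_sim"},
        # "labels" holds one subfield per language; "en" is just an example.
        "labels": {"properties": {"en": {"type": "text", "similarity": "labels_sim"}}},
    }
}


def similarity_settings(b_all, b_labels, k1=1.2):
    """Body for PUT /<index>/_settings defining one BM25 similarity per field group."""
    return {
        "index": {
            "similarity": {
                # Hypothetical similarity names referenced by MAPPING_FRAGMENT.
                "all_sim": {"type": "BM25", "k1": k1, "b": b_all},
                "labels_sim": {"type": "BM25", "k1": k1, "b": b_labels},
            }
        }
    }


def tuning_round(b_all, b_labels):
    """One iteration of step 2: close the index, change b, reopen it."""
    return [
        ("POST", f"/{INDEX}/_close", None),
        ("PUT", f"/{INDEX}/_settings", similarity_settings(b_all, b_labels)),
        ("POST", f"/{INDEX}/_open", None),
    ]


if __name__ == "__main__":
    for method, path, body in tuning_round(0.3, 0.5):
        print(method, path, json.dumps(body) if body else "")
```

Each round would be followed by an evaluation of search results on relforge before trying the next pair of b values; the values that hold up are what step 3 would then encode in the wikidata profile.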

Details

Related Gerrit Patches:
[mediawiki/extensions/Wikibase@wmf/1.31.0-wmf.12] Extract names of search fields as constants
[operations/mediawiki-config@master] [cirrus] tune wikidata similarity configuration
[mediawiki/extensions/Wikibase@master] Extract names of search fields as constants

Event Timeline

dcausse triaged this task as Normal priority (Dec 7 2017, 9:03 AM).
dcausse created this task.

Change 397852 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/Wikibase@master] Extract names of search fields as constants

https://gerrit.wikimedia.org/r/397852

Change 397855 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/mediawiki-config@master] [cirrus] tune wikidata similarity configuration

https://gerrit.wikimedia.org/r/397855

Change 397852 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Extract names of search fields as constants

https://gerrit.wikimedia.org/r/397852

Change 398018 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/Wikibase@wmf/1.31.0-wmf.12] Extract names of search fields as constants

https://gerrit.wikimedia.org/r/398018

Change 397855 merged by jenkins-bot:
[operations/mediawiki-config@master] [cirrus] tune wikidata similarity configuration

https://gerrit.wikimedia.org/r/397855

Mentioned in SAL (#wikimedia-operations) [2017-12-13T14:20:12Z] <dcausse@tin> Synchronized wmf-config/Wikibase.php: T182293 [cirrus] tune wikidata similarity configuration 1/2 (duration: 01m 12s)

Change 398018 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.31.0-wmf.12] Extract names of search fields as constants

https://gerrit.wikimedia.org/r/398018

Mentioned in SAL (#wikimedia-operations) [2017-12-13T14:22:08Z] <dcausse@tin> Synchronized wmf-config/InitialiseSettings.php: T182293 [cirrus] tune wikidata similarity configuration 2/2 (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2017-12-13T14:29:58Z] <dcausse@tin> Synchronized php-1.31.0-wmf.12/extensions/Wikibase: T182293 Extract names of search fields as constants (duration: 02m 05s)

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board (Dec 18 2017, 2:57 PM).
debt closed this task as Resolved (Jan 8 2018, 2:31 PM).