Relax the default AND of the retrieval query filter with minimum_should_match
Closed, Resolved · Public

Description

Relaxing the default AND looks like low-hanging fruit: it is a cheap way to check whether a looser filter pulls in new, interesting results.
This certainly won't be a final solution, but a quick A/B test could help us decide whether we need to spend more time on the retrieval query to improve recall.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Some relcomp reports are available on stat1004.eqiad.wmnet at:

  • /home/dcausse/baseline_recall-3t80-3t66.tgz
  • /home/dcausse/baseline_recall-80-60.tgz

The settings used are available on relforge under en-wp-ltr-0617 and are:

$wgCirrusSearchFullTextQueryBuilderProfiles['recall_80_60'] = $wgCirrusSearchFullTextQueryBuilderProfiles['perfield_builder'];
$wgCirrusSearchFullTextQueryBuilderProfiles['recall_80_60']['settings']['filter'] = [
        'type' => 'default',
        'settings' => [
                'all' => [
                        'minimum_should_match' => '80%'
                ],
                'all.plain' => [
                        'minimum_should_match' => '60%'
                ],
        ]
];

$wgCirrusSearchFullTextQueryBuilderProfiles['recall_3t80_3t66'] = $wgCirrusSearchFullTextQueryBuilderProfiles['perfield_builder'];
$wgCirrusSearchFullTextQueryBuilderProfiles['recall_3t80_3t66']['settings']['filter'] = [
        'type' => 'default',
        'settings' => [
                'all' => [
                        'minimum_should_match' => '3<80%'
                ],
                'all.plain' => [
                        'minimum_should_match' => '3<66%'
                ],
        ]
];

Other combinations can be tested.
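To get a feel for what these values mean for queries of different lengths, here is a rough sketch of how the specs above translate into a required number of matching terms. It follows the documented Elasticsearch minimum_should_match rules for the "N%" and "K<N%" forms only (percentages round down; with "K<N%", queries of K terms or fewer require all terms). The function name required_matches is hypothetical, and the sketch assumes each query term counts as one optional clause, which may not hold exactly for how CirrusSearch builds the filter.

```python
def required_matches(spec: str, term_count: int) -> int:
    """Return how many of `term_count` optional clauses must match.

    Illustrative only: handles the two forms used in this task,
    "N%" and "K<N%". Negative values, bare integers and multiple
    conditions are not handled.
    """
    spec = spec.strip()
    if "<" in spec:
        # Conditional form "K<N%": K terms or fewer means all are required.
        bound, rest = spec.split("<", 1)
        if term_count <= int(bound):
            return term_count
        return required_matches(rest, term_count)
    if spec.endswith("%"):
        # Percentage form: the computed number is rounded down.
        return term_count * int(spec[:-1]) // 100
    raise ValueError(f"unsupported spec: {spec!r}")


# Compare the four tested settings for queries of 1 to 6 terms.
for n in range(1, 7):
    row = {s: required_matches(s, n) for s in ("80%", "60%", "3<80%", "3<66%")}
    print(n, row)
```

Because plain percentages round down, '80%' on a two-term query requires only one term, whereas '3<80%' still requires both; the conditional form only relaxes queries longer than three terms.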

Change 381964 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Allow controllig min_should_match in the filter clause

https://gerrit.wikimedia.org/r/381964

Change 381964 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Allow controllig min_should_match in the filter clause

https://gerrit.wikimedia.org/r/381964

debt triaged this task as Medium priority. Oct 5 2017, 4:43 PM