Investigate how the scoring works and how it can be tuned so that searches return the results we would reasonably expect.
Multilingual captions are stored in the opening_text field in Elasticsearch.
Status | Assigned | Task
---|---|---
Resolved | Cparle | T187438 Implement searching of multilingual captions on commons
Resolved | Cparle | T192535 Investigate ranking of search results for a multi-lingual caption search
OK, here's how to tune the search parameters.
I added the following to settings.d/10-cirrus.php:
$wgCirrusSearchFullTextQueryBuilderProfile = 'commons_profile';
$wgCirrusSearchFullTextQueryBuilderProfiles['commons_profile'] = [
    'builder_class' => \CirrusSearch\Query\FullTextSimpleMatchQueryBuilder::class,
    'settings' => [
        'default_min_should_match' => '1',
        'default_query_type' => 'most_fields',
        'default_stem_weight' => 3.0,
        'fields' => [
            'title' => 0.3,
            'redirect.title' => [
                'boost' => 0.27,
                'in_dismax' => 'redirects_or_shingles'
            ],
            'suggest' => [
                'is_plain' => true,
                'boost' => 0.20,
                'in_dismax' => 'redirects_or_shingles',
            ],
            'category' => 0.05,
            'heading' => 0.05,
            'text' => [
                'boost' => 0.6,
                'in_dismax' => 'text_and_opening_text',
            ],
            'opening_text' => [
                'boost' => 0.888,
                'in_dismax' => 'text_and_opening_text',
            ],
            'auxiliary_text' => 0.05,
            'file_text' => 0.5,
        ],
        'phrase_rescore_fields' => [
            // very low (don't forget it's multiplied by 10 by default)
            // Use the all field to avoid loading positions on another field,
            // score is roughly the same when used on text
            'all' => 0.06,
            'all.plain' => 0.1,
        ],
    ],
];
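While experimenting, it may be easier to override a single boost after the profile definition than to edit the whole array each time. The fragment below is only a sketch of that idea, assuming it is placed after the commons_profile definition in the same settings.d/10-cirrus.php file. The value 1.2 is an arbitrary illustration, not a tested or recommended setting.

```php
// Hypothetical tuning tweak: raise the weight of caption matches
// (stored in opening_text) relative to the other fields in the profile.
// Assumes $wgCirrusSearchFullTextQueryBuilderProfiles['commons_profile']
// has already been defined earlier in this file.
// 1.2 is an illustrative value only; the profile above uses 0.888.
$wgCirrusSearchFullTextQueryBuilderProfiles['commons_profile']['settings']['fields']['opening_text']['boost'] = 1.2;
```

Because text and opening_text share the text_and_opening_text group in the profile, it is probably worth re-running the same test queries after each change and comparing the rankings, rather than changing several boosts at once.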
Something similar would need to be added to wmf-config (https://github.com/wikimedia/operations-mediawiki-config/tree/master/wmf-config) to apply the same tuning on labs or production.