Note: this ticket has been rewritten to reflect new analysis done in March 2021.
We need to translate this into a new Elasticsearch profile (for now, create a new profile rather than changing the existing one). The profile needs a query builder that computes the probability of an image being good from the results of the logistic regressions, and returns that probability as the score.
Implementation
The probability of an image being good, given the Elasticsearch score for each search field, is:
1 / ( 1 + exp( -1 * ( ( coefficient_for_field_A * score_for_field_A ) + ( coefficient_for_field_B * score_for_field_B ) + ... + intercept ) ) )
| field | coefficient |
| ----- | ----- |
| descriptions | 0.019320230186222098 |
| title | 0.0702949038300864 |
| category | 0.05158078808882278 |
| redirect.title | 0.01060150471482338 |
| statements | 0.11098311564161133 |
The intercept is -1.1975600089068401.
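To sanity-check the formula against the numbers above, here is a minimal Python sketch of the same calculation. The names `COEFFICIENTS`, `INTERCEPT` and `probability_good` are hypothetical. The sketch assumes the per-field Elasticsearch scores for one document are already available, and it treats any field that did not match as scoring 0.

```python
import math

# Coefficients and intercept from the logistic regression results above.
COEFFICIENTS = {
    "descriptions": 0.019320230186222098,
    "title": 0.0702949038300864,
    "category": 0.05158078808882278,
    "redirect.title": 0.01060150471482338,
    "statements": 0.11098311564161133,
}
INTERCEPT = -1.1975600089068401


def probability_good(field_scores):
    """Return the modelled probability that an image is good.

    field_scores maps a search field name to the Elasticsearch score that
    field produced for one document. Fields missing from the dict are
    treated as a score of 0 (i.e. the field did not match).
    """
    # Weighted sum of the per-field scores plus the intercept.
    linear = INTERCEPT + sum(
        coef * field_scores.get(field, 0.0)
        for field, coef in COEFFICIENTS.items()
    )
    # The logistic (sigmoid) function maps the linear combination into (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))
```

With no matching fields the result is just the logistic of the intercept, roughly 0.23, so a document needs matching fields to push its probability above 0.5.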
It is probably also a good idea to give title and auxiliary_text a small non-zero coefficient, to preserve the ordering of results when those are the only fields that match.
This will need to be implemented using a function_score query (or a similar query) in Elasticsearch.
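One possible shape for the query builder is sketched below in Python, as the Elasticsearch query DSL it would emit. This is not the CirrusSearch implementation. It assumes that each field's score can be reproduced by a plain `match` query on that field, although the per-field queries used in the analysis may differ. The names `build_query` and `TIEBREAK_WEIGHT`, and the value 1e-4, are hypothetical.

The sketch works in two steps. First, each field's clause is boosted by its coefficient. Because a bool query scores a document as the sum of its matching should clauses, the bool score becomes the weighted sum of the per-field scores. Second, a `script_score` function inside `function_score` adds the intercept and applies the logistic function. Setting `boost_mode` to `replace` makes that probability the document's score.

```python
import json

# Coefficients and intercept from the regression results in this ticket.
COEFFICIENTS = {
    "descriptions": 0.019320230186222098,
    "title": 0.0702949038300864,
    "category": 0.05158078808882278,
    "redirect.title": 0.01060150471482338,
    "statements": 0.11098311564161133,
}
INTERCEPT = -1.1975600089068401

# Hypothetical small weight for tie-breaking fields such as auxiliary_text,
# so documents matching only those fields still get a meaningful order.
TIEBREAK_WEIGHT = 1e-4


def build_query(search_text, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Build a function_score query whose score is P(image is good)."""
    # One match clause per field, boosted by that field's coefficient, so the
    # bool query's score is sum(coefficient * field_score) over matching fields.
    should = [
        {"match": {field: {"query": search_text, "boost": coef}}}
        for field, coef in coefficients.items()
    ]
    return {
        "function_score": {
            "query": {"bool": {"should": should, "minimum_should_match": 1}},
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "lang": "painless",
                            # _score is the weighted sum from the bool query;
                            # add the intercept and apply the logistic function.
                            "source": "1.0 / (1.0 + Math.exp(-(_score + params.intercept)))",
                            "params": {"intercept": intercept},
                        }
                    }
                }
            ],
            # Use the script's result (the probability) as the final score.
            "boost_mode": "replace",
        }
    }


if __name__ == "__main__":
    # Example: include auxiliary_text with the tiny tie-breaking weight.
    query = build_query("cat", {**COEFFICIENTS, "auxiliary_text": TIEBREAK_WEIGHT})
    print(json.dumps(query, indent=2))
```

Because the logistic function is monotonic, this ranks documents in the same order as the boosted sum alone would. The script only turns the returned score into a probability, which is what lets profiles be compared by score as well as by ranking.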
Testing
See T271801 for how to test each profile we construct this way and decide whether it is better or worse.