@Miriam has done logistic regression on the data gathered by https://media-search-signal-test.toolforge.org/ to transform the score for each search signal into a probability that an image is a good image for the search term
We need to translate this into an initial Elasticsearch profile, with a query builder that computes the probability of an image being good from the results of the logistic regressions and returns **that** as the score
Implementation
---
The probability of an image being good based on the score for component X (`_score`) is
```
1 / ( 1 + exp( -1 * ( p * _score + q ) ) )
```
| component | p | q |
| auxiliary_text | 0.018542 | -1.8933013 |
| caption | 0.024676 | -1.505207 |
| category | 0.019410 | -1.563868 |
| redirect.title | 0.030420 | -1.862426 |
| statement | 0.062275 | -0.418142 |
| suggest | 0.086211 | -1.771545 |
| title | 0.029252 | -1.809724 |
(note that `heading` and `text` are omitted, because their [[ https://en.wikipedia.org/wiki/F-score | f1 score ]] for the data we have is < 0.1)
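As a rough illustration of the formula and coefficients above, here is a minimal Python sketch. The names `COEFFICIENTS` and `component_probability` are made up for this example and are not part of any existing code; the numbers are copied from the table.

```python
import math

# (p, q) per component, copied from the table above.
COEFFICIENTS = {
    "auxiliary_text": (0.018542, -1.8933013),
    "caption": (0.024676, -1.505207),
    "category": (0.019410, -1.563868),
    "redirect.title": (0.030420, -1.862426),
    "statement": (0.062275, -0.418142),
    "suggest": (0.086211, -1.771545),
    "title": (0.029252, -1.809724),
}


def component_probability(component: str, score: float) -> float:
    """Probability that an image is good, given one component's raw score."""
    p, q = COEFFICIENTS[component]
    return 1.0 / (1.0 + math.exp(-1.0 * (p * score + q)))
```

For example, `component_probability("title", 50.0)` applies the `title` coefficients to a raw score of 50.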
Once the initial probabilities based on each component are calculated, they need to be combined into a single probability using a weighted arithmetic mean, where the weights are the f1 scores for each component
| component | f1 score |
| auxiliary_text | 0.469 |
| caption | 0.460 |
| category | 0.408 |
| redirect.title | 0.452 |
| statement | 0.688 |
| suggest | 0.550 |
| title | 0.636 |
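The weighted mean could be sketched in Python as follows. `F1_WEIGHTS` and `combined_probability` are hypothetical names for this example. The sketch assumes the mean is normalised over whichever components are passed in; whether a component with no score should instead be included with a default probability is not settled here.

```python
# f1 score per component, copied from the table above.
F1_WEIGHTS = {
    "auxiliary_text": 0.469,
    "caption": 0.460,
    "category": 0.408,
    "redirect.title": 0.452,
    "statement": 0.688,
    "suggest": 0.550,
    "title": 0.636,
}


def combined_probability(probabilities: dict) -> float:
    """Weighted arithmetic mean of per-component probabilities.

    Assumption: normalise by the weights of the supplied components only.
    """
    if not probabilities:
        raise ValueError("at least one component probability is required")
    total_weight = sum(F1_WEIGHTS[c] for c in probabilities)
    weighted_sum = sum(F1_WEIGHTS[c] * prob for c, prob in probabilities.items())
    return weighted_sum / total_weight
```

To average over all seven components, pass a probability for each of them.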
This will need to be implemented using `function_score` or similar queries in Elasticsearch
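One possible shape for such a query, sketched as a Python function that builds the request body. This is only an assumption about how it might be done, not the agreed implementation: `build_query` and `COMPONENTS` are hypothetical names, the component names are used directly as field names in plain `match` queries (the real index fields and per-field queries may differ), and each clause is a `function_score` query whose Painless `script_score` replaces the raw `_score` with the weighted logistic value. The `bool`/`should` wrapper sums the clauses, so a component that does not match contributes 0 rather than the probability for a score of 0.

```python
import json

# (p, q, f1) per component, copied from the tables above.
COMPONENTS = {
    "auxiliary_text": (0.018542, -1.8933013, 0.469),
    "caption": (0.024676, -1.505207, 0.460),
    "category": (0.019410, -1.563868, 0.408),
    "redirect.title": (0.030420, -1.862426, 0.452),
    "statement": (0.062275, -0.418142, 0.688),
    "suggest": (0.086211, -1.771545, 0.550),
    "title": (0.029252, -1.809724, 0.636),
}

# Painless script: normalised weight times the logistic of the wrapped query's score.
SCRIPT = "params.w / (1 + Math.exp(-1 * (params.p * _score + params.q)))"


def build_query(term: str) -> dict:
    """Build a request body whose score approximates the weighted mean."""
    total_f1 = sum(f1 for _, _, f1 in COMPONENTS.values())
    clauses = []
    for field, (p, q, f1) in COMPONENTS.items():
        clauses.append({
            "function_score": {
                # Assumption: a simple match query per field.
                "query": {"match": {field: term}},
                "script_score": {
                    "script": {
                        "source": SCRIPT,
                        "params": {"p": p, "q": q, "w": f1 / total_f1},
                    }
                },
                # Use the script result instead of the raw score.
                "boost_mode": "replace",
            }
        })
    # A bool query sums the scores of its matching should clauses.
    return {"query": {"bool": {"should": clauses}}}


if __name__ == "__main__":
    print(json.dumps(build_query("cat"), indent=2))
```

Folding the normalised weight into the script keeps each clause self-contained; the same effect might also be achievable with a separate weight on the function.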
Testing
---
See (some ticket) for how to test each profile that we construct in this way