Once T271799 is implemented, the score that elasticsearch returns to rank search results should, we think, be a number between 0 and 1 that indicates the probability of an image being good.
We'd like to use this score as a confidence score for image matching, so we need to check whether the estimated probability is realistic, i.e. whether the score is calibrated. The elasticsearch score was itself derived from the proportion of good images in labeled search results. Re-using those existing labels to check it would therefore be circular, so the check should use newly gathered ratings.
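Written out, "realistic" here means calibrated: among images with a score of about s, a fraction of about s should be good. Grouping scores into a bucket B = [a, b) gives the practical version of that check. For a narrow bucket, the fraction of good images should be close to the bucket mid-point. This is an approximation, since it is exact only when the scores inside the bucket average out to the mid-point.

```latex
\Pr(\text{image is good} \mid \text{score} = s) \approx s
\qquad\text{so, for each narrow bucket } B = [a, b):\qquad
\frac{n_{\text{good}}(B)}{n_{\text{good}}(B) + n_{\text{bad}}(B)} \approx \frac{a + b}{2}
```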
The simplest way to do this is to
1. gather N new ratings from https://media-search-signal-test.toolforge.org/
2. run searches with the new profile for all search terms for the newly labeled images
3. record the elasticsearch scores for all the newly labeled images in the search results
4. sort the labeled images from the search results into buckets according to their elasticsearch scores (e.g. ten buckets of width 0.1)
5. count the good and bad images in each bucket
6. we'd expect (number good)/(number good + number bad) in each bucket to approximately equal the mid-point of the bucket, e.g. roughly 75% of the images in the 0.7 to 0.8 bucket should be good
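The following is a minimal Python sketch of steps 3 to 6. It assumes the elasticsearch scores and the good/bad labels for the newly labeled images have already been collected into two parallel lists. Fetching them from the toolforge tool and from elasticsearch is not shown. `calibration_table` and `Bucket` are hypothetical names. The default of 10 buckets is an arbitrary choice, and the example data is made up.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple, Optional, Sequence


class Bucket(NamedTuple):
    lower: float
    upper: float
    n_images: int
    midpoint: float
    fraction_good: Optional[float]  # None when the bucket is empty


def calibration_table(
    scores: Sequence[float],
    is_good: Sequence[bool],
    n_buckets: int = 10,  # hypothetical default: ten buckets of width 0.1
) -> list[Bucket]:
    """Bucket labeled images by elasticsearch score and report, per bucket,
    the fraction of good images next to the bucket mid-point."""
    if len(scores) != len(is_good):
        raise ValueError("scores and is_good must have the same length")
    total = Counter()  # images per bucket index
    good = Counter()   # good images per bucket index
    for score, good_label in zip(scores, is_good):
        # Assumption: the score should lie in [0, 1]; anything else can't be
        # read as a probability, so fail loudly rather than bucket it.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score outside [0, 1]: {score}")
        # min() keeps a score of exactly 1.0 in the top bucket.
        k = min(int(score * n_buckets), n_buckets - 1)
        total[k] += 1
        good[k] += int(good_label)
    return [
        Bucket(
            lower=k / n_buckets,
            upper=(k + 1) / n_buckets,
            n_images=total[k],
            midpoint=(k + 0.5) / n_buckets,
            fraction_good=good[k] / total[k] if total[k] else None,
        )
        for k in range(n_buckets)
    ]


if __name__ == "__main__":
    # Made-up scores and labels, for illustration only.
    scores = [0.05, 0.15, 0.72, 0.78, 0.74, 0.95, 1.0]
    is_good = [False, False, True, True, False, True, True]
    for b in calibration_table(scores, is_good):
        if b.n_images:
            print(f"{b.lower:.1f}-{b.upper:.1f}: {b.n_images} images, "
                  f"{b.fraction_good:.2f} good (expected about {b.midpoint:.2f})")
```

Empty buckets report `fraction_good` as `None` rather than 0, so they aren't mistaken for badly calibrated buckets. With only a few ratings per bucket the fractions will be noisy, so a small N may call for wider buckets.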