For the current iteration of image suggestions, we have a tuned search profile in which the score returned by Elasticsearch reflects the likelihood that an image is a good match. We had anticipated using this score as a confidence score.
While generating the image suggestions data, we gather data from Wikidata and save it in HDFS so that the search pipeline can pick it up and import it into the commonswiki search index. This data is essential for calculating the confidence score. However, we can't actually get a confidence score until the data is in the index, so we're unable to finish generating the suggestions data until we're sure the data has been imported.
To work around that, this ticket is to calculate the confidence score before the data is available in Elasticsearch. Only 1 of the 4 signals used to calculate the score is BM25-based, so it should be possible.
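
As a rough illustration of what an offline calculation could look like, here is a minimal sketch. It assumes the score is a logistic function of a weighted sum of the 4 signals, which this ticket does not confirm. The signal names, weights, and intercept are all hypothetical placeholders, not values from the tuned search profile. The BM25-based signal is treated as an input that would have to be approximated or computed separately.

```python
import math

# Hypothetical placeholders: the real signal names, weights, and intercept
# come from the tuned search profile and are not given in this ticket.
WEIGHTS = {
    "bm25_signal": 0.5,  # stands in for the one BM25-based signal
    "signal_b": 1.0,
    "signal_c": 1.0,
    "signal_d": 1.0,
}
INTERCEPT = -1.0


def confidence_score(signals, weights=WEIGHTS, intercept=INTERCEPT):
    """Combine per-image signals into a score in (0, 1).

    Assumes a logistic model over a weighted sum, which is an
    illustrative choice rather than the confirmed scoring method.
    """
    missing = set(weights) - set(signals)
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    # Weighted sum of the signals plus the intercept.
    linear = intercept + sum(weights[name] * signals[name] for name in weights)
    # Logistic function maps the sum onto a probability-like range.
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    example = {"bm25_signal": 2.0, "signal_b": 1.0, "signal_c": 0.0, "signal_d": 0.0}
    print(confidence_score(example))
```

With a calculation shaped like this, the three non-BM25 signals could be read directly from the data gathered from Wikidata, so only the BM25-based input would need a substitute for what the index would otherwise provide.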