
[EPIC] Develop a confidence score for MediaSearch results
Open, Needs Triage, Public


As an image recommendations user, I do not want to be presented with bad suggestions, so that I do not lose confidence in the tools or add images to articles where they don't belong.

Initial manual testing of the results returned by the Image Recommendations API algorithms shows that many MediaSearch results are not good matches. Many of the bad matches come from text-based search, where a single word in the article title matches a word in an image's title, filename or description.

To keep the API from surfacing these bad matches, we need a confidence score that lets us filter results and deliver only the stronger, higher-confidence matches via the API. This will likely reduce the coverage provided by MediaSearch significantly, but the goal is a significant improvement in accuracy, which makes the trade-off necessary.

Some ideas for developing a confidence score include:

  • Use the probability-of-an-image-being-good score (T272710)
    • This incorporates heuristics already built into the MediaSearch algorithm like:
      • Matches based on depicts statements should be ranked highest
      • Text matches that contain all of the search words should be ranked higher than those that contain only some of them
      • Text matches in the filename should be ranked higher than those in the description or other wikitext
  • Others?
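
To make the discussion concrete, here is a minimal sketch of how the heuristics listed above could be folded into a single score. It is not the MediaSearch implementation or the score from T272710: the `Match` fields, the `confidence` function and all weights are hypothetical, chosen only so that the resulting ordering agrees with the three heuristics. How partial filename matches should rank against full description matches is not specified above, so the sketch simply multiplies a field weight by the fraction of words matched.

```python
from dataclasses import dataclass

# Hypothetical weights, picked only to reproduce the ordering in the list above.
DEPICTS_SCORE = 1.0       # depicts-statement matches rank highest
FILENAME_WEIGHT = 0.7     # text match found in the filename
OTHER_TEXT_WEIGHT = 0.4   # text match found in the description or other wikitext


@dataclass
class Match:
    depicts: bool        # matched via a depicts statement
    words_matched: int   # search words found in the image's text
    words_total: int     # number of words in the search (e.g. the article title)
    in_filename: bool    # True if the text match was in the filename


def confidence(match: Match) -> float:
    """Return an illustrative confidence score in the range [0, 1]."""
    if match.depicts:
        return DEPICTS_SCORE
    if match.words_total == 0:
        return 0.0
    fraction = match.words_matched / match.words_total
    weight = FILENAME_WEIGHT if match.in_filename else OTHER_TEXT_WEIGHT
    return weight * fraction


if __name__ == "__main__":
    candidates = {
        "depicts": Match(True, 0, 2, False),
        "filename, all words": Match(False, 2, 2, True),
        "filename, one of two words": Match(False, 1, 2, True),
        "description, all words": Match(False, 2, 2, False),
    }
    for label, m in sorted(candidates.items(), key=lambda kv: -confidence(kv[1])):
        print(f"{label}: {confidence(m):.2f}")
```

Any real score would need its weights tuned against the manually rated results rather than set by hand as they are here.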

Each subtask of this ticket will cover experimenting with one option and running tests to determine whether that option works well. This ticket can be considered complete when we have chosen a path forward for a confidence score and added it to the MediaSearch results output. (Implementing filtering based on that confidence score in the API will be a separate ticket.)

Once we have a confidence score, we'll need to decide what the cutoff should be for image recommendations, and whether it differs by use case. We'll also want to measure how much each candidate cutoff decreases the coverage of image recommendation matches returned by MediaSearch.
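
As a rough sketch of that coverage measurement, assuming we can collect the best confidence score per article (with 0.0 for articles that have no match at all), the fraction of articles that would still receive a recommendation at each cutoff could be computed as below. The function name, the input shape and the sample numbers are made up for illustration and are not real results.

```python
def coverage_by_cutoff(best_scores, cutoffs):
    """Map each cutoff to the fraction of articles whose best match scores >= cutoff.

    best_scores: one score per article (hypothetical input; 0.0 means no match).
    cutoffs: candidate confidence thresholds to compare.
    """
    total = len(best_scores)
    if total == 0:
        return {cutoff: 0.0 for cutoff in cutoffs}
    return {
        cutoff: sum(score >= cutoff for score in best_scores) / total
        for cutoff in cutoffs
    }


if __name__ == "__main__":
    # Made-up scores for four articles, for illustration only.
    sample_scores = [0.9, 0.4, 0.75, 0.1]
    for cutoff, coverage in coverage_by_cutoff(sample_scores, [0.0, 0.5, 0.8]).items():
        print(f"cutoff {cutoff}: coverage {coverage:.2f}")
```

Running the same calculation per use case would show whether a single cutoff is workable or whether each use case needs its own.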
