The goal of this task is to create a tool for manually testing the image recommendations POC API once T260832 is complete. A follow-up task will cover the manual testing itself.
T273527 is the task to publish the API spec. Work on this tool can start once that is complete, but it can't be finished until T260832 is complete.
- The tool will evaluate results on the Arabic, Cebuano, English, and Vietnamese wikis.
- The tool will evaluate 500 unillustrated articles from each wiki.
- The tool will call the API to fetch 500 random unillustrated articles from each wiki, along with their image recommendations.
- The tool will display the output for evaluation (a preview of both the article text and the image), similar to https://media-search-signal-test.toolforge.org/
- The tool will allow testers to manually rate each recommended image for each unillustrated article as a strong, okay, or weak match.
- The tool will allow testers to manually decide whether the recommended image is explicit/NSFW.
- The tool will output the results into a spreadsheet, showing how many strong, okay, and weak matches were produced for each article, and the source of each match.
- Spreadsheet columns will be: wiki; article name; image; match strength; source (Wikidata, interlinks, Commons category, or MediaSearch); explicit.
- The tool will log the API response time to evaluate performance.
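
As a rough illustration of the spreadsheet output and response-time logging described above, here is a minimal Python sketch. It is not tied to the real API, whose spec is still pending in T273527. The `Rating` class, the helper names, the "yes"/"no" encoding of the explicit column, and the wiki identifiers in the usage note are all assumptions made for the example; only the column names, match strengths, and sources come from the list above.

```python
import csv
import time
from collections import Counter
from dataclasses import dataclass

# Column names, strengths, and sources are taken from the requirements above.
COLUMNS = ["wiki", "article name", "image", "match strength", "source", "explicit"]
STRENGTHS = ("strong", "okay", "weak")
SOURCES = ("Wikidata", "interlinks", "Commons category", "MediaSearch")


@dataclass
class Rating:
    """One tester judgement for one recommended image (hypothetical shape)."""
    wiki: str
    article: str
    image: str
    strength: str
    source: str
    explicit: bool

    def __post_init__(self):
        # Reject values outside the sets named in the requirements.
        if self.strength not in STRENGTHS:
            raise ValueError(f"unknown match strength: {self.strength!r}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


def write_ratings(ratings, out):
    """Write a header row plus one CSV row per rating to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for r in ratings:
        # Encoding the explicit flag as "yes"/"no" is an assumption.
        writer.writerow([r.wiki, r.article, r.image, r.strength, r.source,
                         "yes" if r.explicit else "no"])


def count_by_article(ratings):
    """Count strong/okay/weak matches per (wiki, article) pair."""
    counts = {}
    for r in ratings:
        counts.setdefault((r.wiki, r.article), Counter())[r.strength] += 1
    return counts


def timed(fetch, *args, **kwargs):
    """Call fetch and return (result, elapsed seconds).

    fetch stands in for whatever function ends up calling the API.
    """
    start = time.perf_counter()
    result = fetch(*args, **kwargs)
    return result, time.perf_counter() - start
```

A possible usage: build a list such as `[Rating("enwiki", "Example article", "Example.jpg", "strong", "Wikidata", False)]`, pass it to `write_ratings` with an open file, and wrap each API call in `timed(...)` so the elapsed seconds can be logged next to the results.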