@Miriam in Research built a demo that classifies Commons images by their inclusion in featured categories, essentially producing a quality score. It would be interesting to evaluate this score as a boost for image search results. It probably shouldn't carry much weight, but it could nudge images up or down based on quality.
Rough outline of evaluation:
[x] Collect a sample of a hundred or so media searches on Commons. Hand-filter to remove queries that are hard to evaluate, not encyclopedic, etc.; historically this removes 10-20% of the sample. Picked ~100 queries from logs, data available in `hdfs:///user/dcausse/image_qual/commons_queries_handpicked.lst`
[x] Collect the top n (1k? 8k?) results for each query into an index on relforge and run Miriam's model on the results. (Fetched 140K images using the search API.) Data available in `hdfs:///user/dcausse/image_qual/img_preds`
[ ] Import the results into relforge. The model's output is a CSV with 4 columns: page_id, title, score, error_message. When error_message is set, score is NaN.
[ ] Try incorporating the quality score into the scoring calculation :) The score is in [0, 1], so something like `base * (1 + 0.25 * (score - 0.5))` would adjust the base score by up to ±12.5%.
[ ] Evaluation at this stage will mostly be human-based. Use the relforge software to see how much the quality score changes the rankings, and evaluate some of the result sets it reports. Bonus points for somehow displaying the images in the relforge report; alternatively, link to the wmflabs instance and compare image lists there.
[ ] Super bonus points: a simple HTML page with a dropdown of all the queries that hits the API and displays an image grid for each ranker.
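For the result-collection step, the standard MediaWiki action API can page through Commons search results. This is only a sketch of how such a fetch might look; the exact parameters and limits used for the actual 140K fetch are not recorded here, and `srnamespace=6` (the File namespace) and a 1k-per-query cap are assumptions:

```python
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"

def search_params(query, offset=0, limit=50):
    """Build MediaWiki search API parameters for a Commons file search."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srnamespace": 6,   # File: namespace on Commons
        "srlimit": limit,
        "sroffset": offset,
        "format": "json",
    }

def fetch_results(query, max_results=1000):
    """Page through search results, yielding (pageid, title) pairs."""
    offset = 0
    while offset < max_results:
        url = API + "?" + urllib.parse.urlencode(search_params(query, offset))
        with urllib.request.urlopen(url) as f:
            resp = json.load(f)
        for hit in resp["query"]["search"]:
            yield hit["pageid"], hit["title"]
        if "continue" not in resp:
            break
        offset = resp["continue"]["sroffset"]
```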
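Loading the model output described above (page_id, title, score, error_message; score is NaN when error_message is set) could look something like this. It assumes a plain comma-delimited file with no header row, which may not match the actual export:

```python
import csv
import math

def load_scores(path):
    """Map page_id -> quality score, skipping rows where the model errored."""
    scores = {}
    with open(path, newline="") as f:
        for page_id, title, score, error_message in csv.reader(f):
            if error_message:   # score is NaN when an error is recorded
                continue
            s = float(score)
            if math.isnan(s):   # defensive: skip NaN even without an error
                continue
            scores[int(page_id)] = s
    return scores
```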
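The proposed rescoring formula is straightforward to sanity-check in isolation. With the quality score in [0, 1], the multiplier ranges over [0.875, 1.125] at a weight of 0.25, i.e. ±12.5%:

```python
def boost(base, quality, weight=0.25):
    """Rescale a base relevance score by an image quality score in [0, 1].

    quality maps to a multiplier in [1 - weight/2, 1 + weight/2],
    so the default weight of 0.25 gives at most a +/-12.5% nudge.
    """
    return base * (1 + weight * (quality - 0.5))

boost(100, 1.0)  # -> 112.5 (best quality: +12.5%)
boost(100, 0.0)  # -> 87.5  (worst quality: -12.5%)
boost(100, 0.5)  # -> 100.0 (neutral quality: unchanged)
```

The `weight` parameter is the knob to tune: it bounds the maximum influence quality can have relative to the base relevance score.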
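For the "how much do the scores change ranking" question, one simple illustrative measure (not relforge's own metric, just a sketch) is the mean absolute rank displacement between the baseline and boosted result lists:

```python
def rank_displacement(baseline, reranked):
    """Mean absolute change in position for items present in both rankings.

    baseline and reranked are lists of page_ids in ranked order.
    Returns 0.0 when the rankings agree (or share no items).
    """
    pos = {page_id: i for i, page_id in enumerate(reranked)}
    shifts = [abs(i - pos[p]) for i, p in enumerate(baseline) if p in pos]
    return sum(shifts) / len(shifts) if shifts else 0.0
```

Result sets with a high displacement would be the interesting ones to pull up for side-by-side human review.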