@Miriam in research built a demo that classifies Commons images by whether they are included in featured categories on Commons, which essentially generates a quality score. It would be interesting to evaluate this score as a boost for image search results. It probably shouldn't have a huge weight, but it could nudge images up or down based on the quality score.
Rough outline of evaluation:
- [x] Collect a sample of a hundred or so media searches on commons. Hand filter to remove things that are hard to evaluate, not encyclopedic, etc. In the past this has been 10-20%.
-- Picked ~100 queries from logs, data available in `hdfs:///user/dcausse/image_qual/commons_queries_handpicked.lst`
- [x] Collect top n (1k? 8k?) results for each query into an index on relforge and run Miriam's model on the results. (Fetched 200K images using the search API).
-- Data available in `stat1005:~dcausse/commons_img_quality/preds_filtered.csv`
- [x] Import results to relforge
-- 1.8M docs imported to https://relforge1001.eqiad.wmnet:9243/commons_image_quality/
- [x] Try something with the scores and the scoring calculation :) Score is in [0, 1], so we could try something like `base * (1 + 0.25 * (score - 0.5))`, which adjusts the base score by at most ±12.5%?
-- used a simple weighted sum for now
- [x] Evaluation at this stage will mostly be human-based. Use relforge software to look at how much the scores change ranking, and evaluate some of the result sets it reports. Bonus points for somehow displaying the images in the relforge report, but we could link somehow to the wmflabs instance and compare image lists there.
-- I was not able to use exactly the same profile as production on a subset of the data because the term stats are too different. I could use a simple profile with the all field, which gives similar results on relforge and production. I'm currently importing all commons files to relforge so that we can actually compare against the production profiles.
- [ ] Super bonus points: Some simple html page with a dropdown for all the queries that hits the api and displays back an image grid for each ranker.
-- A small frontend app might be required: jsondiff.py is not well suited for this, and the default size of the thumbnail images is too small.
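
For reference, a minimal sketch of the two scoring options mentioned above: the multiplicative boost `base * (1 + 0.25 * (score - 0.5))` and a simple weighted sum. This is not the code that was run on relforge. The function names, the `quality_weight` value and the sample results are hypothetical and only illustrate how each option changes ordering.

```python
def multiplicative_boost(base, quality, strength=0.25):
    """Scale the base relevance score using a quality score in [0, 1].

    With strength=0.25 the multiplier ranges from 0.875 to 1.125,
    i.e. at most +/- 12.5% of the base score.
    """
    return base * (1 + strength * (quality - 0.5))


def weighted_sum(base, quality, quality_weight=0.1):
    """Add the quality score to the base score with a fixed weight.

    quality_weight is a hypothetical value; the weight actually used
    in the evaluation is not stated in this task.
    """
    return base + quality_weight * quality


def rerank(results, scorer):
    """Sort (title, base_score, quality) tuples by the combined score, best first."""
    return sorted(results, key=lambda r: scorer(r[1], r[2]), reverse=True)


# Hypothetical results for one query: (title, base score, quality score)
results = [
    ("File:A.jpg", 10.0, 0.1),
    ("File:B.jpg", 9.5, 0.9),
    ("File:C.jpg", 8.0, 0.5),
]

boosted = rerank(results, multiplicative_boost)
summed = rerank(results, weighted_sum)
```

With these sample values the multiplicative boost moves `File:B.jpg` above `File:A.jpg`, while the weighted sum with a small weight leaves the original order unchanged. The multiplicative form scales with the base score, whereas the weighted sum depends on the absolute range of the base scores, which is worth keeping in mind given that term stats differ between relforge and production.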