Improve prototypes of image classifiers trained on images from Commons Categories
Closed, Resolved · Public

Description

Based on feedback and suggestions, improve the classifiers trained in T242229:

  • Improve data collection via Wikidata/Wikipedia
  • Retrain finetuned classifiers
  • [stretch] Train classifiers from scratch (T248692)

Event Timeline

Weekly updates:
Started working on data refinement: checked the categories for which we get lower accuracy and refined the list of Commons categories associated with them.

Weekly updates:
Polished the Commons categories related to the 30 concepts for which we have lower accuracy. Downloaded the new data on stat1005. Ready for model retraining.

Weekly updates:

  • Refined the data; results are similar.
  • Computed top-5 accuracy as the final metric for the classifiers. This metric is widely used in image classification competitions such as the ImageNet Large Scale Visual Recognition Challenge. It measures how often the correct label appears among the classifier's top 5 predictions.
  • Top-5 accuracy is around 80% for the first version and 81.5% for the improved one, with major gains on the classes we worked on this quarter. https://docs.google.com/spreadsheets/d/18Er84wdWIme_KMOrOYZZQxq5z0d9O4L0nZMMibzQ_rc/edit?usp=sharing
  • I could close this task, but I still hope to train a network from scratch by the end of the quarter :)
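
For readers unfamiliar with the metric, here is a minimal sketch of how top-5 accuracy can be computed from per-class scores. This is not the evaluation code used for the task: the function name `top_k_accuracy` and the toy scores and labels are hypothetical, chosen only for illustration.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists, one score per class (hypothetical input format).
    labels: list of true class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example with 6 classes and 3 samples (made-up numbers).
scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # true class 0 has the lowest score: miss
    [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # true class 0 has the highest score: hit
    [0.3, 0.1, 0.2, 0.6, 0.5, 0.4],  # true class 2 is ranked fifth: hit
]
labels = [0, 0, 2]

print(top_k_accuracy(scores, labels, k=5))  # 2 of 3 samples are hits
print(top_k_accuracy(scores, labels, k=1))  # only the second sample is a top-1 hit
```

With `k=1` the same function gives plain (top-1) accuracy, which makes it easy to report both numbers side by side.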

Resolving this for now. I was not able to do the stretch goal, but I will leave that task open, hoping to work on it soon :)