Improve prototypes of image classifiers trained on images from Commons Categories
Closed, Resolved, Public

Assigned To

Authored By

	Miriam
	Apr 14 2020, 11:14 AM

Description

Based on feedback and suggestions, improve the classifiers trained in T242229:

Improve data collection via Wikidata/Wikipedia
Retrain fine-tuned classifiers
[stretch] train classifiers from scratch T248692

Related Objects

Status	Assigned	Task
Open	Miriam	T155538 General image classifier for commons
Open	Miriam	T215413 Image Classification Research and Development
Invalid	Miriam	T228441 Design a pipeline for image classification
Resolved	Miriam	T250150 Improve prototypes of image classifiers trained on images from Commons Categories

Event Timeline

Miriam created this task. Apr 14 2020, 11:14 AM

Miriam mentioned this in T228441: Design a pipeline for image classification.

Miriam edited projects, added Research (FY2019-20-Research-April-June); removed Research.

Miriam removed subscribers: Mholloway, Cirdan.

Weekly updates:
Started working on data refinement: checked the categories for which we get lower accuracy, and refined the list of Commons categories associated with them.

Weekly updates:
Polished the Commons categories related to the 30 concepts for which we have lower accuracy. Downloaded the new data on stat1005. Ready for model retraining.
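Finding "the categories for which we get lower accuracy" amounts to computing accuracy separately for each class on a held-out set and ranking the classes. The sketch below is a minimal illustration of that step, assuming the true and predicted labels are available as plain Python lists; the function names and the `k=30` default are hypothetical and not taken from the actual pipeline.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {label: accuracy}, computed over the samples whose true label is `label`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_label, pred_label in zip(y_true, y_pred):
        total[true_label] += 1
        if true_label == pred_label:
            correct[true_label] += 1
    return {label: correct[label] / total[label] for label in total}


def lowest_accuracy_classes(y_true, y_pred, k=30):
    """Return the k labels with the lowest per-class accuracy (ties broken by label name)."""
    acc = per_class_accuracy(y_true, y_pred)
    return sorted(acc, key=lambda label: (acc[label], label))[:k]


# Toy example with made-up labels (hypothetical, for illustration only).
y_true = ["bridge", "bridge", "castle", "castle", "dog"]
y_pred = ["bridge", "castle", "castle", "castle", "cat"]
print(per_class_accuracy(y_true, y_pred))          # {'bridge': 0.5, 'castle': 1.0, 'dog': 0.0}
print(lowest_accuracy_classes(y_true, y_pred, 2))  # ['dog', 'bridge']
```

Per-class accuracy is used here rather than overall accuracy because a few frequent, easy classes can hide poorly performing concepts in the overall figure.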

Miriam updated the task description. May 22 2020, 4:13 PM

Weekly updates:

Retrained the model with the new, polished data.
Improvements are +7% overall, and +15% for the classes whose data we modified! https://docs.google.com/spreadsheets/d/18Er84wdWIme_KMOrOYZZQxq5z0d9O4L0nZMMibzQ_rc/edit?usp=sharing
Noticed one more possible data improvement: some concepts get their data from ambiguous Commons categories. My plan is to remove those categories and retrain the model on the cleaner data. Will try to do this next week.

Weekly updates:

Refined the data; results are similar.
Computed top-5 accuracy as the final metric for the classifiers. This metric is widely used in image classification competitions such as the ImageNet Large Scale Visual Recognition Challenge. It measures the fraction of test images for which the correct label is among the classifier's 5 highest-scoring predictions.
Top-5 accuracy is around 80% for the first version and 81.5% for the improved one, with the largest gains on the classes we worked on this quarter. https://docs.google.com/spreadsheets/d/18Er84wdWIme_KMOrOYZZQxq5z0d9O4L0nZMMibzQ_rc/edit?usp=sharing
I could close this task, but I still hope to train a network from scratch by the end of the quarter :)
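For reference, top-5 accuracy can be computed from a matrix of per-class scores as sketched below. This is a minimal NumPy version, assuming the classifier outputs one score per class for each image and labels are integer class indices; it is an illustration of the metric, not the code used for the numbers above.

```python
import numpy as np


def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: array of shape (n_samples, n_classes); labels: int array of shape (n_samples,).
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Column indices of the k largest scores in each row (order inside the top k is irrelevant).
    top_k = np.argsort(scores, axis=1)[:, -k:]
    # A sample is a hit if its true label appears anywhere in its top-k indices.
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy scores for 2 images over 6 classes (made-up values).
scores = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
          [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]]
labels = [1, 0]
print(top_k_accuracy(scores, labels, k=5))  # 1.0
print(top_k_accuracy(scores, labels, k=1))  # 0.5
```

In the toy example the first image is a top-5 hit but a top-1 miss, which is why top-5 accuracy is always at least as high as top-1 accuracy and is more forgiving for visually similar classes.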

Resolving this for now. I was not able to reach the stretch goal, but I will leave that task open, hoping to work on it soon :)
