|Open||Miriam||T155538 General image classifier for commons|
|Open||Miriam||T215413 Image Classification Working Group|
|Open||Miriam||T228441 Design a pipeline for image classification|
|Resolved||Miriam||T242229 Test the feasibility of a classifier trained on Commons categories|
|Resolved||Miriam||T242969 A list of meaningful Commons Categories whose images can be used to train image classifiers|
- Mentioned In
- T242971: A report on accuracy and performance of the classification models
T242229: Test the feasibility of a classifier trained on Commons categories
T242970: A set of prototypes of image classifiers trained on images from Commons Categories
- Mentioned Here
- T242229: Test the feasibility of a classifier trained on Commons categories
Previous week updates (from T242229):
- downloaded the list of COCO-Stuff classes, which includes highly generic categories of people, animals, and things that exist in the visual world: https://github.com/nightrome/cocostuff
- downloaded the list of categories in Commons, with the number of images per category.
- to create the initial seed of categories we want to consider for object categorization in Commons, I computed fastText vectors for both COCO categories and Commons categories, and I am checking which Commons categories we can use to represent COCO categories.
- matched COCO categories with Commons categories by computing cosine distance
- dumped, for each COCO category, all Commons categories with distance <0.1
- started cleaning the results: for some categories the matching works well, for others it is not necessarily correct. This should be done by the end of this week: https://docs.google.com/spreadsheets/d/1vSK2TzRG6RiyxID8qGtAJR0uX_ZkbmNtcaCsewwnaEQ/edit#gid=2034166525
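For readers who want to see the matching step concretely, here is a minimal sketch of it. It is not the script used for this task: the tiny 3-dimensional vectors are made-up stand-ins for fastText embeddings, and the names `cosine_distance` and `match_categories` are hypothetical. Only the 0.1 distance threshold comes from the update above.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def match_categories(coco_vecs, commons_vecs, max_distance=0.1):
    """For each COCO category, list Commons categories closer than max_distance."""
    matches = {}
    for coco_name, coco_vec in coco_vecs.items():
        close = []
        for commons_name, commons_vec in commons_vecs.items():
            dist = cosine_distance(coco_vec, commons_vec)
            if dist < max_distance:
                close.append((commons_name, dist))
        # Nearest candidates first, to make manual cleaning easier.
        close.sort(key=lambda pair: pair[1])
        matches[coco_name] = close
    return matches


# Toy vectors (assumption): real fastText vectors have hundreds of dimensions.
coco_vecs = {"dog": [1.0, 0.0, 0.0], "train": [0.0, 1.0, 0.0]}
commons_vecs = {
    "Dogs": [0.99, 0.05, 0.0],
    "Dogs in art": [0.9, 0.1, 0.3],
    "Steam locomotives": [0.05, 0.98, 0.1],
    "Bridges": [0.0, 0.0, 1.0],
}

matches = match_categories(coco_vecs, commons_vecs)
for coco_name, candidates in matches.items():
    print(coco_name, [name for name, _ in candidates])
```

With 5M+ Commons categories a brute-force loop like this would be slow, so a real run would likely need vectorized operations or an approximate nearest-neighbour index. The candidates it returns still need the manual cleaning described above.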
@leila the challenge was to map a set of general categories to the very specific Commons categories.
I used a semi-automated approach: I took the list of 5M+ categories from Commons and tried to match them with the 200 COCO categories using word vectors. However, word vectors are not necessarily the best solution for this problem. While this approach helped reduce the search space, I had to do a lot of cleanup on the resulting COCO-Commons matches, either removing irrelevant Commons categories or manually searching for additional ones. This is now done, although it is open for improvement.
Below you can find the list of 160 COCO categories for which we have matches in the set of Commons categories, and the corresponding total number of images expected.
This is the raw list of Commons categories associated to each COCO category:
Next up: image download
Weekly Update: the first round of this is done. There were a number of challenges, which I will describe in the final report on the feasibility of our own image classifiers. I also wrote a script to download images and ran it for a few days on stat1005. 700k images have been downloaded so far!