Train an image classifier to identify classes of images that are top candidates for deletion.
According to {T340546} (see the last //Viable reasons frequency// section), 13.4% of all deletions fall into four classes:
1. logos
2. books
3. screenshots
4. album covers
The initial direction is to:
[x] take a pre-trained [EfficientNet V2](https://arxiv.org/abs/2104.00298) model
[x] gather a dataset from Wikipedias (fair-use images) and/or Commons (free ones) with roughly 10k samples per class
[x] fine-tune the model on our 4 classes with a train/validation dataset split
[x] evaluate against a separate dataset of **available** images (class to be extracted from Commons categories)
[x] evaluate against a separate dataset of **deleted** images (class to be extracted from their reason for deletion)
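The fine-tuning step above can be sketched with Keras as follows. This is a minimal illustration, not the actual training configuration: the backbone variant, dropout rate, and optimizer are assumptions; the real script is linked under //Code//.

```python
import tensorflow as tf

NUM_CLASSES = 4  # album, book, logo, screenshot

# Pre-trained EfficientNet V2 backbone, ImageNet weights, no classification head
backbone = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet", pooling="avg"
)
backbone.trainable = False  # freeze; optionally unfreeze later for full fine-tuning

# New 4-class softmax head on top of the frozen backbone
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[
        "accuracy",
        # per-class AUCs averaged across classes
        tf.keras.metrics.AUC(curve="PR", multi_label=True, num_labels=NUM_CLASSES, name="auc_pr"),
        tf.keras.metrics.AUC(curve="ROC", multi_label=True, num_labels=NUM_CLASSES, name="auc_roc"),
    ],
)
# model.fit(train_ds, validation_data=val_ds, epochs=...) on the train/validation split
```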
---
==Results
====Evaluation metrics
- [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision#In_multiclass_classification)
- [area under the curve](https://keras.io/api/metrics/classification_metrics/#auc-class) (AUC), computed separately for each class and then averaged across classes, see also [here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#ROC_curves_beyond_binary_classification)
- AUC [precision/recall](https://en.wikipedia.org/wiki/Precision_and_recall)
- AUC [ROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)
- model's loss function, i.e., categorical [cross-entropy](https://en.wikipedia.org/wiki/Cross-entropy#Cross-entropy_loss_function_and_logistic_regression)
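The metrics above can also be computed offline, e.g. with scikit-learn; a small sketch on toy one-hot labels and predicted probabilities (the arrays are made up for illustration), with AUCs computed per class and then macro-averaged as described:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, average_precision_score, log_loss, roc_auc_score,
)

# toy ground truth (one-hot over 4 classes) and predicted probabilities
y_true = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
y_prob = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.2, 0.2, 0.5],
])

accuracy = accuracy_score(y_true.argmax(axis=1), y_prob.argmax(axis=1))
# AUCs computed separately for each class, then averaged ("macro")
auc_pr = average_precision_score(y_true, y_prob, average="macro")
auc_roc = roc_auc_score(y_true, y_prob, average="macro")
loss = log_loss(y_true, y_prob)  # categorical cross-entropy
```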
====Legend
- all scores are percentages
- best performances in bold
- numbers in round brackets are the training epochs that achieved the best scores; a starred epoch (*) applies to all metrics in that row
====Dataset: available images
Source: Commons images with class categories
| **class** | **accuracy** | **AUC precision/recall** | **AUC ROC** | **loss** | **# samples**
| --- | --- | --- | --- | --- | ---
| album | 91.6 (4) | 97.7 (19) | 97.8 (19) | 21.7 (4) | 29,951
| book | 80.5 (25) | 88.2 (12) | 88.5 (22) | 46.8 (3) | 10,995
| **logo** | **96.9** (8*) | **98.8** | **99** | **10.2** | 47,976
| screenshot | 90.5 (17*) | 96 | 96.4 | 24.3 | 53,172
====Dataset: deleted images
Source: {T350020}
| **class** | **accuracy** | **AUC precision/recall** | **AUC ROC** | **loss** | **# samples**
| --- | --- | --- | --- | --- | ---
| album | 73.2 (16*) | 79.2 | 80.2 | 65.1 | 1,292
| book | 64.5 (7) | 68.2 (4) | 69 (4) | 79.7 (4) | 4,882
| **logo** | **87.5** (5) | **90** (11) | **91.6** (11) | **47.9** (13) | 21,020
| screenshot | 62.7 (6) | 67.2 (7) | 68.9 (7) | 77.2 (1) | 4,740
==Observations
- the logo classifier is clearly the best one
- all performances decrease on the deleted images dataset. Based on manual checks, that dataset looks noisier than the available images one. This is possibly caused by:
  - the image extraction method, i.e., reason-for-deletion text vs. Commons categories
  - the randomness of non-class samples, which seems to have penalized the classifiers
==Code
- [training](https://gitlab.wikimedia.org/mfossati/scriptz/-/blob/fbbef87dbd7d1ac4958754a53cc3643376fcdf12/multiclass_efficientnet.py)
- [evaluation](https://gitlab.wikimedia.org/mfossati/scriptz/-/blob/fbbef87dbd7d1ac4958754a53cc3643376fcdf12/evaluate_multiclass_efficientnet.py)
- [demo visualization](https://gitlab.wikimedia.org/mfossati/scriptz/-/blob/fbbef87dbd7d1ac4958754a53cc3643376fcdf12/show_classified_images.ipy)