Train an image classifier to identify classes of images that are top candidates for deletion.
According to {T340546} (see the last //Viable reasons frequency// section), 13.4% of all deletions fall into four classes:
1. logos
2. books
3. screenshots
4. album covers
The initial direction is to:
[x] take a pre-trained [EfficientNet V2](https://arxiv.org/abs/2104.00298) model
[x] gather a dataset from Wikipedias (fair-use images) and/or Commons (free ones) with roughly 10k samples per class
[x] fine-tune the model on our 4 classes with a train/validation dataset split
[x] evaluate against a separate dataset of **available** images (class to be extracted from Commons categories)
[x] evaluate against a separate dataset of **deleted** images (class to be extracted from their reason for deletion)
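The fine-tuning step above can be sketched with Keras as follows. This is a minimal illustration, not the actual training configuration: the backbone variant, dropout rate, and optimizer are assumptions; the real script is linked under //Code//.

```python
import tensorflow as tf

NUM_CLASSES = 4  # album, book, logo, screenshot

# Pre-trained EfficientNet V2 backbone, ImageNet weights, no classification head
backbone = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet", pooling="avg"
)
backbone.trainable = False  # freeze; optionally unfreeze later for full fine-tuning

# New 4-class softmax head on top of the frozen backbone
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[
        "accuracy",
        # per-class AUCs averaged across classes
        tf.keras.metrics.AUC(curve="PR", multi_label=True, num_labels=NUM_CLASSES, name="auc_pr"),
        tf.keras.metrics.AUC(curve="ROC", multi_label=True, num_labels=NUM_CLASSES, name="auc_roc"),
    ],
)
# model.fit(train_ds, validation_data=val_ds, epochs=...) on the train/validation split
```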
---
==Results
====Evaluation metrics
- [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision#In_multiclass_classification)
- [area under the curve](https://keras.io/api/metrics/classification_metrics/#auc-class) (AUC), computed separately for each class and then averaged across classes, see also [here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#ROC_curves_beyond_binary_classification)
- AUC [precision/recall](https://en.wikipedia.org/wiki/Precision_and_recall)
- AUC [ROC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)
- model's loss function, i.e., categorical [cross-entropy](https://en.wikipedia.org/wiki/Cross-entropy#Cross-entropy_loss_function_and_logistic_regression)
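The metrics above can also be computed offline, e.g. with scikit-learn; a small sketch on toy one-hot labels and predicted probabilities (the arrays are made up for illustration), with AUCs computed per class and then macro-averaged as described:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, average_precision_score, log_loss, roc_auc_score,
)

# toy ground truth (one-hot over 4 classes) and predicted probabilities
y_true = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
y_prob = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.2, 0.2, 0.5],
])

accuracy = accuracy_score(y_true.argmax(axis=1), y_prob.argmax(axis=1))
# AUCs computed separately for each class, then averaged ("macro")
auc_pr = average_precision_score(y_true, y_prob, average="macro")
auc_roc = roc_auc_score(y_true, y_prob, average="macro")
loss = log_loss(y_true, y_prob)  # categorical cross-entropy
```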
====Legend
- all scores are percentages
- best performances in bold
- numbers in round brackets are the training epochs that achieved the best scores; a starred epoch (*) applies to all metrics in that row
====Dataset: available images
Source: Commons images with class categories
| **class** | **accuracy** | **AUC precision/recall** | **AUC ROC** | **loss** | **# samples**
| --- | --- | --- | --- | --- | ---
| album | 91.6 (4) | 97.7 (19) | 97.8 (19) | 21.7 (4) | 29,951
| book | 80.5 (25) | 88.2 (12) | 88.5 (22) | 46.8 (3) | 10,995
| **logo** | **96.9** (8*) | **98.8** | **99** | **10.2** | 47,976
| screenshot | 90.5 (17*) | 96 | 96.4 | 24.3 | 53,172
====Dataset: deleted images
Source: {T350020}
| **class** | **accuracy** | **AUC precision/recall** | **AUC ROC** | **loss** | **# samples**
| --- | --- | --- | --- | --- | ---
| album | 73.2 (16*) | 79.2 | 80.2 | 65.1 | 1,292
| book | 64.5 (7) | 68.2 (4) | 69 (4) | 79.7 (4) | 4,882
| **logo** | **87.5** (5) | **90** (11) | **91.6** (11) | **47.9** (13) | 21,020
| screenshot | 62.7 (6) | 67.2 (7) | 68.9 (7) | 77.2 (1) | 4,740
==Observations
- the logo classifier is clearly the best one
- all performances decrease on the deleted images dataset. Based on manual checks, that dataset looks noisier than the available images one. This is possibly caused by:
  - the image extraction method, i.e., reason-for-deletion text vs. Commons categories
  - the randomness of non-class samples, which seems to have penalized the classifiers
==Code
- [training](https://gitlab.wikimedia.org/mfossati/scriptz/-/blob/fbbef87dbd7d1ac4958754a53cc3643376fcdf12/multiclass_efficientnet.py)
- [evaluation](https://gitlab.wikimedia.org/mfossati/scriptz/-/blob/fbbef87dbd7d1ac4958754a53cc3643376fcdf12/evaluate_multiclass_efficientnet.py)
- [demo visualization](https://gitlab.wikimedia.org/mfossati/scriptz/-/blob/fbbef87dbd7d1ac4958754a53cc3643376fcdf12/show_classified_images.ipy)