Train an image classifier to identify classes of images that are top candidates for deletion.
According to T340546: [XL] Analysis of deletion requests on Commons (see the final "Viable reasons frequency" section), 13.4% of all deletions fall into one of four classes:
- logos
- books
- screenshots
- album covers
The initial direction is to:
- take a pre-trained EfficientNet V2 model
- gather a dataset from Wikipedias (fair-use images) and/or Commons (freely licensed ones), with roughly 10k samples per class
- fine-tune the model on our 4 classes with a train/validation split (a sketch is given after this list)
- evaluate against a separate dataset of available images (class labels extracted from Commons categories)
- evaluate against a separate dataset of deleted images (class labels extracted from their deletion reasons)
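A minimal sketch of the fine-tuning step, assuming Keras/TensorFlow, a class-per-folder dataset layout, and illustrative hyperparameters (none of these reflect the actual training configuration):

```python
import tensorflow as tf

CLASSES = ["album", "book", "logo", "screenshot"]
IMG_SIZE = (384, 384)  # assumed input size; EfficientNetV2-S defaults to 384x384

# Train/validation split read from a hypothetical images/<class>/ layout.
train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    "images/",
    label_mode="categorical",  # matches the categorical cross-entropy loss
    class_names=CLASSES,
    validation_split=0.2,
    subset="both",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=32,
)

# Pre-trained backbone, ImageNet head removed; input rescaling is built in.
backbone = tf.keras.applications.EfficientNetV2S(
    include_top=False, weights="imagenet", pooling="avg"
)
backbone.trainable = False  # first phase: train only the new head

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(len(CLASSES), activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(train_ds, validation_data=val_ds, epochs=25)
```

A common second phase unfreezes the backbone at a much lower learning rate; it is omitted here for brevity.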
Results
Evaluation metrics
- accuracy
- area under the curve (AUC), computed separately for each class (one-vs-rest) and then macro-averaged across classes:
  - AUC precision/recall
  - AUC ROC
- model's loss function, i.e., categorical cross-entropy
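These metrics map directly onto Keras built-ins; a minimal sketch, assuming the setup above (metric names and num_labels=4 are illustrative):

```python
import tensorflow as tf

# With multi_label=True, Keras computes one AUC per class (one-vs-rest)
# and macro-averages them, matching the per-class-then-averaged description.
metrics = [
    tf.keras.metrics.CategoricalAccuracy(name="accuracy"),
    tf.keras.metrics.AUC(curve="PR", multi_label=True, num_labels=4, name="auc_pr"),
    tf.keras.metrics.AUC(curve="ROC", multi_label=True, num_labels=4, name="auc_roc"),
]
loss = tf.keras.losses.CategoricalCrossentropy()

# model.compile(optimizer="adam", loss=loss, metrics=metrics)
```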
Legend
- all scores are percentages
- best performances are in bold
- numbers in round brackets are the training epochs that achieved the best scores; a starred epoch was the best one for all metrics in its row, so it is shown only once
Dataset: available images
Source: Commons images with class categories
| class | accuracy | AUC precision/recall | AUC ROC | loss | # samples |
|------------|---------------|----------------------|--------------|--------------|--------|
| album | 91.6 (4) | 97.7 (19) | 97.8 (19) | 21.7 (4) | 29,951 |
| book | 80.5 (25) | 88.2 (12) | 88.5 (22) | 46.8 (3) | 10,995 |
| logo | **96.9** (8*) | **98.8** | **99.0** | **10.2** | 47,976 |
| screenshot | 90.5 (17*) | 96.0 | 96.4 | 24.3 | 53,172 |
Dataset: deleted images
Source: T350020: Access request to deleted image files in the production Swift cluster
| class | accuracy | AUC precision/recall | AUC ROC | loss | # samples |
|------------|---------------|----------------------|---------------|---------------|--------|
| album | 73.2 (16*) | 79.2 | 80.2 | 65.1 | 1,292 |
| book | 64.5 (7) | 68.2 (4) | 69.0 (4) | 79.7 (4) | 4,882 |
| logo | **87.5** (5) | **90.0** (11) | **91.6** (11) | **47.9** (13) | 21,020 |
| screenshot | 62.7 (6) | 67.2 (7) | 68.9 (7) | 77.2 (1) | 4,740 |
Observations
- the logo classifier is clearly the best one
- all performances drop on the deleted images dataset. Based on manual checks, that dataset looks noisier than the available images one, possibly because of:
  - the image extraction method, i.e., free-text deletion reasons vs. Commons categories
  - the randomness of non-class samples, which seems to have penalized the classifiers