This is the master task gathering our efforts towards developing in-house image classification models to be used across the organization.
It includes tasks on estimating data size and access, resource availability, model development and product applications.
- Image data: @Miriam + @Gilles to work on estimating the size of the Commons image corpus at different resolutions. T215250
- GPUs: @elukey + @EBernhardson to work on connecting the GPU to stat1005 when time allows; Miriam will test GPU models afterwards. See the GPU task: T148843. It was suggested that GPUs would also be useful to others in Research (e.g. @diego and @Isaac) and to Search for their work on text analysis.
- Evaluating existing classifiers: This is a short-term effort towards developing our own classification models. The Research team will work on a protocol for evaluating generalisability and biases of existing image classifiers that SDC (@Ramsey-WMF @Abit @dr0ptp4kt @Cparle) or others (@MusikAnimal) might want to use, based on diverse image sets from Wikidata/Commons. The Research team will also help with the integration between Wikidata items and the labels from existing image classifiers.
- Longer-term: Training our own image classifiers: Once data and processing units are available, the longer-term plan is to train our own image classifiers for various purposes: object detection, adult image filtering, image quality, image authenticity, etc. T331134
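
As a starting point for the evaluation protocol mentioned above, one simple check is to compare a classifier's accuracy across image groups (for example, images grouped by a Wikidata property). The sketch below is illustrative only: the function names, the `(group, true_label, predicted_label)` record layout, and the toy records are all assumptions, not an agreed design or real results.

```python
from collections import defaultdict


def per_group_accuracy(records):
    """Return accuracy per group.

    `records` is an iterable of (group, true_label, predicted_label)
    tuples. This layout is a hypothetical choice for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_label, predicted_label in records:
        total[group] += 1
        if true_label == predicted_label:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(accuracies):
    """Difference between the best and worst per-group accuracy."""
    return max(accuracies.values()) - min(accuracies.values())


if __name__ == "__main__":
    # Toy, made-up records: not real Commons data or classifier output.
    toy_records = [
        ("group_a", "cat", "cat"),
        ("group_a", "dog", "dog"),
        ("group_b", "cat", "dog"),
        ("group_b", "dog", "dog"),
    ]
    accuracies = per_group_accuracy(toy_records)
    print(accuracies)
    print(accuracy_gap(accuracies))
```

A large gap between groups would suggest the classifier generalises unevenly and deserves a closer look before it is used; which groupings and which metrics matter is for the protocol itself to decide.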