This is the master task tracking our efforts to develop in-house image classification models for use across the organization.
It includes tasks on estimating data size and access, resource availability, model development and product applications.
[x] **Image data:** @Miriam + @Gilles to work on estimating the size of the Commons image corpus at different resolutions. T215250
[x] **GPUs:** @elukey + @EBernhardson to work on connecting the GPU to stat1005 when time allows; Miriam will test GPU models afterwards. See the GPU task here. It was suggested that GPUs would also be useful to others in Research (e.g. @diego and @Isaac) and to Search, for work on text analysis. T148843
[x] **Evaluating existing classifiers:** This is a short-term effort towards developing our own classification models. The Research team will work on a protocol for evaluating generalisability and biases of existing image classifiers that SDC (@Ramsey-WMF @Abit @dr0ptp4kt @Cparle) or others (@musikanimal) might want to use, based on diverse image sets from Wikidata/Commons. The Research team will also help with the integration between Wikidata items and the labels from existing image classifiers.
[ ] **Longer-term: Training our own image classifiers:** The longer-term plan, once data and processing units are available, is to train our own image classifiers for various purposes: object detection, adult image filtering, image quality, image authenticity, etc.
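As a rough illustration of the classifier evaluation item above, one possible starting point for a bias check is to compare accuracy across subsets of a diverse image set. This is only a sketch: the grouping key, the labels, and the function names `accuracy_by_group` and `accuracy_gap` are hypothetical and not part of any agreed protocol.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy per group.

    `records` is an iterable of (group, true_label, predicted_label) tuples.
    The notion of "group" (e.g. region or image source) is an assumption here.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_label, predicted_label in records:
        total[group] += 1
        if true_label == predicted_label:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best and worst per-group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical toy data, for illustration only.
records = [
    ("region_a", "cat", "cat"),
    ("region_a", "dog", "dog"),
    ("region_b", "cat", "dog"),
    ("region_b", "dog", "dog"),
]

per_group = accuracy_by_group(records)
gap = accuracy_gap(per_group)
print(per_group, gap)
```

A large gap between groups would suggest the classifier generalises unevenly across the image set; a real protocol would likely need more metrics than accuracy alone.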
Open Questions:
* How to create a common repository for image classifiers and machine learning models? @Harej will explore different possibilities.
* How to integrate and validate such models in mobile environments?