Issue discovered while working on T372641:
- Many Wikimedia Commons file captions that are labeled as English are actually written in other languages
- --> This causes English to be heavily overrepresented in the caption counts
Tasks:
- Take a sample of English-labeled captions
- Determine what percentage of the sample is non-English, so that we can estimate how uncertain the English caption counts are.
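
One possible way to approach the two tasks above, as a rough sketch rather than an agreed method: draw a reproducible random sample, label each sampled caption as English or not (by hand or with a language-ID tool, which is not shown here), then report the non-English share with a Wilson score interval as the uncertainty range. The function names, the sample size, and the 95% level (`z=1.96`) are all assumptions made for illustration.

```python
import math
import random


def sample_captions(captions, k, seed=0):
    """Draw a simple random sample of up to k captions without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    pool = list(captions)
    return rng.sample(pool, min(k, len(pool)))


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for the proportion hits/n (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("sample is empty")
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical labels for a sample: True means the caption was judged non-English.
    is_non_english = [True, False, False, True, False, False, False, False]
    hits = sum(is_non_english)
    low, high = wilson_interval(hits, len(is_non_english))
    print(f"non-English share: {hits / len(is_non_english):.1%} "
          f"(interval {low:.1%} to {high:.1%})")
```

The Wilson interval is used here instead of the simpler normal approximation because it behaves better when the sample is small or the non-English share is close to 0% or 100%. The resulting range could then be applied as a correction band on the English caption counts.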